All Episodes

APR Ep01: Intro — An Approach to Technical AGI Safety and Security
Alex and Thuy kick off a deep dive into Google DeepMind's comprehensive AGI safety paper by Rohin Shah and 29 co-authors. In this episode, meet the hosts: Alex (fintech PM) and Thuy (AI researcher).

APR Ep02: The Four Risks of AGI
What could go wrong with AGI? Alex and Thuy walk through the paper's four risk categories — misuse, misalignment, mistakes, and structural risks — and why DeepMind chose to focus on just two.

APR Ep03: Fighting Misuse
How do you stop people from weaponizing AI? Alex and Thuy dig into the paper's layered approach to misuse prevention — from threat modeling to capability evaluations to red teaming.

APR Ep04: The Alignment Problem
When AI itself goes rogue. Alex and Thuy tackle the hardest chapter — what happens when you can't tell whether your AI's output is good or bad, and the techniques being developed to address it.

APR Ep05: Robust Training and System-Level Security
The second line of defense — what happens when alignment fails. Alex and Thuy explore how to treat AI as an untrusted insider and build containment that works even against adversarial models.

APR Ep06: The Safety Toolbox
Interpretability, uncertainty estimation, and safer design patterns — the cross-cutting tools that make every defense layer work better.

APR Ep07: Making the Safety Case
How do you decide whether an AGI system is safe enough to deploy? Alex and Thuy walk through the paper's four types of safety cases and the role of red teaming.

APR Ep08: Conclusion — Key Takeaways and Open Questions
Alex and Thuy wrap up the series with key takeaways, surprises, practical advice, and the open questions that keep them up at night — starting with defense in depth, the philosophy that runs through the paper's entire approach.

APR Ep09: Measurement and Evaluation
Alex and Thuy dive into Anthropic's recommendations for technical AI safety research, starting with the foundational challenges of measuring what AI systems can do (capabilities evaluation) and whether they behave as intended.

APR Ep10: Control and Oversight
Continuing with Anthropic's AI safety research directions, Alex and Thuy explore how to deploy potentially misaligned systems safely and how to oversee systems that might be smarter than human evaluators.

APR Ep11: Robustness and Future Directions
The final episode on Anthropic's AI safety research agenda covers adversarial robustness, unlearning dangerous capabilities, and multi-agent governance, then synthesizes the key themes for building safe AI systems.