All Episodes

APR Ep01: Intro — An Approach to Technical AGI Safety and Security
Alex and Thuy kick off a deep dive into Google DeepMind's comprehensive AGI safety paper by Rohin Shah and 29 co-authors. In this episode, meet the hosts: Alex (fintech PM) and Thuy (AI researcher).

APR Ep02: The Four Risks of AGI
What could go wrong with AGI? Alex and Thuy walk through the paper's four risk categories — misuse, misalignment, mistakes, and structural risks — and why DeepMind chose to focus on just two.

APR Ep03: Fighting Misuse
How do you stop people from weaponizing AI? Alex and Thuy dig into the paper's layered approach to misuse prevention — from threat modeling to capability evaluations to red teaming.

APR Ep04: The Alignment Problem
When AI itself goes rogue. Alex and Thuy tackle the hardest chapter — what happens when you can't tell whether your AI's output is good or bad, and the techniques being developed to address it.

APR Ep05: Robust Training and System-Level Security
The second line of defense — what happens when alignment fails. Alex and Thuy explore how to treat AI as an untrusted insider and build containment that works even against adversarial models.

APR Ep06: The Safety Toolbox
Interpretability, uncertainty estimation, and safer design patterns — the cross-cutting tools that make every defense layer work better.

APR Ep07: Making the Safety Case
How do you decide whether an AGI system is safe enough to deploy? Alex and Thuy walk through the paper's four types of safety cases and the role of red teaming.

APR Ep08: Conclusion — Key Takeaways and Open Questions
Alex and Thuy wrap up the series with key takeaways, surprises, practical advice, and the open questions that keep them up at night — starting with defense in depth, the philosophy that runs through the paper's entire approach.

APR Ep09: Measurement and Evaluation
Alex and Thuy dive into Anthropic's recommendations for technical AI safety research, starting with the foundational challenges of measuring what AI systems can do (capabilities evaluation) and whether they behave as intended.

APR Ep10: Control and Oversight
Continuing with Anthropic's AI safety research directions, Alex and Thuy explore how to deploy potentially misaligned systems safely and how to oversee systems that might be smarter than human evaluators.

APR Ep11: Robustness and Future Directions
The final episode on Anthropic's AI safety research agenda covers adversarial robustness, unlearning dangerous capabilities, and multi-agent governance, then synthesizes the key themes for building safe AI systems.