
Episode 7 · 39 min
APR Ep07: Making the Safety Case
Show Notes
How do you decide whether an AGI system is safe enough to deploy? Alex and Thuy walk through the paper's four types of safety cases, how they compose, and the role of red teaming in backing them with evidence.
In this episode:
- Inability cases — The model simply can't do the dangerous thing (analogous to PCI compliance scans)
- Control cases — The system around the model catches harmful outputs (analogous to SOC 2 controls)
- Incentive-based cases — A good training process implies a good model (necessary but not sufficient)
- Understanding-based cases — Direct inspection of the model's internals (the aspirational endgame, still far away)
- Composite cases — Layered assurance that combines multiple types of evidence
- Red teaming — White-box adversarial testing with loosened thresholds, building scientific evidence
- The collusion vulnerability — When monitors share blind spots with the systems they oversee
- Practical advice — How to build safety cases at any organizational scale