
Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments (Chris Schnabl, Pivotal Labs & University of Cambridge)

  • Lightfoot Room, Old Divinity School, 2 All Saints Passage, Cambridge, England, CB2 3LS, United Kingdom

As AI systems shift from static predictors to increasingly autonomous agents, the challenge is not only what they do, but how we can trust and govern those actions. My recent work on Attestable Audits (ICML 2025) introduces a method for making AI evaluations verifiable: using secure hardware, we can cryptographically guarantee that a model’s safety benchmark results are genuine and tamper-proof. This creates the foundation for external oversight that does not rely solely on developer claims.
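To make the general pattern concrete, here is a minimal sketch (not the protocol from the paper): an evaluation runs inside an enclave, which emits a report binding the model hash, the eval-suite hash, and the results, signed so that an external verifier can detect tampering. All names are hypothetical, and the HMAC key below merely stands in for the hardware-rooted attestation key and enclave measurement a real TEE would provide.

```python
# Illustrative sketch only: simulates the flavour of an attestable audit.
# A real TEE would sign with a hardware-rooted key and include an enclave
# measurement in a remote-attestation quote; the HMAC key below is a stand-in,
# and all names and values here are hypothetical.
import hashlib
import hmac
import json

ENCLAVE_KEY = b"stand-in-for-hardware-rooted-key"


def run_audit_in_enclave(model_weights: bytes, eval_suite: bytes) -> dict:
    """Inside the enclave: run the benchmark and emit a signed report."""
    results = {"benchmark": "safety-eval-v1", "score": 0.93}  # placeholder result
    report = {
        "model_hash": hashlib.sha256(model_weights).hexdigest(),
        "eval_hash": hashlib.sha256(eval_suite).hexdigest(),
        "results": results,
    }
    payload = json.dumps(report, sort_keys=True).encode()
    report["signature"] = hmac.new(ENCLAVE_KEY, payload, hashlib.sha256).hexdigest()
    return report


def verify_audit(report: dict) -> bool:
    """Outside the enclave: check that the report was produced untampered."""
    payload = json.dumps(
        {k: v for k, v in report.items() if k != "signature"}, sort_keys=True
    ).encode()
    expected = hmac.new(ENCLAVE_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(report["signature"], expected)


if __name__ == "__main__":
    report = run_audit_in_enclave(b"model weights", b"eval suite")
    print("report verifies:", verify_audit(report))
```

The key point is that the signed report commits to both the exact model and the exact evaluation suite, so a verifier outside the developer's infrastructure can check the claim without rerunning the benchmark.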

Looking ahead, the same principles must extend from models to agents. Future systems will plan, delegate, and interact in multi-agent environments, which makes accountability more complex but also more urgent. I will outline how cryptographic attestations, agent identity, and provenance tracking can provide new governance tools, ensuring that agents remain transparent, monitorable, and aligned with human values. By embedding verifiability at the infrastructure level, we can move toward a framework of accountable agency that supports both innovation and safety in the age of autonomous AI.
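As one hedged illustration of what provenance tracking at the infrastructure level could look like (not the speaker's design), the sketch below chains agent actions into a hash-linked log, so any retroactive edit breaks verification. All names are hypothetical.

```python
# Illustrative sketch only: a hash-chained provenance log for agent actions,
# one possible building block for the governance tools described above.
# All names are hypothetical.
import hashlib
import json
from dataclasses import dataclass


@dataclass
class ProvenanceEntry:
    agent_id: str    # stable identity of the acting agent
    action: str      # description of what the agent did
    prev_hash: str   # hash of the previous entry, chaining the log
    entry_hash: str  # hash over this entry's contents


def append_entry(log: list, agent_id: str, action: str) -> None:
    """Append a new action, linking it to the previous entry's hash."""
    prev_hash = log[-1].entry_hash if log else "0" * 64
    body = json.dumps({"agent_id": agent_id, "action": action, "prev": prev_hash})
    entry_hash = hashlib.sha256(body.encode()).hexdigest()
    log.append(ProvenanceEntry(agent_id, action, prev_hash, entry_hash))


def verify_chain(log: list) -> bool:
    """Recompute every hash; any tampering breaks the chain."""
    prev = "0" * 64
    for e in log:
        body = json.dumps({"agent_id": e.agent_id, "action": e.action, "prev": prev})
        if e.prev_hash != prev or e.entry_hash != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = e.entry_hash
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, "agent-A", "delegated subtask to agent-B")
    append_entry(log, "agent-B", "executed web search")
    print("provenance intact:", verify_chain(log))
```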

Previous
8 November

Stress Testing Deliberative Alignment for Anti-Scheming Training (Marius Hobbhahn, Apollo Research)

Next
19 November

The AI Agent Index: What Deployments Reveal About Safety and Autonomy (Leon Staufer, MATS & University of Cambridge)