As AI systems shift from static predictors to increasingly autonomous agents, the challenge is not only how they act, but how we can trust and govern those actions. My recent work on Attestable Audits (ICML 2025) introduces a method for making AI evaluations verifiable: using secure hardware, we can cryptographically guarantee that a model’s reported safety benchmark results are genuine and tamper-proof. This creates a foundation for external oversight that does not rely solely on developer claims.
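To make the attestation idea concrete, here is a minimal sketch, not the actual Attestable Audits protocol: an evaluation report is canonicalized, hashed together with a digest of the evaluated model, and signed with a key that a trusted execution environment would normally hold; a verifier then checks the signature against the published attestation public key. The Ed25519 key, the field names, and the helper functions are illustrative assumptions.

```python
# Hedged sketch of attesting an evaluation result (not the ICML 2025 protocol itself).
# Assumes the signing key lives inside a trusted execution environment; here it is
# generated locally purely for illustration.
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Stand-in for a key provisioned to the enclave via remote attestation.
enclave_key = Ed25519PrivateKey.generate()
attestation_public_key = enclave_key.public_key()


def attest_evaluation(model_digest: str, benchmark: str, score: float) -> dict:
    """Produce a signed, canonicalized record binding a model to its benchmark score."""
    report = {
        "model_sha256": model_digest,  # digest of the exact weights that were evaluated
        "benchmark": benchmark,
        "score": score,
    }
    canonical = json.dumps(report, sort_keys=True).encode()
    report_hash = hashlib.sha256(canonical).digest()
    signature = enclave_key.sign(report_hash)
    return {"report": report, "signature": signature.hex()}


def verify_attestation(record: dict) -> bool:
    """Check that the report is unmodified and was signed by the trusted enclave key."""
    canonical = json.dumps(record["report"], sort_keys=True).encode()
    report_hash = hashlib.sha256(canonical).digest()
    try:
        attestation_public_key.verify(bytes.fromhex(record["signature"]), report_hash)
        return True
    except InvalidSignature:
        return False


model_digest = hashlib.sha256(b"model-weights").hexdigest()  # illustrative digest
record = attest_evaluation(model_digest, "safety-benchmark-v1", 0.87)
assert verify_attestation(record)
```

Because the report is hashed over a canonical serialization and signed inside the trusted environment, any later change to the score, the benchmark name, or the claimed model invalidates the signature, which is what lets an external auditor rely on the result rather than on the developer's word.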
Looking ahead, the same principles must extend from models to agents. Future systems will plan, delegate, and interact in multi-agent environments, which makes accountability both more complex and more urgent. I will outline how cryptographic attestations, agent identity, and provenance tracking can provide new governance tools, ensuring that agents remain transparent, monitorable, and aligned with human values. By embedding verifiability at the infrastructure level, we can move toward a framework of accountable agency that supports both innovation and safety in the age of autonomous AI.
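As one illustration of what agent-level provenance could look like, the sketch below chains signed action records so that any later tampering with, or omission of, an action breaks the chain. The `AgentLog` class, the record fields, and the locally generated identity key are hypothetical assumptions, not a proposed standard.

```python
# Hedged sketch: a hash-chained, signed action log as one possible provenance primitive.
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


class AgentLog:
    """Append-only provenance log: each entry commits to the previous one and is signed."""

    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        # Stands in for a registered, externally verifiable agent identity key.
        self.identity_key = Ed25519PrivateKey.generate()
        self.public_key = self.identity_key.public_key()
        self.entries: list[dict] = []

    def record(self, action: str, details: dict) -> None:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        body = {"agent": self.agent_id, "action": action, "details": details, "prev": prev_hash}
        canonical = json.dumps(body, sort_keys=True).encode()
        entry_hash = hashlib.sha256(canonical).hexdigest()
        signature = self.identity_key.sign(entry_hash.encode()).hex()
        self.entries.append({**body, "entry_hash": entry_hash, "signature": signature})

    def verify(self) -> bool:
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: entry[k] for k in ("agent", "action", "details", "prev")}
            canonical = json.dumps(body, sort_keys=True).encode()
            if entry["prev"] != prev_hash:
                return False  # an entry was removed or reordered
            if entry["entry_hash"] != hashlib.sha256(canonical).hexdigest():
                return False  # an entry was modified after the fact
            try:
                self.public_key.verify(bytes.fromhex(entry["signature"]), entry["entry_hash"].encode())
            except InvalidSignature:
                return False  # the entry was not produced by this agent's identity key
            prev_hash = entry["entry_hash"]
        return True


log = AgentLog("agent-a")
log.record("delegate", {"to": "agent-b", "task": "summarize report"})
log.record("tool_call", {"tool": "search", "query": "safety benchmarks"})
assert log.verify()
```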