Anti-Scheming: how would we train a model not to scheme? (Bronson Schoen and Marius Hobbhahn, Apollo Research)

Saturday 8 November 2025
18:00 to 19:00

Constance Tipper Lecture Theatre, Department of Engineering, Trumpington Street, Cambridge, England, CB2 1PZ, United Kingdom

Register for the event here

Bronson Schoen (senior research engineer, Apollo Research) and Marius Hobbhahn (CEO, Apollo Research) will present joint work by Apollo Research and OpenAI: ‘Stress Testing Deliberative Alignment for Anti-Scheming Training’. In this work, they attempted to use deliberative alignment to teach o3 and o4-mini general principles for not being deceptive.

Bronson and Marius will discuss their findings, including: a) how well the training works, b) its effects on the models’ situational awareness, c) the observation that o3’s chain-of-thought is no longer always human-interpretable, and more.

Mission
Contact us at contact@ukaiforum.com

UK AI Forum is proud to be sponsored by Meridian Cambridge.

Evaluating harmful agents: learnings from AgentHarm and follow-up work (Mateusz Dziemian, Gray Swan AI)

Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments (Chris Schnabl, Pivotal Labs & University of Cambridge)
