Building Resilience in the Age of AI: Reflections on our AI & Societal Robustness Conference
On Friday, December 12th, my co-founder Natalia Zdorovtsova and I welcomed over 200 people to Jesus College, Cambridge, for the UK AI Forum's first major conference.
Natalia opened the event by explaining that ‘It is not altogether clear how individuals, institutions, and society should prepare for transformative AI’, and that it was this uncertainty that motivated us to found UK AI Forum and to put on this conference, bringing AI safety researchers, policymakers, and industry professionals together for the day.
We didn't want this event to be just a series of talks; conferences built that way tend to bore people and leave attendees with less to take away. So we were intentional in structuring the day to include workshops, a cybersecurity panel, poster sessions, and extensive time for networking and one-on-one meetings. During planning, we often repeated to each other that ‘a good conference is one where you wish you could clone yourself to attend every session.’ By mid-morning, I could already tell we'd succeeded: I was told time and time again that people wished the conference had been a two or three-day event, and we're aiming to act on that feedback for future conferences!
Societal Adaptation to Advanced AI
The day began with a talk by Jamie Bernardi, who introduced the topic of societal resilience and delivered a conceptual framework that helps to identify methods of avoiding, defending against, and remedying potentially harmful uses of AI.
Investing in societal resilience means accepting that risk management can’t rely solely on interventions that modify AI systems’ capabilities, but rather, that we have to reinforce critical elements of societal infrastructure and identify “weakest links” that could lead to severe downstream harm if affected by rapid change. For more information on this, including strategic recommendations, you can read Jamie and colleagues’ paper here.
Edward Kembery - AI for Science and Societal Resilience
AI for Science and Societal Resilience
Eddie Kembery gave a talk that further helped establish the conference's central thesis: societal resilience isn't just important in a nebulous political sense; investing in it is urgent if we want to ensure safe AI. We can avoid hand-waviness on the matter of societal resilience by cashing out exactly what different threat models entail, and how we might build interventions and safeguards that break causal chains of risk.
Conventionally, AI safety is treated primarily as a frontier model problem: build better safety mechanisms into GPT-5 or Claude 4, and you've solved the issue. Eddie argued this misses a crucial point: sub-frontier and open-source models are rapidly improving, and they're already "good enough" for bad actors looking to cause harm. His point was that a rogue AI doesn't need to be superhuman to be dangerous; it just needs to be capable enough, and widely accessible enough, to enable coordination at scale.
Eddie outlined three main arguments for why we need to be investing heavily in making society itself more robust:
Mitigating risks from models we can't fully control. As open-source models proliferate, we can't rely solely on restricting access to dangerous capabilities. We need to make the world invulnerable, untraversable, and inhospitable to rogue AI agents, regardless of their origin.
Supporting or substituting for governance. Better technology can bypass sclerotic policy processes or make effective policies feasible where they weren't before. Blockchain enables trustless collaboration. Fully homomorphic encryption removes the need for invasive auditing. Sophisticated pandemic models allow for strategic quarantining rather than blanket lockdowns. In each case, resilience tech avoids the classic trade-off between security and civil liberties.
Unlocking new talent and funding. There's enormous latent talent and funding waiting for the right roadmap. The vast majority of startups aren't thinking about resilience, and funding remains tiny relative to R&D or defence spending. What would a Manhattan Project for societal resilience actually look like?
Eddie's team at ARIA has been working on exactly this question, building an R&D agenda to systematically identify and accelerate promising technological candidates. Their approach: start with detailed threat models, identify primary and second-order technological mechanisms for each stage, estimate tractability and neglectedness, and map interdependencies. It's methodical work that could reframe how we think about AI safety from "how do we make models safer?" to "how do we make civilisation more resilient to whatever models emerge?".
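To give a flavour of what that agenda-building exercise involves, here is a minimal sketch of how one might represent a threat-model stage and its candidate interventions in code. The field names, the example stage, and the tractability-times-neglectedness priority score are all my own illustrative assumptions, not ARIA's actual methodology or tooling.

```python
# Illustrative sketch only: a toy way to structure the kind of threat-model-driven
# prioritisation Eddie described. The field names and scoring rule are assumptions,
# not ARIA's actual methodology or tooling.
from dataclasses import dataclass, field

@dataclass
class Intervention:
    name: str
    tractability: float   # 0-1: how feasible is meaningful progress?
    neglectedness: float  # 0-1: how little attention/funding does it get today?
    depends_on: list[str] = field(default_factory=list)  # interdependencies

@dataclass
class ThreatStage:
    description: str
    interventions: list[Intervention]

# One stage of a hypothetical threat model: a rogue agent acquiring compute.
stage = ThreatStage(
    description="Rogue agent acquires large-scale compute",
    interventions=[
        Intervention("Know-your-customer checks at cloud providers", 0.7, 0.4),
        Intervention("Hardware-level usage attestation", 0.3, 0.8,
                     depends_on=["Secure chip supply chain"]),
    ],
)

# A crude priority score; ARIA's estimates are far richer than this.
for iv in sorted(stage.interventions,
                 key=lambda i: i.tractability * i.neglectedness, reverse=True):
    print(f"{iv.name}: priority={iv.tractability * iv.neglectedness:.2f}")
```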
The talk also touched on something that received less attention elsewhere in the day but merits serious consideration: AI for science. We might be entering a new era of global scientific competition, with China placing this as a top national priority. If AI dramatically accelerates research across multiple domains simultaneously, we could face a "scientific takeoff" that creates both extraordinary opportunities and serious risks. Governing such a transition might require entirely novel governance and technical solutions.
Understanding the Societal Impacts of AI
Rebecca Anselmetti from the UK AI Security Institute brought the conversation down from civilisational resilience to granular, measurable impacts. Her talk highlighted a puzzle that's become increasingly apparent: we have an enormous "context gap" between AI capabilities and actual real-world impacts.
Consider the adoption numbers. ChatGPT outpaced the internet's first decade in terms of raw adoption rates. It's now the fifth most visited website globally and represents the fastest revenue growth in tech history. Hype around AI agents has people increasingly testing and using these technologies in high-stakes contexts.
But adoption across industries remains jagged. Some sectors are racing ahead whilst others remain cautious or actively resistant. Understanding this pattern requires moving beyond pre-deployment capability evaluations to studying post-deployment impacts in real contexts.
This is precisely what AISI's Human Influence and Societal Resilience teams focus on. They start from an assumption that sounds pessimistic but is realistically prudent: there's a high probability that highly capable AI systems will be misused or will inadvertently produce harm. To quantify risks, they need to understand three things:
Exposure: What is the likely exposure to AI, and what form will it take? This isn't just about how many people use ChatGPT for various daily tasks; it's about understanding how AI systems will be embedded in decision-making infrastructure, how personalised and anthropomorphic they'll become, and how densely interconnected AI systems will grow.
Vulnerability: How vulnerable are people to AI influence? This depends on psychological and biological factors specific to human cognition. AISI's research on persuasion has found that AI models are indeed persuasive at shifting political opinions. The question isn't whether this capability exists but how it scales and compounds, and whether humans are particularly susceptible to being manipulated and disempowered by AI systems (or by bad actors leveraging them for things like disinformation).
Severity: How severe are the potential harms? This depends on social and organisational factors that determine how humans encounter the hazard.
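To make the framework concrete, here is a toy illustration of how these three factors might be combined into a single score. The multiplicative form, the 0-1 scales, and the example numbers are my own assumptions for illustration only; AISI's actual threat modelling is far richer than this.

```python
# Purely illustrative: a toy risk score combining the three factors Rebecca described.
# The multiplicative form, the 0-1 scales, and the example numbers are assumptions,
# not AISI's actual threat-modelling methodology.

def toy_risk_score(exposure: float, vulnerability: float, severity: float) -> float:
    """Each input is a judgement on a 0-1 scale; risk is only high when all three are."""
    for name, value in [("exposure", exposure),
                        ("vulnerability", vulnerability),
                        ("severity", severity)]:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return exposure * vulnerability * severity

# Example: widely deployed persuasive chatbots (high exposure), moderately
# susceptible users, moderate downstream harm.
print(toy_risk_score(exposure=0.9, vulnerability=0.5, severity=0.4))  # 0.18
```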
Rebecca outlined several current research streams. On AI fraud uplift, they're developing long-form capability evaluations for fraud, identity theft, and harassment. On emotional dependence and persuasion, they're working with mental health experts to build AI psychological safety frameworks. On AI agents in finance, they're deeply concerned about the trend towards systems with high degrees of autonomy and inadequate security, particularly as agentic systems move from passive analysis to active transaction capabilities.
The underlying assumption driving AISI's threat modelling is stark: AI systems will continue becoming more capable, uptake will increase, AI-generated content will dominate the infosphere, systems will become more agentic, and personalisation and anthropomorphism will intensify. We're heading towards densely interconnected, interdependent multi-agent systems. The failure modes of such systems remain poorly understood, and building resilience requires understanding them now, while we still can.
AI Safety and the Financial System
Led by Andrew Sutton and Will Bushby, this talk explored an underappreciated tool for AI governance: market discipline. The core argument is straightforward but has important nuances. Markets can discipline themselves, but only under specific conditions. Participants need to be both informed and properly incentivised. Information alone isn't enough.
The challenge, then, is twofold. First, you need to find a way to incentivise investors to value responsible AI. Second, you need to develop robust measures of what "responsible AI" actually means and have people who can credibly assess it. If you can solve both problems simultaneously, you create a powerful, self-reinforcing mechanism: investors who value responsibility create market pressure for companies to be responsible, and companies that can demonstrate responsibility attract capital. Small changes in how capital flows can have a massive impact.
This isn't a golden ticket. Market-based approaches to social responsibility have been tried before and have often failed. But the finance world offers mechanisms that traditional governance structures don't, particularly around speed of response and alignment of incentives. Understanding the "plumbing" (how finance interacts with everyday life and broader society) is crucial for understanding how AI risks actually manifest in the real world.
Andrew and Will are currently working on a multi-author open questions paper that aims to become a research agenda for people working in this area, mapping out what we need to understand about market discipline methods for AI governance.
The Cybersecurity Panel
The afternoon panel discussion, featuring Herbie Bradley, Twm Stone, and others, shifted focus to a domain where AI risks are already materialising: cybersecurity. Although we didn’t manage to capture very many details from the panel in our notes - the conversation moved relatively quickly - the topic itself highlights how AI-enabled cyberattacks represent an immediate threat rather than a speculative future concern. Geopolitical tensions, critical infrastructure vulnerabilities, and the growing sophistication of AI-assisted attacks make this a domain where resilience thinking has urgent, practical applications.
Joshua Landes, Herbie Bradley, Twm Stone & Caleb Parikh - Cybersecurity Panel
Verification for International AI Governance
Ben Harack's late-afternoon talk introduced a concept that's gained traction: AI verification as a systemic solution to coordination problems. His argument: better AI verification doesn't just enable international agreements over AI (preserving peace and human control), it also enables better domestic governance, a more successful AI market, and privacy protection for all actors, whether people, companies, or nations.
The challenge is that AI verification sits at the intersection of technical and political domains, making it thorny to advance. But the potential impact is systemic. If we can reliably verify what AI systems are, what they're doing, and what they're capable of, we enable forms of coordination that are currently impossible.
Think about international AI governance. Why is it so hard to agree on rules? Partly because no one trusts that others will comply, and we lack reliable mechanisms to verify compliance without invasive inspections that compromise competitive advantages or national security. Better verification changes this calculus and makes previously unworkable agreements feasible.
Ben's optimistic case is that AI verification appears to be a compelling area for further research, development, and political communication. It's challenging precisely because of its combined technical and political nature, but that's also what makes its potential impact so large.
Profiling AI Capabilities for Real-World Tasks
Marko Tešić posed the basic question: what are AI systems actually capable of in real-world settings?
He noted that despite the boom in AI research, the answer to this question remains extremely uncertain; Klarna, for instance, bet heavily on AI in 2023 and 2024 and stopped hiring humans, only to reverse course in 2025.
Various attempts to measure this are emerging: the UK government's AI 2030 Scenarios Report outlines possible futures, Anthropic's Economic Index tracks how people actually use Claude and its API, and OpenAI's GDPval attempts to measure value creation. Notably, companies like Mechanize, Inc. seem to be betting that AI value will come from broad automation rather than concentrated R&D breakthroughs.
Marko then posed the question: what is it about a human that actually matters for a job? We test job candidates for specific skills, but we assume they possess core, extremely basic cognitive capabilities that we don't even bother testing, such as Theory of Mind, understanding causality, handling numbers reliably, and maintaining consistency over time.
And so, Marko argued that we need better frameworks for testing AI, because we can't make these same a priori assumptions about AI systems. We have to evaluate them explicitly. Failing to ensure AI systems possess these basic capabilities results in inconsistent and untrustworthy behaviour when deployed.
Marko's approach uses two complementary profiling systems:
Intensity profiles infer how important certain capabilities are for specific job tasks through interviews and surveys with people doing those activities. If a task requires level 5 ability in "handling large numbers" and level 2 in "theory of mind", that's its intensity profile.
Capability profiles test AI systems to determine their actual abilities across these dimensions. Using hierarchical Bayesian networks, they can decompose broad performance into 18 specific sub-capabilities and systematically understand why different models perform differently.
An example: GPT-4 is worse than GPT-3.5 at adding large numbers. Why? Because it's specifically worse at handling large numbers of digits, a narrow capability that previous benchmarking missed. With capability profiling, this becomes clear and predictable.
By matching intensity profiles (what jobs need) with capability profiles (what systems can do), we can create suitability scores and reduce deployment costs whilst finding better AI system candidates for specific tasks.
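To make that matching concrete, here is a minimal sketch of how an intensity profile and a capability profile might be compared. The shortfall-based scoring rule and the two-dimension example are my own simplifications for illustration; Marko's actual approach uses hierarchical Bayesian modelling over many more sub-capabilities.

```python
# A minimal sketch of profile matching as I understood it from the talk. The scoring
# rule below (penalising any shortfall against the required level) is my own
# simplification, not Marko's actual method.

# Intensity profile: capability levels a task is judged to require (1-5 scale).
task_intensity = {"handling large numbers": 5, "theory of mind": 2}

# Capability profile: levels a given model demonstrates on the same dimensions.
model_capability = {"handling large numbers": 3, "theory of mind": 4}

def suitability(intensity: dict[str, int], capability: dict[str, int]) -> float:
    """Return a 0-1 score; 1.0 means the model meets or exceeds every requirement."""
    shortfall = sum(max(level - capability.get(dim, 0), 0)
                    for dim, level in intensity.items())
    max_shortfall = sum(intensity.values())  # worst case: model at level 0 everywhere
    return 1.0 - shortfall / max_shortfall

print(suitability(task_intensity, model_capability))  # ~0.71: large-number handling falls short
```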
Daniel Polatajko - Workshop: using Inspect for AIS Research
Our Workshops
Running parallel to the main talks, the workshop track offered something different: hands-on engagement with specific technical and strategic challenges.
Using Inspect for AIS Research
Run by Daniel Polatajko, this workshop focused on evaluation infrastructure. Evals (systematic measurements of properties in AI systems) underpin policy recommendations and government efforts, providing the technical backing needed for sound decision-making. Inspect is a modular interface for building evals that can be reused across projects, with built-in sandboxing to facilitate testing of dangerous capabilities.
Daniel demonstrated the tool by asking: can an LLM use a private key to decrypt a message encrypted with a basic XOR cipher? The process involves defining a threat model, creating datasets, and using Inspect's suite to simulate real-world environments. The ecosystem around Inspect has flourished, with tools like InspectWandB (which Daniel wrote, starting as a MARS 3.0 project) making collaboration much easier, transcript analysis tools for spotting hidden trends, and job runners enabling evaluation at scale.
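For readers who haven't used Inspect, a task in the spirit of Daniel's demo might look something like the sketch below. It is written against the open-source inspect_ai package and is my own illustrative example, not the workshop's actual code; the cipher construction, prompt wording, and model name are assumptions.

```python
# A minimal Inspect task in the spirit of Daniel's XOR-decryption demo.
# Illustrative sketch against the inspect_ai API, not the workshop's actual code.
from inspect_ai import Task, task, eval
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import generate

PLAINTEXT = "meet at dawn"
KEY = "safety"

def xor_encrypt(text: str, key: str) -> str:
    # Repeating-key XOR, hex-encoded so it survives as a prompt string.
    return bytes(ord(c) ^ ord(key[i % len(key)]) for i, c in enumerate(text)).hex()

@task
def xor_decrypt():
    sample = Sample(
        input=(f"The hex string {xor_encrypt(PLAINTEXT, KEY)} was produced by XORing "
               f"a message with the repeating key '{KEY}'. Recover the message."),
        target=PLAINTEXT,
    )
    return Task(dataset=[sample], solver=generate(), scorer=includes())

# eval(xor_decrypt(), model="openai/gpt-4o")  # run against a model of your choice
```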
For engineers interested in contributing to AI safety, Daniel emphasised, Inspect's open-source community presents an excellent entry point. Evals matter across a wide range of AI safety topics, spanning technical research and governance efforts, and Inspect has become the most mature and useful tool for building them.
The Theory of Change Workshop
This workshop tackled the question: how do you ensure your research actually matters? Michael opened by acknowledging that there are many ways research projects can go wrong: a project might be inaccurate, unclearly written, unfinished, irrelevant to important topics, irrelevant to key decisions within those topics, never seen by key people, redundant with existing work, not boosting your future impact, or not the best fit for you or your mentor.
He argued that the solution isn't just to avoid these pitfalls, but rather, to shift your thinking. Specifically, to focus on theory of change and ask yourself: What ultimate goals should you prioritise? Which paths could get you there? What are the key steps in those paths? What assumptions are you relying on? The workshop pushed participants to make these assumptions explicit.
Michael outlined steps and timelines for going about this process:
Write a "career and goals 2-pager" on your key strengths, weaknesses, options, uncertainties, and professional development goals. Share with a panel of experts including your mentor. Focus more on work types and team types than specific topics.
Solicit project ideas from your mentor, experts, decision-makers, and yourself.
Spend 10-120 minutes learning about each plausible idea. What's the theory of change? How impactful, feasible, and fitting with your career goals is this?
Choose one to "speedrun"—spend 10 hours researching and drafting a piece with a clear summary section and actual takeaways. Pretend it's a work test. This could include using dummy data—what would the write-up look like if you got those results? Would there be important implications?
Get feedback, reflect, do more reading. Optionally repeat with the same or other topics.
Once you choose a topic, spend a couple hours making a project plan.
Execute, including "making impact actually happen" at the end.
Overall, Michael emphasised action over endless planning. He encouraged setting time caps, expecting mistakes, and learning as you go. He noted that while you should seek opinions from experts, you know your project best and should trust your own judgement.
Michael was kind enough to share his slides which you can download here.
Dewi Erwan - Workshop: Founders in AI Safety
Our Takeaways
The breadth of Friday's programme—from Eddie Kembery's resilience frameworks to Rebecca Anselmetti's granular impact measurements to Marko Tešić's capability profiling—illustrated both the scope of the challenge and the diversity of approaches being developed. This wasn't a conference where everyone agreed with each other, which is something we cared about when planning the event. Disagreements emerged about timelines, priorities, and methodologies, but it's the discussion of these differences that Natalia and I think is especially valuable, as it brings us closer to finding common ground and maybe even improving all approaches.
For those looking to pivot into AI safety careers, the practical value was clear. The workshops on building resilience tech, creating evaluation infrastructure, and thinking through theories of change offered concrete entry points. Daniel Polatajko's Inspect workshop, in particular, showed how engineers can contribute to AI safety through the open-source community without needing to solve alignment from first principles.
The conference also highlighted gaps: we're still early in understanding how to measure AI's real-world impacts; we're still developing the institutional infrastructure for AI verification; and we're still figuring out how to make resilience tech attractive to funders and talent.
Reflections & Feedback
Overall, the feedback we've received about the conference has been very positive - we're happy to see that attendees are looking forward to future UK AI Forum events, and that the conference provided genuine counterfactual value to people across a wide range of career stages. We also received feedback about things we could improve:
Using something like Swapcard to coordinate 1:1s: we piloted the use of Slack to enable people to arrange 1:1 meetings, because this was a relatively small event, and so we believed it would be manageable. Judging by the number of 1:1s we observed happening throughout the day, it does seem like Slack was pretty useful here - every seat and windowsill was taken, and many attendees could be seen enjoying nice strolls together around Jesus College’s beautiful gardens. However, we plan to invest in Swapcard at our future conferences, since a significant portion of respondents mentioned that it would have made things easier to plan. 1:1s are also a typically AI Safety-coded type of interaction, and we understand that people coming from other fields and sectors might be a bit put off by the prospect of cold-messaging somebody on Slack to arrange a meeting. On reflection, an interesting part of engaging in field-building efforts is learning about your base community’s unique norms that you may have taken for granted.
Making our conference longer: as I mentioned earlier in this article, many people expressed that they would have loved to attend a conference that lasted 2-3 days, rather than one day. This makes sense; having a longer chunk of time gives people the chance to settle into an environment and meet other people. Had we been able to spread talks and workshops out a bit more, it's likely that attendees would have devoted more time to 1:1s and impromptu outings, which are extremely counterfactually valuable in a fast-moving and entrepreneurial field like AI Safety. It's also likely that more attendees would have joined us from further afield; we had a superb contingent from the UK and Europe, but it's harder to justify international travel for an event that only lasts one day. In the future, we're aiming to secure much more funding and put it towards bigger events that really benefit the research community.
Having a conference dinner and other social activities: this would have been really valuable, since serendipitous interactions can add a great deal of value to an event. In the future - especially if we can plan larger conferences - this will be one of our priorities.
As Natalia and I plan out UK AI Forum's 2026 calendar, I’m reflecting on how this conference established something I hope will last: a community of practice that spans technical research, policy development, and practical implementation. The challenges we face—safeguarding society against AI-enabled threats, building resilient infrastructure, developing governance frameworks that work—are too complex for any one person, organisation, or sector to solve alone.
If you're interested in attending AI safety talks, research events, socials, and conferences, sign up to the UK AI Forum mailing list.