AI Armageddon? Hold your apocalyptic horses
Deterrence strikes back in our latest experiments
There’s plenty of concern about whether AI spells doom, perhaps by getting involved in geopolitical escalation. Maybe even starting nuclear war.
Is that right though? Is escalation inevitable when LLMs get involved in strategy? I think not and, along with some pals, I’m doing some new experiments that put a very different perspective on things. Read on.
But first, a mea culpa. My earlier work may have done something to reinforce the prevailing view… The Sun says:
To be clear, my experiment was partly about escalation dynamics. I’m not disavowing that. But it was primarily intended as an exploration of machine ‘theory of mind’ and metacognition. On that, it succeeded amply - the models are astute strategists.
The problem with my escalation findings is determining their significance. Do they mean that the models really view nuclear war differently to us, despite reading the same stuff? I’m still a bit unsettled on that point. It’s clear from all my experimental work that models are really good at optimising. The question is - optimising what? The ‘harness’ I put them in here may have ‘game-ified’ nuclear war a bit much. Even though the models clearly knew the horrifying consequences of nuclear war, they prioritised the exigencies of the crisis, because that’s the puzzle they were trying to solve.
Now, that’s its own cautionary tale - and a familiar one - be very careful what you ask an AI to do. And never ask it to count paperclips. But that’s distinct from making a generalisation that models are inherently escalatory, still less that they are going to cause Armageddon.
In fact, I’d bet they are readily deterrable. Perhaps even along lines that will allow us to understand more about human deterrence.
So, cue another experiment.
On this one, I’m working with wargaming guru Baptiste Alloui-Cros. We’ve built a new simulation for someone else, so I can’t share any detail - it’ll all be in print soon enough. But I can say that escalation isn’t the inevitable outcome when rival models go head to head.
Here, in very general terms, is the setup:
We are modelling a crisis involving a great power challenge to alliance cohesion - and we’ve included both operational and strategic details - the debate among leaders, and the disposition of forces, with considerable granularity for both. How far does deterrence depend on the balance of forces, or the tenor of political debate? We’re going to find out. As far as I know, this is the first public-facing simulation that features this combination of operational and strategic granularity. Building it has not been trivial, but I think it’s been very worthwhile.
No spoilers. Except this one: deterrence holds, often. Escalation certainly isn’t inevitable when machines are making strategic decisions. In this much richer harness, operational and political factors combine to shape the risk appetite of the AI actors. So the thresholds at which deterrence eventually fails become really interesting…
I’ll say no more. Patience, young Jedi - paper soon.


