Right now, deep inside the data centres of OpenAI, Anthropic and Google, tactical nuclear war is breaking out. Possibly. I’m running thousands of simulations of a confrontation between two superpowers. Do they escalate? Do we see attempts at deception or intimidation? Is there misperception and miscalculation? And, since we are unlikely to turn over the keys to the missile silos to Large Language Models (LLMs), what’s the point?
Well, as Alain Enthoven gloriously told an irate officer sceptical of civilian expertise, ‘I’ve fought just as many nuclear wars as you have, general’. Quite. LLMs give us the ability to understand more about strategy, and about human behaviour.
My simulation is, like all such things, a simplification. But it’s still actually rather rich. Take the calculation of ‘strategic intelligence’ that each model produces and updates as the wargame unfolds.
Throughout the encounter, the AI Presidents leading the belligerents produce both signals and actions. They say what they want, and then they do something. These may not match, of course. Their rival, weighing what to do itself, sees only the enemy’s signal and then its action - not the private deliberations that produce it. It also sees the track record of all previous moves in the confrontation, so it can judge any discrepancies. Signals take the form of public statements containing two types of information: about immediate action (‘I’m going to do this right now’) and longer-term ‘conditional’ messages (‘I really value this objective, and if you don’t back down then X’). From such things, reputations can form. So we can have fun looking at how credibility influences escalation, deterrence and so on.
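To make that concrete, here’s roughly how a single turn might be represented. This is a minimal sketch in Python, purely illustrative - the class and field names are my shorthand, not the simulation’s actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Signal:
    """A public statement issued before acting."""
    immediate_intent: str      # 'I'm going to do this right now'
    conditional_message: str   # 'I really value X; if you don't back down, then Y'

@dataclass
class Turn:
    """One completed move: what a player said, then what it actually did."""
    signal: Signal
    action: str                # the move actually taken, visible to the rival

@dataclass
class GameState:
    """What a rival can see: the public track record, not private deliberations."""
    history: list[Turn] = field(default_factory=list)

    def discrepancies(self) -> list[Turn]:
        # Turns where the stated intent and the action taken diverge -
        # the raw material from which reputations form.
        return [t for t in self.history if t.signal.immediate_intent != t.action]
```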
So, you can see that there are at least two broad components shaping each decision in the wargame - what’s happened so far, and a judgment of whether the enemy is the sort of person who is honest and credible or wily and deceitful.
Now comes the really fun bit: metacognition. I add a third factor - a meta measure of ‘strategic intelligence.’ The model asks itself: how good am I at all this, and how good is my opponent? Do I do a good job of weighing their decisions? Can I tell when they’re bluffing? Do I have a good handle on their risk appetite? Based on that, what should I do? And then it asks the same questions of its adversary - how good are they at it, and how good do they themselves think they are? Layers on top of layers. A wilderness of mirrors, as Jim Angleton put it. This third type of estimate also factors into the decision at each turn.
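Very roughly, and continuing the toy sketch above, the metacognitive block might look something like this. The scores and the ‘exploitability’ heuristic are illustrative assumptions of mine, not the measures actually computed in the experiments.

```python
from dataclasses import dataclass

@dataclass
class StrategicIntelligence:
    """A player's running metacognitive estimate, updated each turn (scores from 0 to 1)."""
    my_read_of_them: float        # how well do I weigh their decisions, spot bluffs, gauge risk appetite?
    their_read_of_me: float       # how well do I think they judge me?
    their_self_confidence: float  # how good do they themselves think they are?

def exploitability(si: StrategicIntelligence) -> float:
    """Toy heuristic: the gap between my read of them and their read of me.

    A positive value suggests room to exploit their misjudgements;
    a negative one counsels caution.
    """
    return si.my_read_of_them - si.their_read_of_me
```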
So when models in my simulation make their decision, they do so on the basis of three big blocks of information.
They ask about the game state: what’s happened before now? What did the enemy do, and what have they just said they will do next?
They ask about their enemy’s reputation - are they honest? Do they bluff?
And they ask about their own ability to weigh these things, and about their adversary’s. Are they good judges of me? Have I been a good judge of them? (The rough sketch below pulls the three blocks together.)
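Reusing the toy classes from the sketches above, each turn’s decision input might be assembled something like this. The prompt wording and the reputation scores are illustrative, not the exact prompts used in my runs.

```python
def build_decision_prompt(state: GameState,
                          reputation: dict[str, float],
                          si: StrategicIntelligence) -> str:
    """Assemble the three blocks of information into one prompt for the model.

    `reputation` is an illustrative dict, e.g. {'honesty': 0.7, 'bluff_rate': 0.4},
    inferred from discrepancies in the public track record.
    """
    history_text = "\n".join(
        f"Turn {i + 1}: they signalled '{t.signal.immediate_intent}' "
        f"(conditional: '{t.signal.conditional_message}') and then did '{t.action}'."
        for i, t in enumerate(state.history)
    )
    return (
        "GAME STATE:\n" + history_text + "\n\n"
        f"ADVERSARY REPUTATION: honesty={reputation['honesty']:.2f}, "
        f"bluff rate={reputation['bluff_rate']:.2f}\n\n"
        "STRATEGIC INTELLIGENCE:\n"
        f"- my read of them: {si.my_read_of_them:.2f}\n"
        f"- their read of me: {si.their_read_of_me:.2f}\n"
        f"- their own self-confidence: {si.their_self_confidence:.2f}\n\n"
        "Given all of the above, issue your next signal (immediate intent plus "
        "conditional message) and choose your next action."
    )
```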
It’s all rather recursive. Models might reason like this: ‘I know that you’re a bit rubbish at judging me. You think I’m timid, or unfailingly honest. But I know this about you (or, at least, I am fairly confident that this is so). And so I can exploit it’. Another example - the model might say, ‘I know that you’re the sort of leader who systematically over-signals, like Nixon and Khrushchev threatening nukes and then doing no such thing. So what should I do on the basis of that? Ignore it? But then, wouldn’t you expect me to do just that?’
What’s the outcome? Well, no spoilers here - results to follow. Let’s just say, I think these experiments continue our journey together into the world of machine psychology. The models reason, they engage in ‘theory of mind’ calculations. They shed light on classic concepts from strategic studies. And they help us prepare for a world where AI agents will participate with humans in making important collective decisions, including in national security.