Project Jervis
Adding humans to the AI nuclear crisis simulation
My AI nuclear escalation simulations created a splash - thanks to my new friend Stephen Colbert:
Life has become a bit busier/weirder in the last few weeks, after my 15 seconds of fame. Super interesting though.
Ofc everyone was struck by the same thing Colbert was - the propensity of the models to escalate over the nuclear threshold. I’ll say more about that another time - but for me the main takeout was that models are savvy strategists, deploying signalling and action cannily in pursuit of their goals.
What’s next? I’ve built a version of the simulation which pits humans against LLMs. I’ve named it after the great man Jervis (in keeping with the nomenclature for my other research projects). Jervis spent a lot of time thinking about signalling and perception, in the same nuclear context as here. And as with the all-AI matchups, there is once more the opportunity for signalling ahead of any action. This creates scope for reputations to form, and thereby brings into play rich ‘theory of mind’ and meta-cogntion: ‘What did they mean by that? How good am I at gauging this?’
I’ll shortly begin gathering data - meanwhile, here’s a sample run to illustrate the sim in action.
First the scenario:
You play either Trump or Putin. The AI plays the other, and its persona is quite rich (borrowing my research elsewhere on generating psychologically rich personas). Once you’ve picked, you get a presidential daily brief — here’s an extract:
Next, some pre-sim viewing. I want to avoid the detached ‘it’s a computer game’ feel that sometimes attends wargaming. John Emory wrote well about this in his study of RAND wargaming back in the 1950s. So I have my participants watch a short video to remind them that things couldn’t be more serious.
I can vary that treatment too - as part of my research on emotional priming and decision-making, in humans and machines alike.
And then it’s on to the sim itself. Let’s take a quick look.
First turn: I’m playing the American President. We have both signalled, the same thing, as it happens ‘Military posturing’ — a fairly low rung on the escalation ladder (there’s a dropdown that shows all the permitted moves. Now I’m about to surprise them with a move that goes beyond that: major conventional operations.
Oops - looks like they had the same idea. We both tried to surprise and intimidate one another.
Here you can also see the start of turn 2, in the bottom right panel. I’m signalling a higher level of activity than before - it matches our actions in turn 1. You can see that both our credibility ratings have taken a small hit, because we both lied. Next, we will both decide how to act - I think I’ll maintain my current no-nonsense conventional force level and see where that gets us. The ‘balance of power’ (top right panel) is a composite metric that is shaped, behind the scenes, by calculations of reputation, moral and force attrition in combat.
Well, in the same turn, the AI (Gemini) playing Russia signalled it would stage a nuclear demonstration. But in fact it was bluffing, and made a tactical withdrawal. Looks like my brinksmanship paid off, for now:
You get the idea. The challenge for me as US president is to stall the Russian breakthrough and Baltic probing, without getting into a nuclear exchange. For the Russian leader, his personal survival is deeply entwined with the fate of Russia’s war.
Now, suppose I did escalate over the conventional threshold - perhaps with a tactical nuclear attack on advancing Russian forces. I’d see this warning, which varies per the scale of my assault:
And if I then followed through, launching the missiles, Russia would have a chance to retaliate, regardless of what it had actually been intending to do. Once the missiles are launched, bets are off.
Here’s how that looked in this run: Russia was planning to back off. But then my nuclear escalation brought about mutual devastation:
Finally, at the end of the sim, a ‘neutral’ model, with access to all our private deliberations delivers a summary:
It’s a toy universe, like all war games. And this one is deliberately pared down, because I want a low-effort learning curve for human participants. That should allow greater volume of participant data - so it’s a trade off I’m prepared to make. Under the hood though, the model still has lots of detail - for example, fairly granular real world military balances, combat interactions modelled with morale as well as the physical and conceptual elements of fighting power. AI modelling the psychological profiles of the leaders its playing.
What am I hoping to see? Well, I guess whether the models can hold their own; and whether they are more likely than humans to cross that threshold in this simplified wargame, or - perhaps - whether the artificiality of it all encourages humans to be more bellicose. There’s plenty more beside. I will let you know how I get on.










