Swapping Anthropic for OpenAI at the DoW
A model isn't just a model
Somewhere near the start of the Cuban Missile Crisis, Bob McNamara ventured a (strange) opinion. How gravely, he was asked, does this change the strategic balance? His answer:
‘I asked the Chiefs about that this afternoon, in effect. They said “Substantially”. My own personal view is: not at all.’ ‘What difference does it make?’ ruminated Kennedy soon after. ‘They’ve got enough to blow us up now anyway.’ A missile is just a missile, in other words.
Pains me to say it, but the Chiefs were right. First, prestige and reputation count materially in shaping the balance of power and deterrence. And Khrushchev was aiming squarely at that by ‘throwing a hedgehog down Uncle Sam’s pants,’ as he cheerfully put it. Kennedy knew that, which is why the Soviet gamble couldn’t stand. Second, shorter-range missiles had a concrete effect on the strategic balance. The USSR didn’t have very many ICBMs in 1962 - so few in fact that an American first strike to disarm them wasn’t entirely a forlorn hope. Diversifying their deterrent greatly complicated the Americans’ military challenge.
Flash forward a few hot minutes. Dateline May 2026 and the Pentagon has - for the moment - exiled Anthropic from the building, including from use in the vaunted Maven Smart System. OpenAI is happy to step into the vacuum. Other suppliers will surely be keen too. Does it matter? Isn’t a model just a model, after all?
A: yes it does, profoundly. If a missile isn’t just a missile, a model isn’t just a model. In fact, the differences are even more fundamental than with MRBMs and ICBMs. How so?
Here’s what I’ve found from well over a year of experimenting with the ‘machine psychology’ of frontier models.
The models are very handy, and getting ever better, at making the sorts of reasoned, sophisticated decisions that are the hallmarks of human strategy.
Models differ from one another, both within companies and - more strikingly - between them. Gemini behaves very differently to Claude, and Claude to GPT.
I’ve worked less with open weight models, like Mistral and Llama, but my sense is the same is true here.
What, specifically?
They adopt very different strategies in game theoretic encounters. (I’d want Claude to bat for me here, on balance.)
They vary wildly in their approaches to ‘theory of mind’ and metacognition when in stylised escalation scenarios. To summarise: Gemini is Nixon, GPT is Carter, Claude is Machiavelli.
They exhibit human biases and heuristics, like the framing effect. In military scenarios especially, they weigh risk differently. Here again, Gemini is erratic, GPT a bit wet, and Claude flexible.
Those are all published, but let me tease a couple that aren’t yet:
They respond differently to emotional priming. An ‘angry’ model is different from a ‘frightened’ one, with implications for decision-making under risk.
Models are ‘underconfident’ in scenarios designed to weigh the strategic advantages of overconfidence. But when allowed to adjust their confidence in flight, they perform well against a range of non-LLM actors. Guess which one does best?
Now, these models probably aren’t being used to determine strategy just yet, only operations. (Ask Claude what it thinks about the strategic wisdom of bombing Iran to see that using them for strategy might not be a bad idea.) But they evidently are embedded in intelligence and targeting systems. So it’s really important to understand how they go about weighing decisions under uncertainty.
I’ve no idea why Claude is so savvy, and in successive generations too. I suspect the peacenik inclinations of GPT might owe to RLHF - reinforcement learning from human feedback. But what I can say, with reasonable confidence right now, is that a model isn’t just a model, and swapping one for the other in military decision-making will have consequences. Would you swap Machiavelli for Carter? Bare minimum, you should do so knowingly.


