I’m writing a lecture series on the causes of war. I thought it would be a good opportunity to try something new – using AI to analyse big data. My tool was the new code interpreter function of GPT-4, which uses language models to generate and understand code in Python. It’s a sort of translation service between natural language and computer language. But it also (and more relevant to us here) doubles as a powerful analytical tool. And it’s just become available in beta to OpenAI’s subscribers – including me.
Could I learn something new about the causes of war from it? I could.
We set to work. I told it to combine the data in one of the largest and longest-established databases, the Correlates of War (COW) project, with a series of other large datasets, some very large indeed. I wanted to test a series of interesting hypotheses – some supported in existing literature, others that I cooked up myself.
A caveat, before we start: combining datasets is a faff, even with the help of AI. There’s lots of cleaning and tweaking needed to make the entries compatible. Does the machine understand that USA and America are the same entry? It does, mostly, but not always. Another caveat – there’s some alchemy going on under the bonnet – it’s not always obvious what the AI is doing, even though it attempts to explain as it goes along. If you regenerate an answer that didn’t work out, sometimes it will adopt an entirely different approach to answering your query.
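To give a flavour of the tidying involved, here is a minimal sketch in Python of the 'USA versus America' problem. It is an illustration, not the code GPT-4 actually wrote: the alias table, the function names and the example names are all assumptions made up for the purpose.

```python
# Hypothetical alias table: illustrative only, not taken from COW or any other dataset.
ALIASES = {
    "usa": "United States of America",
    "united states": "United States of America",
    "america": "United States of America",
    "uk": "United Kingdom",
    "great britain": "United Kingdom",
}


def canonical_country(name):
    """Return a canonical country name, or the tidied input if no alias is known."""
    cleaned = " ".join(name.strip().split())   # trim and collapse stray whitespace
    key = cleaned.lower().replace(".", "")     # so "U.S.A." and "usa" look alike
    return ALIASES.get(key, cleaned)


def unmatched(names, known):
    """List the names that still fail to match the reference set after tidying."""
    return sorted({n for n in names if canonical_country(n) not in known})


if __name__ == "__main__":
    reference = {"United States of America", "United Kingdom"}
    print(unmatched(["USA", "America", " U.S.A. ", "Freedonia"], reference))
```

Printing the leftovers is the useful part: anything the table fails to catch shows up in a list you can check by eye, rather than quietly dropping out of the merged data.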
I’ll spare you the blow-by-blow of our conversations, and briefly summarise one of my efforts, via some screen grabs.
First, I wanted to dig into the connection between regime type and war. Do democracies fight one another? Do authoritarian regimes start more wars than democracies, and who wins more often? Does the length of a war depend on who initiates it? Does going to war lead to changes in regime type? To unpick these questions, I combined the COW database with Polity5, a database from the Center for Systemic Peace, recommended by the AI. This database assigns states a score based on how representative their government is. The score, of course, changes through time, meaning you might not end the war with the government you started with. Here’s a snapshot of how the analysis went:
So I uploaded that, and the AI set to work:
Fine – it cleaned the data, stripped out the wars with special coding, and got some answers to my queries, like this one – do democracies take longer to win their wars? A: yes!
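For anyone who wants to try this sort of comparison by hand, the steps can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not a record of what the AI did. The rows are invented placeholders, the field names are hypothetical, and the democracy cut-off of 6 is a commonly used convention rather than necessarily the one applied here. The special codes shown are Polity's markers for interruption, interregnum and transition periods, and may not be the only special coding that was stripped out.

```python
from statistics import mean

# Invented placeholder rows showing the shape of the data only.
# They are NOT real COW or Polity5 records.
wars = [
    {"country": "A", "start_year": 1900, "days": 300, "won": True},
    {"country": "B", "start_year": 1900, "days": 300, "won": False},
    {"country": "C", "start_year": 1950, "days": 100, "won": True},
    {"country": "A", "start_year": 1960, "days": 500, "won": True},
    {"country": "D", "start_year": 1970, "days": 50, "won": True},
]
polity = {("A", 1900): 8, ("B", 1900): -7, ("C", 1950): -5,
          ("A", 1960): 9, ("D", 1970): -77}

SPECIAL_CODES = {-66, -77, -88}  # Polity markers, not real regime scores
DEMOCRACY_THRESHOLD = 6          # assumption: a common cut-off, not the only one


def regime_type(score):
    return "democracy" if score >= DEMOCRACY_THRESHOLD else "non-democracy"


def mean_days_to_win(wars, polity):
    """Average war length in days for winning sides, grouped by regime type."""
    durations = {}
    for row in wars:
        score = polity.get((row["country"], row["start_year"]))
        if score is None or score in SPECIAL_CODES or not row["won"]:
            continue  # skip unmatched or specially coded country-years, and losers
        durations.setdefault(regime_type(score), []).append(row["days"])
    return {regime: mean(days) for regime, days in durations.items()}


if __name__ == "__main__":
    print(mean_days_to_win(wars, polity))
```

Matching on country and start year is the simplest possible join. Since the Polity score changes through time, a fuller analysis would also have to decide which year's score counts for a war that runs over several years.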
And this one – how war prone are the two regime types? A: democracies are more war prone – a finding I didn’t expect.
Fascinating. We were on a roll. So I uploaded some data from the UNDP on gender inequality by country, and combined that with the COW database. Here’s the result:
The finding here is that more gender equal countries initiate more wars. Perhaps unexpected given what I just learned about democracies initiating wars. Liberal interventionism in action! A huge caveat – it’s a very small sample.
Next, I wanted to do some checking on the ‘suffragist peace theory’ – the idea that the underlying mechanism of the democratic peace theory is female emancipation, i.e. similarly emancipated countries don’t fight one another. That means combining three datasets – COW, Polity (on regime type) and one for gender equality that goes back further than the UN data. Let’s just say that’s work in progress…
Last thing, for now – I wanted to take a look at the relationship between young males and war. It’s often conjectured that countries with an imbalance of genders are more war prone (presumably because there are a lot of frustrated, risk-acceptant single men?); and, similarly, that countries with proportionately more young males than old are more war prone. I took a look, loading up the UN world population database, from 1950 onwards. That gave me an initial, counterintuitive, finding:
Intriguing – but to really dig in, I needed to extract chronological data on population pyramids, and marry these to the COW. That’s a vast amount of data, and at that point, fatigue set in for me, though not for GPT-4. There are, though, a tonne more questions that I would want to answer – for example, what’s the effect of war on populations, and does that map in any significant way onto trends in regime type?
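The two population measures behind these conjectures are simple to state, even if extracting them for every country and year is not. Here is a minimal sketch, assuming the population pyramid has already been boiled down to head counts keyed by sex and the starting age of each age band. The 15 to 29 definition of 'young', the function names and the figures are all illustrative assumptions, not values from the UN data.

```python
def young_male_share(population, young_from=15, young_to=29, adult_from=15):
    """Share of adult males whose age band starts within the young range.

    `population` maps (sex, band_start_age) to a head count.
    Returns None if there are no adult males at all.
    """
    males = {age: n for (sex, age), n in population.items() if sex == "M"}
    young = sum(n for age, n in males.items() if young_from <= age <= young_to)
    adults = sum(n for age, n in males.items() if age >= adult_from)
    return young / adults if adults else None


def sex_ratio(population):
    """Males per female across all ages, or None if there are no females."""
    males = sum(n for (sex, _), n in population.items() if sex == "M")
    females = sum(n for (sex, _), n in population.items() if sex == "F")
    return males / females if females else None


if __name__ == "__main__":
    # Invented figures for one country-year, in thousands.
    pyramid = {("M", 0): 10, ("M", 15): 30, ("M", 20): 20, ("M", 30): 50,
               ("F", 0): 10, ("F", 15): 40, ("F", 30): 50}
    print(young_male_share(pyramid), sex_ratio(pyramid))
```

Each country-year would yield one pair of numbers, which could then be matched against war onsets in the same way as the regime scores.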
So, what have I learned? Data science is fun with AI – you don’t need advanced quantitative skills (thankfully, since I’ve forgotten all mine); you do need an understanding of the literature, and some curiosity to generate hypotheses – either on your own, or with the machine’s helpful suggestions for interesting patterns to investigate. You do need some patience – because the data are not immediately compatible. But it’s fun, sort of, watching the machine work through the challenges at superfast pace. And lastly, even though it explains its working, you would certainly need to take a very close look at the data to establish the reliability of the AI’s findings.
Overall, there’s no doubt that transformer AIs will change social science research, in a more profound way than generating fluent, plausible prose. And I would expect better data tools to arrive imminently. The code interpreter wasn’t really designed for what I’m using it for, and it’s still in beta anyway. Interesting times.