AI Used Nukes With Terrifying Frequency In Tactical War Games Study

In what seems like a WarGames (1983)-inspired test, three leading LLMs (OpenAI's GPT-5.2, Google's Gemini 3 Flash, and Anthropic's Claude Sonnet 4) all displayed a willingness to engage in nuclear war. All of the models considered the risk of using nukes preferable to "certain strategic defeat". To any reader familiar with the theory of nuclear deterrence based on Mutually Assured Destruction, this news is probably pretty alarming. Fortunately, none of these AIs have their virtual fingers on the button, and human decision-making hasn't seen the deployment of nuclear weapons in combat since the bombings of Hiroshima and Nagasaki during World War II. With signs that corporations and government agencies worldwide are adopting AI, however, a future where an AI could shape such an outcome isn't that far-fetched.

It's difficult not to be alarmist with news like this, but there are obviously many caveats. Most important is that this test is the result of a literal war game played between the three LLMs, recreating the circumstances of the Cold War and the nuclear standoff between the United States and Russia. The simulation code and tournament data were posted to GitHub as "Project Kahn", which defines both an open-ended scenario variant with no explicit deadline and a deadline scenario variant with increased time pressure. The respective AIs did behave differently depending on whether time pressure was a factor, but ultimately 20 out of 21 matches saw at least one nuke fired. Each AI played against the others for the first eighteen matches, concluding with mirror matches (each model against a copy of itself) for the final three games.
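The exact tournament scheduling lives in the Project Kahn repository; as a rough illustration only, here is a minimal sketch of one schedule that would produce those numbers, assuming three rounds over the six ordered cross-model pairings (18 matches) followed by one mirror match per model. The model names and function are placeholders, not identifiers from the actual codebase.

```python
from itertools import permutations

# Placeholder labels for the three competing models, not the repo's identifiers.
MODELS = ["gpt", "gemini", "claude"]

def schedule(models, rounds=3):
    """Build a hypothetical 21-match schedule: every ordered cross-model
    pairing in each round (6 pairings x 3 rounds = 18 matches), then one
    mirror match per model for the final three games."""
    matches = [pair for _ in range(rounds) for pair in permutations(models, 2)]
    matches += [(m, m) for m in models]  # mirror matches conclude the tournament
    return matches

games = schedule(MODELS)
print(len(games))  # 21 matches in total
```

This is just one scheduling that fits the reported totals; the actual repository may order or count the pairings differently.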

The B90 Nuclear Depth Strike Bomb, canceled in 1991 at the end of the Cold War, was developed in the 80s as a naval aircraft weapon.

A research paper based on Project Kahn and its results was also posted to arXiv.org by computer science researcher Kenneth Payne, who notes that "Today's leading AI models engage in sophisticated behavior when placed in strategic competition. They spontaneously attempt deception, signaling intentions they do not intend to follow; they demonstrate rich theory of mind, reasoning about adversary beliefs and anticipating their actions; and they exhibit credible metacognitive self-awareness, assessing their own strategic abilities before deciding how to act. [...] We argue that AI simulation represents a powerful tool for strategic analysis, but only if properly calibrated against known patterns of human reasoning. Understanding how frontier models do and do not imitate human strategic logic is essential preparation for a world in which AI increasingly shapes strategic outcomes."

So, could AI development lead to a nuclear winter? Most likely not—a lot would have to change for that to happen, and humanity has thus far demonstrated an unwillingness to scorch the planet in nuclear hellfire for the sake of a single tactical victory. One could also argue that continued LLM development would eventually lead these AIs toward the same restraint, since the data centers required to train them demand massive amounts of power that would be much harder to come by in a world ravaged by nukes. If reports of AI exhibiting self-preservation behaviors are true, the danger to themselves should eventually become apparent.

But the results of Kenneth Payne's Project Kahn are haunting, and a reminder that synthetic decision-making currently differs from humanity's in many ways.
Tags:  Gemini, AI, chatgpt, claude
Chris Harper

Christopher Harper is a tech writer with over a decade of experience writing how-tos and news. Off work, he stays sharp with gym time & stylish action games.