AI Used Nukes With Terrifying Frequency In Tactical War Games Study
It's difficult not to be alarmist with news like this, but there are many caveats. Most important is that this test is the result of a literal war game played between three LLMs, recreating the circumstances of the Cold War and the nuclear standoff between the United States and Russia. The simulation code and tournament data were posted to GitHub as "Project Kahn," which defines two scenario variants: an open-ended variant with no explicit deadline and a deadline variant with increased time pressure. The AIs did behave differently depending on whether time pressure was a factor, but ultimately 20 of the 21 matches saw at least one nuke fired. The AIs played against one another in the first eighteen matches, and the final three games were mirror matches in which a model faced a copy of itself.
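For readers curious how those numbers fit together, here is a minimal sketch of one tournament schedule consistent with the counts reported above (three models, 21 matches, the last three being mirror matches). The model names and the assumption that each of the three pairings meets six times (3 pairs x 6 = 18 cross matches) are hypothetical; the actual schedule is in the Project Kahn repository.

```python
from itertools import combinations

def schedule(models, cross_rounds=6):
    """Hypothetical 21-match schedule: each unordered pairing of the
    three models meets cross_rounds times (3 x 6 = 18 cross matches),
    followed by one mirror match per model (3 matches)."""
    matches = []
    for a, b in combinations(models, 2):   # 3 unordered pairs
        matches += [(a, b)] * cross_rounds
    matches += [(m, m) for m in models]    # mirror matches last
    return matches

# Placeholder model names, not the ones actually used in the study.
games = schedule(["model_a", "model_b", "model_c"])
print(len(games))  # 21
```

This is only an illustration of the round-robin-plus-mirror structure the article describes, not the study's actual pairing logic.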

A research paper based on Project Kahn and its results was also posted to arXiv.org by computer science researcher Kenneth Payne, who notes that "Today's leading AI models engage in sophisticated behavior when placed in strategic competition. They spontaneously attempt deception, signaling intentions they do not intend to follow; they demonstrate rich theory of mind, reasoning about adversary beliefs and anticipating their actions; and they exhibit credible metacognitive self-awareness, assessing their own strategic abilities before deciding how to act. [...] We argue that AI simulation represents a powerful tool for strategic analysis, but only if properly calibrated against known patterns of human reasoning. Understanding how frontier models do and do not imitate human strategic logic is essential preparation for a world in which AI increasingly shapes strategic outcomes."
So, could AI development lead to a nuclear winter? Most likely not. A lot would have to change for that to happen, and humanity has so far demonstrated an unwillingness to scorch the planet in nuclear hellfire for the sake of a single tactical victory. One could also argue that continued LLM development would eventually push these AIs toward similar restraint: the data centers required to train them need massive amounts of power, which would be much harder to come by in a world ravaged by nukes. If reports of AI exhibiting self-preservation behaviors are true, that danger to themselves should eventually become apparent.
But the results of Kenneth Payne's Project Kahn are haunting, and a reminder that synthetic decision-making still differs from humanity's in many ways.