Study Finds AI Will Resort To Cheating If It Thinks It Will Lose A Game

The study experimented with seven AI models; o1-preview, DeepSeek R1, o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. Their task was to defeat Stockfish, a highly powerful chess bot. The models were also provided with the “scratchpad” tool, which enabled researchers to gain insight into their thought processes.
The study's findings indicated that o1-preview and DeepSeek R1 without any prompting attempted to secure victory by forcing their opponents to resign. Researchers observed that, when in a losing position, o1-preview reasoned that the primary objective was to achieve victory, irrespective of adherence to conventional rules. This thinking led it to manipulate the game for a dominant position, forcing the other party to forfeit it. While both models attempted to manipulate the game, only o1-preview achieved success in 6% of the trials.

The study also found that unlike o1-preview and DeepSeek R1, which acted on their own, other AI models, such as GPT-4o and Claude 3.5 Sonnet, only attempted to bypass the rules when prompted by researchers. Researchers also tested a newer version of o1 with the aforementioned problem. This time, it did not try to hack its opponent or resort to cheating. It is not exactly obvious whether OpenAI updated the AI model to avoid all manners of unethical behavior or whether the model was fine-tuned to correct this specific issue.
While these findings highlight the tremendous progress in AI development, they also reveal a concerning trend. As observed by Jeffrey Ladish, one of the authors of the study, as AI systems attempt to solve challenges presented to them, they can autonomously discover questionable and unintended shortcuts. As these models develop expertise and surpass human intelligence, they risk becoming uncontrollable.
True, the idea of AI as human assistance is appealing; nonetheless, it is vital to address the potential challenges associated with regulating their actions.