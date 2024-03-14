

Along similar lines, people have been trying to get large language models like GPT-4 to do a great many things they really aren't suited for. We've seen GPT-4 play Minecraft with some human assistance, but that's about as far as it goes for a general-purpose language model. All of the other success stories you've read about AI-powered automation rely on specialized models built and trained for a specific task, often with human help.

It took quite a bit of fiddling to get things going. Essentially, he rigged up a setup in which a computer screenshots every frame of DOOM gameplay and runs it through the GPT-4 with Vision ("GPT-4V") API, which converts the screenshot into a text description of the current state of the game. That description is then sent to a second instance of GPT-4, which uses it to generate game inputs that are finally sent back to the game.
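To make the two-model loop easier to picture, here is a minimal, hypothetical sketch of its control flow. It is not de Wynter's actual code and makes no API calls: `describe_frame`, `choose_action`, and `send_input` are invented stand-ins for the GPT-4V request, the second GPT-4 request, and the step that feeds inputs back into DOOM, and the toy stubs at the bottom exist only to show the data moving through each stage.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type alias: a captured screenshot as raw bytes.
Frame = bytes


@dataclass
class Step:
    """Record of one pass through the loop (for inspection/logging)."""
    description: str
    action: str


def run_pipeline(
    frames: List[Frame],
    describe_frame: Callable[[Frame], str],   # stand-in for the GPT-4V call
    choose_action: Callable[[str], str],      # stand-in for the second GPT-4 call
    send_input: Callable[[str], None],        # stand-in for pressing keys in the game
) -> List[Step]:
    """Describe each frame as text, pick an input from that text, send it to the game."""
    steps: List[Step] = []
    for frame in frames:
        description = describe_frame(frame)   # stage 1: screenshot -> text
        action = choose_action(description)   # stage 2: text -> game input
        send_input(action)                    # stage 3: input -> game
        steps.append(Step(description, action))
    return steps


# Usage with toy stubs (no model or game involved):
def fake_describe(frame: Frame) -> str:
    return "enemy ahead" if b"imp" in frame else "empty corridor"


def fake_choose(description: str) -> str:
    return "FIRE" if "enemy" in description else "MOVE_FORWARD"


sent: List[str] = []
steps = run_pipeline([b"corridor", b"imp"], fake_describe, fake_choose, sent.append)
print(sent)  # ['MOVE_FORWARD', 'FIRE']
```

Note that in this structure the action-choosing model only ever sees the text description, never the screenshot itself, which mirrors the split between the vision instance and the second GPT-4 instance described above.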

A frame from the most successful run, where the AI managed to kill a few enemies.

If you're not up for reading the whole academic paper, de Wynter has also created a "TL;DR" page summarizing it. It's actually quite an interesting read even if you aren't an AI researcher, as de Wynter explains that people greatly overestimate the capabilities of large language models like GPT-4. GPT-4 isn't really capable of reasoning outside of extremely small contexts, and it lacks basic concepts like object permanence. The idea that GPT-4 can replace human workers, rather than simply serving as a tool to assist them, is pretty laughable.
