OpenAI President Shares The First Image Created By GPT-4o

[Image: the first publicly shared image created by GPT-4o]

In case you missed all the buzz earlier this week, OpenAI just revealed its next-generation AI model, known as GPT-4o. The "o" stands for "Omni", and it represents not the terrifying omniscience of the model but rather its capability to natively support multiple types of input. This is quite novel; historically, multimodality for large language models typically meant converting non-text input to text using other, intermediary AI models.

Naturally, as it can accept text, images, and audio as input, it can also create these things. What we have at the top of this post is in fact not a real photograph, but rather the very first image to be revealed to the public as being created by GPT-4o. It depicts a man in an OpenAI T-shirt writing on a chalkboard that says "Transfer between Modalities" at the top, with the middle text clearly and correctly written:

Suppose we directly model P(text, pixels, sound) with one big autoregressive transformer. What are the pros and cons?

There are still a few telltale hints that the image is AI-generated; the chalkboard is strangely uneven and the model struggled with the idea of multiple layered chalkboards. The man's hand is also kind of oddly shaped, and the lighting is inconsistent across the image. However, the ability to create a long string of coherent text with no real errors is actually incredible for a model like this. Even the amazing DALL-E 3 struggles with this task.

The image originates with OpenAI's President & Co-Founder Greg Brockman, who tweeted it yesterday. GPT-4o's generative capabilities with regard to images and audio aren't available to the public yet—the GPT-4o preview in ChatGPT Plus right now still uses DALL-E 3 for image generation—but Brockman says that his team is "working hard to bring those to the world." It will be fascinating to see what people create using the new tool.