Watch OpenAI's Sora Generate Amazingly Realistic Videos From Simple Text

One frame from a Sora-generated video of a Chinese Lunar New Year celebration with a Chinese dragon.
The science of generating content with AI has been progressing rapidly on all fronts lately, but video presents unique challenges due to the sheer amount of data involved. OpenAI is right at the forefront of solving these challenges, though, and the company just presented its new model, called "Sora," as proof of that.

The name "Sora" means "sky", "vagueness", or "falsehood" in Japanese. It's a fitting appellation for the new video generation model, which OpenAI has shown to be able to generate incredible video clips from just a simple description. The image in the top of this post is one frame from a shockingly convincing video that was created using the prompt "A Chinese Lunar New Year celebration video with Chinese Dragon."
OpenAI says that it created Sora using evolutions of the methods behind GPT-4 and DALL-E 3. Like GPT, Sora is a transformer, and like the familiar Stable Diffusion, it is also a diffusion model. AI video generation is nothing new, but Sora produces much more stable and temporally consistent results than previous methods, which OpenAI says was achieved by giving the model "foresight of many frames at a time."
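OpenAI hasn't published Sora's code or weights, but the description above maps onto a well-known recipe: a diffusion model starts from pure noise and iteratively denoises a latent representation of the entire clip at once, so every frame is generated with knowledge of every other frame. Here's a rough, illustrative sketch of that loop in Python; the shapes and the stand-in denoiser are hypothetical, not anything from Sora itself.

```python
# Illustrative toy only: OpenAI has not released Sora's code or weights.
# This sketches the general diffusion recipe the announcement describes,
# with a whole clip of latent "spacetime patches" denoised jointly.
# Every name and shape below is hypothetical.
import numpy as np

rng = np.random.default_rng(0)

FRAMES, PATCHES, DIM = 16, 64, 32  # toy clip: 16 frames, 64 latent patches each
STEPS = 50                         # number of denoising steps

def toy_denoiser(x: np.ndarray, t: int) -> np.ndarray:
    """Stand-in for the transformer: predicts the noise present in x at step t.
    A real model would attend across all frames at once, which is where the
    "foresight of many frames at a time" would come from."""
    return x * (t / STEPS)  # placeholder math, not a trained network

# Start from pure noise covering the entire clip...
x = rng.standard_normal((FRAMES, PATCHES, DIM))

# ...and refine every frame together, step by step.
for t in range(STEPS, 0, -1):
    predicted_noise = toy_denoiser(x, t)
    x = x - predicted_noise / STEPS  # crude update toward a "clean" clip

print(x.shape)  # (16, 64, 32): one coherent latent for the whole clip
```

The detail that matters is that each denoising step touches all sixteen frames at once; a model generating frame-by-frame has no such guarantee, which is one plausible reason earlier approaches struggled with keeping subjects consistent.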

One frame from an animation of a "zen garden gnome".

According to OpenAI, Sora is able to achieve such lifelike output because it "understands not only what the user has asked for in the prompt, but also how those things exist in the physical world." In other words, it isn't simply drawing on a training set of purely visual data; it has a deeper understanding of the content of the videos it is generating, more like a human would.

This cat trying to rouse its owner has too many feet. It's still cute though!

Of course, if you look carefully, you can easily pick out problems in the details, just as with almost any AI-generated media. If you don't look too carefully, though, the videos on OpenAI's announcement page are phenomenal and astonishingly varied: content ranging from the photoreal to the unreal and even the surreal is handled with relative adeptness by the AI.

So we should never trust another video clip we see ever again, right? Well, much as it has done with its ChatGPT and DALL-E models (for text and images, respectively), OpenAI plans to embed C2PA metadata into all videos generated by Sora that clearly marks them as AI-generated. C2PA metadata is cryptographically signed, so it's supposed to resist meddling (such as by folks who would want to strip it), but it's a new technology, and we'll see how that goes.
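If you want to check a clip for provenance data yourself, the C2PA effort publishes an open-source command-line tool, c2patool, that reports any manifest embedded in a file. Below is a small Python wrapper around it as a sketch; the invocation is the tool's documented default, but the file name is hypothetical, and no public Sora output exists to test against yet.

```python
# Minimal sketch of inspecting C2PA provenance metadata, assuming the
# open-source c2patool CLI (github.com/contentauth/c2patool) is installed
# and on PATH. The video file name is hypothetical.
import subprocess

def show_c2pa_manifest(path: str) -> None:
    """Ask c2patool to report any C2PA manifest embedded in the file."""
    result = subprocess.run(
        ["c2patool", path],  # default invocation prints the manifest report
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        print(result.stdout)  # JSON report, including who signed the claim
    else:
        print(f"No valid C2PA data found: {result.stderr.strip()}")

show_c2pa_manifest("sora_clip.mp4")  # hypothetical file
```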
Unfortunately, you can't play with Sora yourself yet; at least, not if you're hearing about it from this news post. OpenAI says that Sora is "becoming available to red teamers to assess critical areas for harms or risks." It has also given "a number of visual artists, designers, and filmmakers" access to the technology so that it can better understand how to develop the model as a tool for creatives.

We do have to say that this is a little disappointing. If the explosive popularity of Stable Diffusion image generation has taught your author one lesson, it is that almost anyone can be "a creative"; most people simply lack the skills to express their creativity. Opening Sora up to the public could allow for some incredibly inventive and cool videos. Hopefully that happens sooner rather than later.