NVIDIA NeRF AI Renders Amazingly Realistic 3D Scenes From 2D Photos In Just Milliseconds

NVIDIA NeRF view
It takes a human being around 0.1 to 0.4 seconds to blink. In even less time, an AI-based inverse rendering process developed by NVIDIA can generate a realistic three-dimensional scene from a series of two-dimensional photographs taken from different angles.

NVIDIA's approach is based on a new technology called neural radiance fields (NeRF, for short) and is the fastest of its kind. NeRFs work their rendering magic by employing neural networks to essentially fill in the blanks from a few dozen images of a scene, along with information about the camera position for each of those shots.

Using that information, the neural network predicts the color of light radiating in any direction, from any point in 3D space, NVIDIA explains. The technique even works when objects seen in some images are blocked from view by obstructions, such as pillars, in others.
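In code terms, that boils down to a learned function that maps a 3D position and a viewing direction to a color and a density value. The snippet below is a minimal, hypothetical PyTorch sketch of that mapping; the layer sizes and names are illustrative assumptions, not NVIDIA's implementation.

```python
# Minimal sketch of the core NeRF mapping: (3D position, view direction) -> (RGB color, density).
# Layer sizes and names are illustrative assumptions, not NVIDIA's implementation.
import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        # Input: 3 position coordinates + 3 view-direction components.
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # outputs: r, g, b, density
        )

    def forward(self, position, direction):
        out = self.mlp(torch.cat([position, direction], dim=-1))
        rgb = torch.sigmoid(out[..., :3])     # color in [0, 1]
        density = torch.relu(out[..., 3:4])   # non-negative volume density
        return rgb, density

# Query the field at arbitrary sample points along camera rays.
field = TinyRadianceField()
points = torch.rand(1024, 3)                  # sample positions in the scene
view_dirs = torch.randn(1024, 3)
view_dirs = view_dirs / view_dirs.norm(dim=-1, keepdim=True)
rgb, density = field(points, view_dirs)
```

Rendering a pixel then amounts to querying this function at many points along a camera ray and blending the colors according to the predicted densities.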

"If traditional 3D representations like polygonal meshes are akin to vector images, NeRFs are like bitmap images: they densely capture the way light radiates from an object or within a scene," says David Luebke, vice president for graphics research at NVIDIA. "In that sense, Instant NeRF could be as important to 3D as digital cameras and JPEG compression have been to 2D photography—vastly increasing the speed, ease and reach of 3D capture and sharing."

To demonstrate the technique and pay homage to the days of Polaroid, NVIDIA tasked its NeRF technique with recreating a popular photo of Andy Warhol taking an instant picture. Have a look...


The speed is as impressive as the output. NVIDIA's specific NeRF technique achieves speedups of more than 1,000x in some cases. And while the actual rendering is done in the blink of an eye (or blink of an AI, as NVIDIA cleverly puts it), the training step doesn't take much longer, requiring just a few seconds on a few dozen photos.

This is a massive improvement over earlier NeRFs, which took hours to train and minutes to render a scene without artifacts. How was NVIDIA able to speed things up so dramatically? The company says it used a technique called multi-resolution hash grid encoding, which is optimized to run efficiently on its GPUs.
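To give a rough idea of how such an encoding works, the simplified sketch below maps each 3D point, at several grid resolutions, into small hash tables of trainable feature vectors; the concatenated per-level features are what the small network consumes. Real implementations interpolate the eight corners of each voxel, and the sizes here are illustrative assumptions rather than NVIDIA's exact configuration.

```python
# Simplified sketch of multi-resolution hash grid encoding.
# Real implementations trilinearly interpolate the 8 voxel corners; this sketch
# uses nearest-vertex lookup to keep the illustration short. Sizes are assumptions.
import torch
import torch.nn as nn

class HashGridEncoding(nn.Module):
    def __init__(self, n_levels=16, features_per_level=2,
                 log2_table_size=19, base_res=16, growth=1.5):
        super().__init__()
        self.table_size = 2 ** log2_table_size
        self.resolutions = [int(base_res * growth ** i) for i in range(n_levels)]
        # One small hash table of trainable feature vectors per resolution level.
        self.tables = nn.ParameterList([
            nn.Parameter(torch.randn(self.table_size, features_per_level) * 1e-4)
            for _ in range(n_levels)
        ])
        # Large primes used for spatial hashing of grid coordinates.
        self.primes = (1, 2654435761, 805459861)

    def forward(self, xyz):
        # xyz: (N, 3) points assumed to lie in the unit cube [0, 1]^3.
        features = []
        for res, table in zip(self.resolutions, self.tables):
            grid = (xyz * res).long()  # nearest grid vertex at this resolution
            # XOR-based spatial hash into this level's feature table.
            h = (grid[:, 0] * self.primes[0]) \
                ^ (grid[:, 1] * self.primes[1]) \
                ^ (grid[:, 2] * self.primes[2])
            features.append(table[h % self.table_size])
        # Concatenated per-level features feed the small MLP that predicts color and density.
        return torch.cat(features, dim=-1)

encoder = HashGridEncoding()
feats = encoder(torch.rand(4096, 3))
print(feats.shape)  # torch.Size([4096, 32]) -> 16 levels x 2 features each
```

Because most of the scene information lives in these lookup tables rather than in network weights, the neural network itself can stay tiny, which is where the speed comes from.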

"Using a new input encoding method, researchers can achieve high-quality results using a tiny neural network that runs rapidly. The model was developed using the NVIDIA CUDA Toolkit and the Tiny CUDA Neural Networks library. Since it’s a lightweight neural network, it can be trained and run on a single NVIDIA GPU—running fastest on cards with NVIDIA Tensor cores," NVIDIA says.

According to NVIDIA, its NeRF technique extends well beyond recreating old photographs; among other uses, it could help train robots and autonomous vehicles to better understand the size and shape of real-world objects.