NVIDIA's GauGAN AI Machine Learning Tool Creates Photorealistic Images From Simple Hand-Drawn Doodles

NVIDIA GauGAN
We all know that NVIDIA is doing some cool things in graphics as they relate to gaming, namely real-time ray tracing and Deep Learning Super Sampling (DLSS), courtesy of its Turing GPU architecture and RTX technology. But it's not all fun and games; there's also fun to be had in photo manipulation. By leveraging its work and research in machine learning, NVIDIA has developed a tool that can take rudimentary doodles or sketches and turn them into "photorealistic masterpieces."

Appropriately enough, NVIDIA is calling this GauGAN, a clever play on words that nods to both the famous post-Impressionist painter Paul Gauguin and the generative adversarial networks, or GANs, that power the technology.
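For readers wondering what a GAN actually is, the basic recipe pits two networks against each other: a generator that tries to produce convincing images and a discriminator that tries to tell real images from generated ones. The tiny sketch below is a generic illustration of that adversarial loop, written in PyTorch purely as an assumption for demonstration; the layer sizes and training steps are arbitrary, and none of this is NVIDIA's GauGAN code.

```python
# Minimal, generic GAN training step (illustrative only; not GauGAN).
import torch
import torch.nn as nn

# Generator: maps random noise to a flattened "image".
generator = nn.Sequential(
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Tanh(),
)

# Discriminator: scores how "real" a flattened image looks.
discriminator = nn.Sequential(
    nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

def train_step(real_images):
    batch = real_images.size(0)

    # 1) Train the discriminator to separate real from generated images.
    fake_images = generator(torch.randn(batch, 64)).detach()
    d_loss = loss_fn(discriminator(real_images), torch.ones(batch, 1)) + \
             loss_fn(discriminator(fake_images), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train the generator to fool the discriminator.
    fake_images = generator(torch.randn(batch, 64))
    g_loss = loss_fn(discriminator(fake_images), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```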

"It’s much easier to brainstorm designs with simple sketches, and this technology is able to convert sketches into highly realistic images," said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA.



As you can see in the above video, GauGAN easily and convincingly converts segmentation maps into lifelike images. Or more accurately, it makes the process look easy. The underlying technology is actually very powerful and requires significant training of the machine learning model behind it.

The goal is to go from a semantic sketch map to photorealistic shots. To do this, the underlying artificial intelligence needs to be trained not just on how scenes and objects look, but also on how they interact with each other. That part is key for, say, not just placing a tree next to a body of water, but also having its reflection appear in the water, with realistic distortion.
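To make the "semantic sketch map" idea concrete, the toy sketch below shows one plausible way a label map, where each pixel is tagged with a class like sky, water, or tree rather than a color, could be fed to a small convolutional generator that paints an RGB image. It is purely illustrative and again assumes PyTorch; the class IDs, layer sizes, and network are invented for the example and are not NVIDIA's actual architecture.

```python
# Illustrative: conditioning image synthesis on a semantic label map
# (a toy stand-in, not NVIDIA's GauGAN model).
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 3  # hypothetical label set: 0 = sky, 1 = water, 2 = tree

# A toy generator that reads the one-hot label map and emits an RGB image.
to_rgb = nn.Sequential(
    nn.Conv2d(NUM_CLASSES, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, kernel_size=3, padding=1), nn.Tanh(),
)

# A "coloring book" label map: each pixel holds a class ID, not a color.
label_map = torch.zeros(1, 128, 128, dtype=torch.long)   # all sky
label_map[:, 64:, :] = 1                                  # bottom half: water
label_map[:, 40:64, 50:80] = 2                            # a tree by the shore

# One-hot encode so the network sees one channel per class.
one_hot = F.one_hot(label_map, NUM_CLASSES).permute(0, 3, 1, 2).float()

fake_photo = to_rgb(one_hot)   # shape (1, 3, 128, 128): synthesized RGB image
print(fake_photo.shape)
```

A trained model, unlike this untrained toy, is what fills in the texture, lighting, and reflections the article describes.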

GauGAN Waterfall

"It’s like a coloring book picture that describes where a tree is, where the sun is, where the sky is," Catanzaro said. "And then the neural network is able to fill in all of the detail and texture, and the reflections, shadows and colors, based on what it has learned about real images."

This requires a massive amount of data, and so far NVIDIA has fed its GauGAN deep learning model a million Creative Commons images. To be clear, though, GauGAN does not just stitch together a bunch of preexisting photos and clean up the end result. What you're seeing are actually unique images.

"It's actually synthesizing new images, very similar to how an artist would draw something," Catanzaro added.

In a sense, GauGAN becomes the artist, constructing photorealistic images based on what the human artist is trying to create. It's nothing short of impressive, and there are numerous potential uses for something like this, from architectural design and urban planning to creating virtual worlds and scenes in games.

As for the horsepower required, NVIDIA demonstrated GauGAN rendering scenes in real time on a Titan RTX. However, many types of processors can run it with a few seconds of render time. For real-time rendering, though, RTX technology and its Tensor cores are required.