We all know that
NVIDIA is doing some cool things in graphics as it relates to gaming, namely
real-time ray tracing and Deep Learning Super Sampling (DLSS), courtesy of its
Turing GPU architecture and RTX technology. But it's not all fun and games; it's also fun and photo manipulation. By leveraging its work and research in
machine learning, NVIDIA has developed a tool that can take rudimentary doodles or sketches and turn them into "photorealistic masterpieces."
Appropriately enough, NVIDIA is calling this GauGAN, a clever play on words based on the famous post-impressionist painter
Paul Gauguin and the generative adversarial networks, or GANs, that
the technology uses.
"It’s much easier to brainstorm designs with simple sketches, and this technology is able to
convert sketches into highly realistic images," said Bryan Catanzaro, vice president of applied
deep learning research at NVIDIA.
As you can see in the above video, GauGAN easily and convincingly converts segmentation maps into
lifelike images. Or more accurately, it makes the process
look easy. The underlying technology is actually very powerful and requires significant training of the machine learning model.
The goal is to go from a semantic sketch map to photorealistic shots. To do this, the underlying artificial intelligence needs to be trained on scenes and objects, but not just how they look—it also has to understand how they interact with each other. That part is key for, say, not just placing a tree next to a body of water, but also having its reflection appear in the water, with realistic distortion.
"It’s like a coloring book picture that describes where a tree is, where the sun is, where the sky
is," Catanzaro said. "And then the neural network is able to fill in all of the detail and texture, and
the reflections, shadows and colors, based on what it has learned about real images."
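To make the "coloring book" idea concrete, here is a minimal Python sketch of what a segmentation map looks like as data: a grid in which every cell holds a class label rather than a color. The label names, the grid size, and the helper functions are all hypothetical and chosen for illustration; the article does not describe GauGAN's actual label set or input format. Splitting the map into one binary mask per class is a common way to feed such a map to an image-to-image network, but that step is also an assumption here, not a confirmed detail of NVIDIA's model.

```python
# Hypothetical class labels; GauGAN's real label set is not given in the article.
LABELS = {0: "sky", 1: "tree", 2: "water", 3: "sun"}


def make_segmentation_map(height, width):
    """Build a toy label map: sky above the horizon, water below it."""
    horizon = height // 2
    seg = [[0 if row < horizon else 2 for _ in range(width)]
           for row in range(height)]
    seg[0][width - 1] = 3           # a one-cell sun in the top-right corner
    for row in range(1, horizon):
        seg[row][1] = 1             # a thin tree near the left edge
    return seg


def one_hot(seg, num_classes):
    """Split the label map into one binary mask per class."""
    return [[[1 if label == c else 0 for label in row] for row in seg]
            for c in range(num_classes)]


def class_coverage(seg):
    """Count how many cells each class occupies."""
    counts = {}
    for row in seg:
        for label in row:
            name = LABELS[label]
            counts[name] = counts.get(name, 0) + 1
    return counts


seg_map = make_segmentation_map(6, 8)
masks = one_hot(seg_map, len(LABELS))
coverage = class_coverage(seg_map)

# Print the map as text, one letter per class (s = sky or sun, t = tree, w = water).
for row in seg_map:
    print("".join(LABELS[label][0] for label in row))
print(coverage)
```

Note that the map says nothing about texture, lighting, or reflections. It only records which class sits where, and everything else is left for the trained network to fill in.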
This requires a massive amount of data, and so far NVIDIA has fed its GauGAN deep learning model a million Creative Commons images. To be clear, though, GauGAN does not just stitch together a bunch of preexisting photos and clean up the end result. What you're seeing are actually unique images.
"It's actually synthesizing new images, very similar to how an artist would draw something," Catanzaro added.
In a sense, GauGAN becomes the artist, constructing photorealistic images based on what the human artist is trying to create. It's nothing short of impressive, and there are numerous uses for something like this, from architectural design and urban planning to creating virtual worlds and scenes in games.
As for the horsepower required, NVIDIA demonstrated GauGAN rendering scenes in real time on a
Titan RTX. However, many types of processors can run it with a few seconds of render time. For real-time rendering, though, RTX technology and Tensor Cores are required.