Google DeepMind Empowers AI With Creativity To Perceive And Render Our 3D World
Google has been able to train its various DeepMind AIs to do some very cool things. The AlphaZero AI was able to destroy the highly acclaimed Stockfish chess program over a 100-game match, winning 28 of the games outright and drawing the other 72 for an undefeated record. DeepMind certainly isn’t the only AI that has performed impressive feats, however. On the pure gaming front, EA has developed an AI that is able to successfully battle human players in Battlefield 1. And yet another AI, inspired by the human visual cortex, was recently shown to be able to beat CAPTCHA prompts.
Google's DeepMind AI has been at it again, and this time researchers were able to develop an algorithm that can render 3D objects by looking at a 2D picture of the object. The new machine vision algorithm is called the Generative Query Network, or GQN. GQN can "imagine" and render scenes from any angle without having a human looking over its shoulder. GQN also required no human-labeled data to learn to render the scenes.
Researchers say that the algorithm is so good at rendering 3D scenes that it can render the sides of objects that are hidden in the 2D images, show the scene from multiple vantage points, and add in shadows. The goal when developing GQN was to replicate the way a human brain learns about its surroundings and the interactions between objects. GQN doesn't need an AI researcher to annotate images in datasets before the AI can process them.
Typically, in a visual recognition system, a human must label every object in every scene in the dataset that the AI processes. DeepMind researchers say that GQN can learn about plausible scenes and their geometrical properties without a human having to label contents or scene details, much as an infant or an animal learns. GQN is a two-part system, made up of a representation network and a generation network. The representation network takes the input data and translates it into a vector that describes the scene, and the generation network then uses that vector to create the scene.
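To make the two-part design a little more concrete, here is a minimal toy sketch in Python. It is not DeepMind's implementation: the random weight matrices stand in for trained networks, the sizes and function names (`represent`, `generate`) are hypothetical, and combining the per-image encodings by summing them is an assumption made for illustration. It only shows the data flow described above: observed images and their viewpoints go in, a single scene vector comes out, and that vector plus a new viewpoint yields a predicted image.

```python
import random

# Hypothetical toy sizes, chosen only for illustration.
IMAGE_SIZE = 4  # a flattened "image" of 4 pixel values
VIEW_SIZE = 2   # a camera viewpoint described by 2 numbers
REPR_SIZE = 3   # length of the scene description vector


def make_linear(n_in, n_out, rng):
    """Random weight matrix standing in for a trained network layer."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def apply_linear(weights, x):
    """Multiply the weight matrix by the input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


rng = random.Random(0)
W_repr = make_linear(IMAGE_SIZE + VIEW_SIZE, REPR_SIZE, rng)
W_gen = make_linear(REPR_SIZE + VIEW_SIZE, IMAGE_SIZE, rng)


def represent(observations):
    """Representation network: turn (image, viewpoint) pairs into one vector.

    Assumption: per-observation encodings are summed, so the order in which
    the images are shown does not matter.
    """
    scene = [0.0] * REPR_SIZE
    for image, viewpoint in observations:
        encoded = apply_linear(W_repr, image + viewpoint)
        scene = [s + e for s, e in zip(scene, encoded)]
    return scene


def generate(scene, query_viewpoint):
    """Generation network: predict the image seen from a new viewpoint."""
    return apply_linear(W_gen, scene + query_viewpoint)


# Two observed views of the same toy scene.
observations = [
    ([0.1, 0.2, 0.3, 0.4], [0.0, 0.0]),
    ([0.4, 0.3, 0.2, 0.1], [1.0, 0.5]),
]
scene_vector = represent(observations)
predicted = generate(scene_vector, [0.5, 0.25])  # a viewpoint never observed
print(len(scene_vector), len(predicted))  # prints "3 4"
```

Because the weights here are random rather than learned, the predicted image is meaningless. In a real system both networks would be trained together on many scenes so that the prediction matches what the camera would actually see.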
Researchers showed GQN images of scenes from different angles, and the algorithm used those images to teach itself about the textures, colors, and lighting of objects independently of one another. It also uses that data to learn about spatial relationships. It is then able to predict what the objects would look like from the side or the back. GQN's spatial understanding is good enough to allow it to control a virtual robot arm to pick up a ball, and it is also able to self-correct.
Researchers say that right now there are limits, however. For starters, GQN has only been tested on simple scenes with a small number of objects, and it lacks the sophistication needed to generate complex 3D models. "While there is still much more research to be done before our approach is ready to be deployed in practice, we believe this work is a sizeable step towards fully autonomous scene understanding," the researchers wrote.