Apple's First AI Research Paper Brings Machine Vision Technologies Into Focus

Machine learning and artificial intelligence are two intertwined, fast-growing fields that are drawing attention from some of the world's biggest technology companies, including Microsoft, IBM, Facebook, Google, and now Apple. Apple joins the fray with its first AI paper, which was submitted in November and published this month. The paper describes a technique for greatly improving computer vision and pattern recognition in machines.

This is a pretty big deal, and not just because of the technology described in the paper. Unlike many other companies involved in AI research and machine learning, Apple has been tight-lipped about its research and has kept it out of the public eye. Publishing this paper can be seen as a sign that Apple wants a more visible presence in the field of AI, and sharing its work could benefit the industry at large.


Apple's contribution here deals with how machines see and interpret images. Machine learning models are often trained on synthetic images and videos. Synthetic images are more cost efficient than real-world images because they come already labeled and annotated, whereas real images have to be annotated by people. The problem with this approach is that synthetic data is not always realistic enough, so a model can end up learning details that are found only in synthetic images.

One solution is to improve the simulator, but that is an expensive proposition, and there is still no guarantee that AI systems will not pick up on rendering details present only in synthetic images. Apple's solution is something it calls Simulated+Unsupervised (S+U) learning, where the goal is to improve the realism of a simulator's synthetic images using unlabeled real data.

"The improved realism enables the training of better machine learning models on large datasets without any data collection or human annotation effort. In addition to adding realism, S+U learning should preserve annotation information for training of machine learning models—e.g. the gaze direction in Figure 1 should be preserved. Moreover, since machine learning models can be sensitive to artifacts in the synthetic data, S+U learning should generate images without artifacts," Apple explains in its paper.

To accomplish this, Apple researchers modified an existing (and relatively new) machine learning technique called Generative Adversarial Networks (GANs), in which two neural networks compete against each other. In Apple's method, a simulator generates synthetic images that are passed through a refiner. The refined images are then sent to a discriminator that must figure out which images are real and which are synthetic, while the refiner is trained to make that distinction harder.
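The simulator, refiner, and discriminator loop described above can be sketched as a toy in plain Python. This is only an illustration under heavy assumptions, not Apple's implementation: each "image" is reduced to a single number, the refiner is a one-parameter shift, the discriminator is a logistic classifier, and every name and constant (`train_refiner`, `SIM_MEAN`, `lam`, the learning rates) is hypothetical. The small penalty that keeps refined samples close to the simulator's output is included here as one plausible way to honor the annotation-preservation goal quoted earlier.

```python
import math
import random

# Hypothetical 1-D stand-ins for image statistics (assumptions, not from the paper).
REAL_MEAN = 0.0   # where unlabeled real samples cluster
SIM_MEAN = -2.0   # where the simulator's synthetic samples cluster
SPREAD = 0.5      # shared standard deviation


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def train_refiner(steps=3000, batch=64, lr_d=0.1, lr_r=0.01, lam=0.1, seed=0):
    """Alternate discriminator and refiner updates; return (shift, w, c)."""
    rng = random.Random(seed)
    b = 0.0          # refiner: R(x) = x + b
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c), "probability real"
    for _ in range(steps):
        real = [rng.gauss(REAL_MEAN, SPREAD) for _ in range(batch)]
        refined = [rng.gauss(SIM_MEAN, SPREAD) + b for _ in range(batch)]

        # Discriminator step: minimize -log D(real) - log(1 - D(refined)).
        grad_w = grad_c = 0.0
        for x in real:
            p = sigmoid(w * x + c)
            grad_w -= (1.0 - p) * x
            grad_c -= (1.0 - p)
        for x in refined:
            p = sigmoid(w * x + c)
            grad_w += p * x
            grad_c += p
        w -= lr_d * grad_w / batch
        c -= lr_d * grad_c / batch

        # Refiner step: minimize -log D(R(x)) + lam * |R(x) - x|.
        # The second term discourages large changes to the simulator's output.
        refined = [rng.gauss(SIM_MEAN, SPREAD) + b for _ in range(batch)]
        grad_b = sum(-(1.0 - sigmoid(w * x + c)) * w for x in refined) / batch
        grad_b += lam * sign(b)
        b -= lr_r * grad_b
    return b, w, c


if __name__ == "__main__":
    shift, w, c = train_refiner()
    print("gap before refinement:", abs(SIM_MEAN - REAL_MEAN))
    print("gap after refinement: ", abs(SIM_MEAN + shift - REAL_MEAN))
```

In this sketch the refiner never sees a labeled real sample. It only receives the discriminator's feedback, which is the sense in which the real data is "unlabeled." Any annotation attached to a synthetic sample would simply be carried along unchanged with its refined version.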

If nothing else, Apple's published paper could help the company attract researchers who specialize in certain fields and want to be known for their work.