Google Patents Real-World Image Recognition Algorithm For Video, Photography

by Joel Hruska — Saturday, September 01, 2012, 08:26 PM EDT

Google has been granted a patent on a system for real-world object identification that will allow the company to survey and tag images and videos automatically. The abstract for patent 8,254,699 (hereafter referred to as the '699 patent) reads:

An object recognition system performs a number of rounds of dimensionality reduction and consistency learning on visual content items such as videos and still images, resulting in a set of feature vectors that accurately predict the presence of a visual object represented by a given object name within a visual content item. The feature vectors are stored in association with the object name which they represent and with an indication of the number of rounds of dimensionality reduction and consistency learning that produced them. The feature vectors and the indication can be used for various purposes, such as quickly determining a visual content item containing a visual representation of a given object name.

Translation: Famous objects, landmarks, and commonly photographed views share characteristics that automated programs can recognize without needing puny humans to do the tagging. Or, as Google puts it:

Currently, automated recognition within a digital video of images of real-world objects of interest to a user, such as people, animals, automobiles, consumer products, buildings, and the like, is a difficult problem. Conventional systems, to the extent that they allow for such recognition at all, typically use supervised learning which requires training sets of images that have been manually labeled as representing particular objects... However, such human input is expensive, time-consuming, and cannot scale up to handle very large data sets comprising hundreds of thousands of objects and millions of images. This is particularly a problem in the context of video hosting systems, such as Google Video or YouTube, in which users submit millions of videos, each containing numerous distinct visual objects over the length of the video. The use of unsupervised learning techniques, in which the explicit input of human operators is not required to learn to recognize objects, has not yet been achieved for large-scale image recognition systems.
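The abstract quoted above describes a loop: reduce the dimensionality of each item's features, run a "consistency learning" pass, repeat, and store the resulting feature vectors under the object name together with the number of rounds. Neither the abstract nor Google's background text says how either step works, so the Python below is only a minimal sketch under stated assumptions. A random Gaussian projection stands in for the reduction step, and the "consistency" check is a hypothetical nearest-centroid filter that drops tagged items which look more like untagged background than like the other tagged items. The names `train`, `ObjectModel`, and `predicts_presence`, and the premise that items arrive pre-tagged by some noisy, non-manual source, are illustrative assumptions rather than anything taken from the patent.

```python
import math
import random
from dataclasses import dataclass


def random_projection(dim_in, dim_out, rng):
    """Gaussian random projection matrix with dim_out rows and dim_in columns."""
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]


def project(matrix, vec):
    """Multiply a (dim_out x dim_in) matrix by a length-dim_in vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class ObjectModel:
    object_name: str        # label the vectors are stored under
    projections: list       # one reduction matrix per round, applied in order
    feature_vectors: list   # tagged items that survived every consistency check
    background: list        # centroid of untagged items in the final space
    rounds: int             # number of reduction + consistency rounds performed

    def embed(self, vec):
        """Map a raw vector into the final reduced space."""
        for matrix in self.projections:
            vec = project(matrix, vec)
        return vec

    def predicts_presence(self, vec):
        """True if vec lands nearer the object's vectors than the background."""
        z = self.embed(vec)
        return dist(z, centroid(self.feature_vectors)) < dist(z, self.background)


def train(object_name, tagged, untagged, dims, seed=0):
    """tagged: raw vectors tagged with object_name (hypothetically noisy).
    untagged: raw vectors from unrelated items. dims: output size per round."""
    rng = random.Random(seed)
    pos, neg = list(tagged), list(untagged)
    projections = []
    for dim_out in dims:
        # Reduction step: random projection is a stand-in, not the patent's method.
        matrix = random_projection(len(pos[0]), dim_out, rng)
        projections.append(matrix)
        pos = [project(matrix, v) for v in pos]
        neg = [project(matrix, v) for v in neg]
        # Consistency step (hypothetical): keep tagged items that agree more
        # with the other tagged items than with the untagged background.
        c_pos, c_neg = centroid(pos), centroid(neg)
        pos = [v for v in pos if dist(v, c_pos) < dist(v, c_neg)] or pos
    return ObjectModel(object_name=object_name, projections=projections,
                       feature_vectors=pos, background=centroid(neg),
                       rounds=len(dims))
```

For instance, calling `train("bicycle", tagged, untagged, dims=[16, 8])` on 32-dimensional vectors, where a few of the tagged items are really background, should leave mostly genuine examples in `feature_vectors` and record `rounds == 2` alongside the name, mirroring the abstract's "indication of the number of rounds." Random projection was picked here only because it needs no training and runs on the standard library alone; a production system would presumably learn both steps from data.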

The privacy implications of such an automated system are enormous. Facebook's own automatic facial recognition software was highly controversial when it debuted, and what Google has now patented puts Facebook to shame. The larger question, unaddressed in this patent, is whether we want our individual personal data to be tagged, filed, and logged without permission or choice. To a potential advertiser, this sort of information is gold.

The implications reach far beyond what Google states here. This type of data could be used for logo identification (for better ad targeting) and as a form of citizen tracking. If that last point seems like a reach, remember that Google's own vision for the future of computing is a Chromebook/Chromebox that depends completely on Google's services for everything. Content is tied to Google servers at every step, which would give the company an extremely convenient way to access such details.

Users have the right to decide whether or not they want to use cloud services or turn data over to Google, but we're deeply uneasy about the idea of mining user videos for this sort of content. It could be an important step toward the creation of semantic search databases, but questions of user privacy and data ownership must be answered first.

Tags: Google, YouTube, Cloud, patents, USPTO, image recognition, video search, tagging

Opinions and content posted by HotHardware contributors are their own.