For better or worse, memes are part of online culture, and they are especially prevalent on social media sites. This presents a problem for Facebook, both in terms of tracking down content that runs afoul of its terms of service, and in deciding which content its users are likely to be interested in seeing. So the company developed a large-scale machine learning system named Rosetta to better understand memes.
As you might already know, Facebook relies heavily on algorithms to sift and sort through content. There are simply too many users and too many posts to lay content moderation solely on the shoulders of human workers. Facebook employs flesh-and-blood moderators too, of course, but its algorithms play a huge role in the experience.
What those algorithms can't do, however, is read embedded text on memes. That is where Rosetta comes into play.
"[Rosetta] extracts text from more than a billion public Facebook and Instagram images and video frames (in a wide variety of languages), daily and in real time, and inputs it into a text recognition model that has been trained on classifiers to understand the context of the text and the image together," Facebook explains.
There are two main steps to the process. First, the machine learning system detects rectangular regions that could potentially contain text. After that, it analyzes each of the detected regions using a convolutional neural network (CNN) to recognize and transcribe the words contained within them.
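To make the detect-then-recognize flow concrete, here is a minimal Python sketch. It is not Facebook's code: the "image" is a toy grid of characters, and `detect_regions`, `recognize`, and `extract_text` are hypothetical names with trivial stand-ins where Rosetta would use trained models. Only the shape of the pipeline mirrors the description above.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Toy "image": each string is one pixel row; any non-space character is ink.
Image = List[str]


@dataclass(frozen=True)
class Box:
    top: int
    left: int
    bottom: int  # exclusive
    right: int   # exclusive


def detect_regions(image: Image) -> List[Box]:
    """Stage 1 stand-in: propose one rectangle per run of ink on a row.

    A run may contain single spaces; two or more spaces end it. A real
    detector would be a trained model, not a regular expression.
    """
    boxes = []
    for y, row in enumerate(image):
        for match in re.finditer(r"\S+(?: \S+)*", row):
            boxes.append(Box(y, match.start(), y + 1, match.end()))
    return boxes


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangle described by `box` out of the image."""
    return [row[box.left:box.right] for row in image[box.top:box.bottom]]


def recognize(patch: Image) -> str:
    """Stage 2 stand-in: 'transcribe' a cropped region.

    Here the patch already holds characters, so we only normalize them.
    A real recognizer would run a CNN over pixel data.
    """
    return " ".join(line.strip() for line in patch).lower()


def extract_text(image: Image) -> List[Tuple[Box, str]]:
    """Run detection, then recognition on each detected region."""
    return [(box, recognize(crop(image, box))) for box in detect_regions(image)]


meme = [
    "  WHEN THE CODE      ",
    "",
    "      FINALLY RUNS  ",
]
print([text for _, text in extract_text(meme)])
# prints ['when the code', 'finally runs']
```

Keeping the two stages separate means either one can be swapped out or retrained without touching the other.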
The technology is rooted in optical character recognition (OCR), only here it's been purpose-built to process the images that get uploaded to Facebook on a daily basis. There's also a training element at play. Facebook's approach to training data is a mixture of human-annotated public images with words and their locations, as well as synthetic generation of text on public images.
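The appeal of synthetic training data is that the label comes for free: if you place the text yourself, you already know what it says and where it is. The Python sketch below illustrates that idea on a toy character grid. The function name, grid size, and label layout are all assumptions made for illustration, not details of Facebook's pipeline.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def make_synthetic_sample(
    words: Sequence[str],
    width: int = 40,
    height: int = 8,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], Dict[str, object]]:
    """Place one random word on a blank grid and return (image, label).

    The label records the text and its box as (top, left, bottom, right),
    with bottom and right exclusive.
    """
    rng = rng or random.Random()
    word = rng.choice(list(words))
    if len(word) > width:
        raise ValueError("word does not fit in the image")

    # Pick a random position where the whole word fits.
    top = rng.randrange(height)
    left = rng.randrange(width - len(word) + 1)

    grid = [[" "] * width for _ in range(height)]
    grid[top][left:left + len(word)] = list(word)
    image = ["".join(row) for row in grid]

    label = {"text": word, "box": (top, left, top + 1, left + len(word))}
    return image, label


image, label = make_synthetic_sample(["meme", "rosetta"], rng=random.Random(0))
top, left, bottom, right = label["box"]
# The recorded box points exactly at the text that was placed.
print(image[top][left:right] == label["text"])
```

A real generator would render varied fonts, colors, and distortions onto actual photos, but the labeling principle is the same.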
"Rosetta has been widely adopted by various products and teams within Facebook and Instagram. Text extracted from images is being used as a feature in various downstream machine learning models such as those to improve the relevance and quality of photo search, automatically identify content that violates our hate-speech policy on the platform in various languages, and improve the accuracy of classification of photos in News Feed to surface more personalized content," Facebook says.
This is just the beginning of what Facebook hopes to accomplish. It's still developing Rosetta, and Facebook acknowledges that it needs to support many more languages. In addition, text doesn't always come in neat, horizontal rows—it's sometimes warped, rotated, obfuscated, or otherwise distorted. These are all things Facebook is working to improve.