First there was Boaty McBoatface. Now there is Parsey McParseface. On Thursday Google released SyntaxNet, an open-source neural network framework implemented in TensorFlow that provides a foundation for Natural Language Understanding (NLU) systems. The release includes Parsey McParseface, a parser that has been trained to analyze English text. Google claims that it is the most accurate such model in the world.
The term “parsing” has a fairly interesting history. During the eighteenth and nineteenth centuries, students took parsing courses in which they would break down the grammatical structure of a sentence in order to understand its exact meaning. Teachers continue to teach parsing through “sentence diagrams” or “parse trees”. Parsey McParseface does something similar, minus the one-room schoolhouse.
In computer science, a parser is a software component that takes input data and builds a data structure from it. The result is a structural representation of the input, and the parser checks for correct syntax in the process. SyntaxNet takes sentences as input. It tags each word with a part-of-speech tag that describes the word's syntactic function. It then determines the syntactic relationships between the words in the sentence.
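To make that output a little more concrete, here is a minimal Python sketch of the kind of data structure a parser like this might build for a short sentence. The tags, head positions, and relation names are hand-written assumptions chosen for readability, not actual SyntaxNet output, and the `Token`, `describe`, and `find_root` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    index: int  # 1-based position of the word in the sentence
    word: str
    tag: str    # part-of-speech tag describing the word's syntactic function
    head: int   # index of the word this one depends on; 0 marks the root
    label: str  # informal name for the relation to the head


# Hand-annotated for illustration only; not produced by a real parser.
SENTENCE: List[Token] = [
    Token(1, "Elon", "NOUN", 2, "name-part"),
    Token(2, "Musk", "NOUN", 4, "subject"),
    Token(3, "is", "VERB", 4, "linking-verb"),
    Token(4, "cool", "ADJ", 0, "root"),
]


def describe(tokens: List[Token]) -> List[str]:
    """Render each word, its tag, and the word it depends on."""
    lines = []
    for tok in tokens:
        head_word = "ROOT" if tok.head == 0 else tokens[tok.head - 1].word
        lines.append(f"{tok.word}/{tok.tag} --{tok.label}--> {head_word}")
    return lines


def find_root(tokens: List[Token]) -> Token:
    """Return the single token that does not depend on any other word."""
    roots = [tok for tok in tokens if tok.head == 0]
    if len(roots) != 1:
        raise ValueError("expected exactly one root token")
    return roots[0]


if __name__ == "__main__":
    print("\n".join(describe(SENTENCE)))
```

Each word stores a pointer to the word it depends on, so the whole sentence forms a tree with a single root. That tree is the "structural representation" described above.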
Parsing is incredibly difficult for computers because language is often ambiguous. Human beings are accustomed to ambiguity; AI is not. Think, for example, of the sentence “Elon Musk is cool”. What do I mean by the word “cool” in this sentence? Do I mean that Elon Musk is physically cool, or do I mean that Elon Musk is awesome? The meaning of this sentence could be difficult to pin down even in a person-to-person conversation. An AI such as Data from Star Trek: The Next Generation hardly has a chance. (For the record, when I say “Elon Musk is cool”, I mean he is awesome.)
How would Parsey McParseface deal with the sentence “Elon Musk is cool”? McParseface would first process the sentence from left to right. It would immediately be able to determine that “Musk” is a noun, “is” is a verb, and “cool” is an adjective. In order to figure out which definition of “cool” to apply, it would keep multiple hypotheses in play, discarding one only when others seemed more promising. McParseface is trainable in this way. If I frequently wrote sentences such as “Elon Musk is cool. He builds amazing rockets”, McParseface would soon learn that “cool” in this case means “awesome.”
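The strategy of keeping several candidate readings alive while moving left to right is commonly called beam search. Below is a toy Python sketch of the idea applied to part-of-speech tags only. All of the probabilities, the `context_factor` penalty, and the function names are invented for this example; a real parser would score hypotheses with a trained neural network and would build a full parse rather than a flat list of tags.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-word tag probabilities, made up for this example.
CANDIDATES: List[Tuple[str, Dict[str, float]]] = [
    ("Elon", {"NOUN": 0.9, "VERB": 0.1}),
    ("Musk", {"NOUN": 0.8, "VERB": 0.2}),
    ("is", {"VERB": 0.95, "NOUN": 0.05}),
    ("cool", {"ADJ": 0.6, "VERB": 0.3, "NOUN": 0.1}),
]

Hypothesis = Tuple[float, List[str]]  # (log score, tags chosen so far)


def context_factor(previous_tag: str, tag: str) -> float:
    """Invented penalty: repeating the previous tag halves the score."""
    return 0.5 if previous_tag == tag else 1.0


def beam_search(candidates, beam_size: int = 2) -> List[Hypothesis]:
    """Tag words left to right, keeping only the best partial hypotheses."""
    beam: List[Hypothesis] = [(0.0, [])]
    for _word, options in candidates:
        expanded: List[Hypothesis] = []
        for score, tags in beam:
            previous = tags[-1] if tags else ""
            for tag, prob in options.items():
                step = prob * context_factor(previous, tag)
                expanded.append((score + math.log(step), tags + [tag]))
        # Rank every extension and discard all but the top beam_size.
        expanded.sort(key=lambda hyp: hyp[0], reverse=True)
        beam = expanded[:beam_size]
    return beam


if __name__ == "__main__":
    for log_score, tags in beam_search(CANDIDATES):
        print(f"{math.exp(log_score):.4f}", tags)
```

A larger `beam_size` keeps more alternatives alive at each step, which costs more computation but makes it less likely that the correct reading is thrown away early.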
On a standard benchmark of English newswire sentences, Parsey McParseface recovers individual dependencies between words with over 94% accuracy. For comparison, Google says that linguists trained for this task agree with one another in 96-97% of cases, which suggests the parser is approaching human performance. On sentences taken straight from the Internet, McParseface achieves just over 90% parse accuracy. It still struggles with prepositional phrase attachments and deep contextual meanings.
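For readers wondering what a figure like "94% accuracy" measures, one common way to score a dependency parser is to count the share of words that were attached to the correct head word. The Python sketch below illustrates that calculation on a four-word sentence. The function name and the sample head lists are assumptions made up for this example, and the exact metric behind Google's numbers may differ in its details.

```python
from typing import Sequence


def attachment_accuracy(gold_heads: Sequence[int],
                        predicted_heads: Sequence[int]) -> float:
    """Fraction of words whose predicted head matches the reference head."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("both sequences must cover the same words")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


if __name__ == "__main__":
    # Head index for each word of a four-word sentence; 0 marks the root.
    gold = [2, 4, 4, 0]       # hypothetical human annotation
    predicted = [2, 3, 4, 0]  # hypothetical parser output, one mistake
    print(f"{attachment_accuracy(gold, predicted):.0%}")
```

Here the parser attaches one of the four words to the wrong head, so it scores 75% on this sentence. Reported figures average this kind of count over thousands of sentences.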
You can see the SyntaxNet code here and download the Parsey McParseface parser model for yourself. Let us know in the comments below how your parsing experience goes.