Microsoft’s AI Unit Boosts Speech-to-Text Capabilities And Content Moderation With New APIs

Microsoft is wasting no time in putting its new Artificial Intelligence and Research Group to work. The company just announced that Microsoft Cognitive Services now includes 25 tools that form the backbone of Skype Translator, Cortana and other Microsoft products.

Two of the new APIs are Bing Speech and Content Moderator. Bing Speech can not only transcribe speech into text but also convert text into speech. Microsoft's speech recognition technology combines acoustic and language models to tailor recognition to a specific language and to make it easier to tell similar-sounding words apart.


“If the previous words are ‘The player caught the,’ then ‘ball’ is going to be more likely than ‘fall,’” explained Seltzer.
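The "ball" versus "fall" example is what a language model contributes: when the acoustic evidence is ambiguous, the preceding words tip the balance. The sketch below is a toy trigram model with add-one smoothing, not Microsoft's implementation; the training sentences and the function names `train_trigrams`, `next_word_prob` and `pick_word` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy training text; a production language model learns from vastly more data.
CORPUS = [
    "the player caught the ball",
    "the catcher caught the ball",
    "the player threw the ball",
    "the leaves fall in autumn",
    "the player saw the fall",
]

def train_trigrams(sentences):
    """Count word triples and the two-word contexts that precede them."""
    contexts, trigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = sentence.split()
        vocab.update(words)
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            contexts[(w1, w2)] += 1
            trigrams[(w1, w2, w3)] += 1
    return contexts, trigrams, vocab

def next_word_prob(w1, w2, word, model):
    """P(word | w1 w2) with add-one (Laplace) smoothing so unseen words never score zero."""
    contexts, trigrams, vocab = model
    return (trigrams[(w1, w2, word)] + 1) / (contexts[(w1, w2)] + len(vocab))

def pick_word(previous_words, candidates, model):
    """Return the similar-sounding candidate the language model finds most likely."""
    w1, w2 = previous_words.split()[-2:]
    return max(candidates, key=lambda word: next_word_prob(w1, w2, word, model))

model = train_trigrams(CORPUS)
print(pick_word("the player caught the", ["ball", "fall"], model))  # prints "ball"
```

Here "caught the ball" appears twice in the toy corpus and "caught the fall" never does, so the model scores "ball" at 3/13 and "fall" at 1/13. A real recognizer multiplies scores like these with acoustic-model scores rather than using the language model alone.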

The Custom Speech Service lets developers supply their own data to narrow the recognition engine's focus to their own use case. “The basic idea is that the more focused the systems can be, the better they will perform,” Seltzer added. “The job of the Custom Speech Service is to let you focus the system on the data that you care about.”
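The article does not describe how the Custom Speech Service adapts its models internally. As a hypothetical illustration of "focusing the system on your data," the sketch below uses linear interpolation, one common language-model adaptation technique, to blend a general model with one built from developer-supplied text. The invented word "hyperlance," both corpora and the 0.7 weight are assumptions made up for this example.

```python
from collections import Counter

def unigram_model(sentences):
    """Return a function giving add-one-smoothed word probabilities for a corpus."""
    counts = Counter(word for s in sentences for word in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for an unknown-word slot
    return lambda word: (counts[word] + 1) / (total + vocab_size)

def adapted_model(general, domain, weight):
    """Linearly interpolate two models; weight is the share given to the domain model."""
    return lambda word: weight * domain(word) + (1 - weight) * general(word)

# Hypothetical data: everyday text versus a developer's own game script.
general = unigram_model(["click the hyperlink", "open the hyperlink", "follow the hyperlink"])
domain = unigram_model(["fire the hyperlance", "charge the hyperlance", "target the hyperlance"])

candidates = ["hyperlink", "hyperlance"]
print(max(candidates, key=general))  # general model alone prefers "hyperlink"
custom = adapted_model(general, domain, weight=0.7)
print(max(candidates, key=custom))  # focused model prefers "hyperlance"
```

The general model has never seen the made-up word, so it favors the familiar "hyperlink"; once the developer's text carries most of the weight, the game-specific term wins. Keeping some weight on the general model means ordinary vocabulary is still recognized.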

One title that takes this approach is Starship Commander, developed by Human Interact. Because it is a science fiction game, it is full of made-up words, names and phrases that would trip up traditional speech recognition systems. With the Custom Speech Service, however, the game makes “half as many errors” as the open-source software used in its earlier builds.

“Being able to have software now that observes people, listens, reacts and is knowledgeable about the physical world around them provides an excellent breakthrough in terms of making interfaces more human, more natural, more easy to understand and thus far more impactful in lots of different scenarios,” said Andrew Shuman, corporate vice president of products for Microsoft’s AI and Research organization.

Content Moderator, meanwhile, automatically scans images, videos and text to filter out offensive content (e.g., foul language, nudity and hate speech). This could help automate the often thankless job of a forum moderator who has to weed through offensive content or troll posts that break posting rules. Text moderation works in more than 100 languages and can also sniff out phishing URLs.

Microsoft hopes to keep building on Cognitive Services, offering robust, customizable tools that developers can easily plug into their own products. We, for one, applaud Microsoft for its efforts to democratize artificial intelligence.