IBM Develops Realistic Computerized Voice

It’s easy for us to distinguish a computerized voice from that of a real person. Even though we can generally figure out what the computer is trying to convey despite poor pronunciation and bad phrasing, understanding doesn’t eliminate the frustration many of us feel when dealing with an automated human substitute. Now, thanks to new technology developed by IBM scientists, we may not have to tolerate robotic-sounding voices for much longer.

Supposedly, the new computerized voice is nearly indistinguishable from a human’s. Part of what makes it sound realistic is the inclusion of verbal tics such as “ums,” “ers,” and sighs. The system is so sophisticated that it will even pause for effect or cough to attract a user’s attention. It can also “learn” to add expressions at the correct point in a sentence.

While these tics may seem unnecessary, Andy Aaron, of IBM’s Thomas J. Watson research group speech team, said: “These sounds can be incredibly subtle, even unnoticeable, but have a profound psychological effect. It can be extremely reassuring to have a more attentive-sounding voice.” Even though it may be difficult to distinguish between the computerized voice and a human’s, Aaron noted that fooling someone is not the goal.

The new technology is called “generating paralinguistic phenomena via markup in text-to-speech synthesis.” A long name, for sure, but the results are what matter.
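
To make the “via markup” part of that name a little more concrete, here is a minimal sketch of the general idea: text is annotated with cue tags, and a later step turns each tag into a sound. The tag names (`<um/>`, `<sigh/>`, `<cough/>`, `<pause/>`), the `CUES` table, and the `expand_cues` function are all invented for illustration; nothing here reflects IBM’s actual markup or software, and the sketch only swaps tags for plain-text stand-ins rather than producing audio.

```python
import re

# Hypothetical cue tags mapped to plain-text stand-ins.
# These names are assumptions; the article does not describe IBM's real markup.
CUES = {
    "um": "um,",
    "sigh": "(sigh)",
    "cough": "(cough)",
    "pause": "...",
}

# Matches self-closing tags such as <cough/>.
TAG = re.compile(r"<(\w+)/>")


def expand_cues(marked_up: str) -> str:
    """Replace each known cue tag with its plain-text stand-in."""
    def swap(match: re.Match) -> str:
        name = match.group(1)
        # Unknown tags are left untouched rather than guessed at.
        return CUES.get(name, match.group(0))

    return TAG.sub(swap, marked_up)


if __name__ == "__main__":
    print(expand_cues("<cough/> Your balance is <pause/> forty dollars."))
```

In a real speech system, each tag would presumably trigger a recorded or synthesized sound at that point in the sentence instead of a text placeholder.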

Potentially, the technology could be used for call centers, GPS systems, or even cameras or iPods. With the new technology, the monotone voices that are part of our daily lives may soon become a part of history, which will certainly be a welcome change for many. While customer service from a live person may be preferable, at least we can hope that the automated services of the future will be a bit more tolerable.