What did you say?

Text-to-speech is now officially creepy.

This week Apple introduced Alex, their new speech synthesizer in Leopard. It’s better than what they had before, and a hell of a lot better than Microsoft Sam. This is what it sounds like.

Now you may be thinking that Microsoft is kind of in the dark when it comes to speech synthesizers. Microsoft is introducing a new synthesizer called Anna in Windows Vista, which is a drastic improvement over Sam, but still not better than Alex.

Where the spotlight shines is on Microsoft’s partners. Oliver from the Microsoft Speech team pointed me to NextUp.com, a site dedicated to Windows TTS-compatible speech synthesizer engines. Most of them were considerably better than Microsoft Sam, but one stood out so far above the rest that I just had to write about it. This is Lee, an Australian voice from RealSpeak Solo, made by Nuance, who also make Dragon NaturallySpeaking.

Compare that to Long Zheng TTS, me with my crappy microphone.

Lee almost sounded like a real newsreader, aside from a few quirks at the end.

Also, my friend Andy wants some attention.

6 insightful thoughts

  1. I don’t agree with comparing Apple’s beta to Microsoft’s existing technology.

    Speech was improved very much in Windows Vista and there are some amazing new voices.

  2. Are you not leaving out the leaders of speech synthesis, AT&T? They have been, and still are, years ahead of MS, Apple or any of the others you have mentioned. I have been using their Natural Voices Audrey for years, and it is easily far ahead of those mentioned above. Nay! They are the founders of the technology. Let us not forget Bell Labs.

  3. Just a few days ago I decided to buy a text-to-speech voice synth app. My main goal was to get something that sounded the most real and had the fewest little synthetic artifacts that would clue listeners in to the fact that they were listening to a computer voice. I found a company called Ivona and their TTS app called Expressivo, which includes “Jennifer”, currently their only English voice, for $29 total. It is a step above the rest in realism and made the fewest pronunciation errors.

    AT&T is mentioned above, and while I like some of their voices, at times they produce an odd echo-like sound, a slight synthetic gurgle now and then, and a few weird pronunciation errors.

    You can give Expressivo a try below and type in your own text and listen to it in their interactive online demo.

