1st Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages

Porto Salvo, Portugal
September 3-4, 2009

Building Accurate and User-Friendly Speech Systems

Alex Acero

Research Area Manager, Microsoft Research, Redmond, WA, USA

While accurate speech recognition engines are critical to successful speech applications, other factors can affect the user experience even more than the accuracy of the engine itself. For example, the grammar the ASR engine uses should predict what the user will say, but it's often hard for an application developer to design a grammar that yields high system accuracy. I will show how data-driven techniques can be used to build accurate grammars in a straightforward way. I'll also describe a technique that combines a statistical language model with an inverted index, which can be used for applications such as voice search or SMS dictation and results in highly accurate end-to-end systems.
   Even an accurate speech recognition system is not enough for a good user experience: such systems will always make errors, so it's critical to provide a graceful error recovery mechanism. Also, users have a choice between speaking, touching a screen, or typing, and may choose not to speak unless speech is better than the alternatives. I will show designs for several systems that take this into account in voice search, education, and the automobile.

Bibliographic reference.  Acero, Alex (2009): "Building accurate and user-friendly speech systems", In SLTECH-2009, 3.