Speech Recognition and Intrinsic Variation (SRIV2006)

Toulouse, France
May 20, 2006

Improved Context Integration for Robust Speech Recognition in Conversational Systems

Georg Stemmer

Lehrstuhl für Mustererkennung (Informatik 5), Universität Erlangen-Nürnberg, Germany

Inter- and intra-speaker variability is a major source of speech recognition errors in conversational systems. Most sources of variability are not sufficiently represented in the data to train a dedicated set of models for each. To increase the robustness of a speech recognizer, we propose a combination of different approaches, all of which provide additional acoustic or linguistic context information to the recognizer. The approaches are evaluated on a corpus of spontaneous speech recorded with a conversational system in a realistic application scenario. Recognizer performance is measured for the dialogue states and speaker groups that are marked in this data set. Word error rates for the different speaker groups are reduced by 11-25%, with an overall reduction of 13%. We conclude that integrating context is a promising direction of research for improving the robustness of a conversational system.
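
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate and its reduction are commonly computed. It assumes the standard edit-distance definition of word error rate and a relative (not absolute) reduction; the abstract itself does not state which kind of reduction is reported. The function names and the example values are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: the standard definition (substitutions + deletions +
    insertions) / reference length; the paper's scoring setup may differ.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which the baseline word error rate has been lowered."""
    if baseline_wer <= 0:
        raise ValueError("baseline word error rate must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


# Hypothetical values for illustration only, not results from the paper.
example_wer = word_error_rate("a b c d", "a x c")
example_reduction = relative_reduction(0.30, 0.261)
```

Under this reading, a system whose word error rate falls from a hypothetical 30% to 26.1% shows a relative reduction of 13%, even though the absolute difference is only 3.9 percentage points.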

Sound Files:
Bad Acoustics
Children
Dialect
Elderly
Female
Low Volume
Male
Nonnative
Station
Train

Bibliographic reference. Stemmer, Georg (2006): "Improved context integration for robust speech recognition in conversational systems", in SRIV-2006, 15-20.