Speech Recognition and Intrinsic Variation (SRIV2006), Toulouse, France
Inter- and intra-speaker variability is a major source of speech recognition errors in conversational systems. Most sources of variability are too sparsely represented in the data to train a dedicated set of models. To increase the robustness of a speech recognizer, we propose a combination of different approaches, all of which provide additional acoustic or linguistic context information to the recognizer. The approaches are evaluated on a corpus of spontaneous speech recorded with a conversational system in a realistic application scenario. Recognizer performance is measured for the dialogue states and speaker groups marked in this data set. Word error rates for the different speaker groups are reduced by 11-25%, with an overall reduction of 13%. We conclude that integrating context is a promising research direction for improving the robustness of conversational systems.
Full Paper
Presentation (.pdf)
Sound Files:
Bad Acoustics
Children
Dialect
Elderly
Female
Low Volume
Male
Nonnative
Station
Train
Bibliographic reference. Stemmer, Georg (2006): "Improved context integration for robust speech recognition in conversational systems", in SRIV-2006, 15-20.