Modeling Pronunciation Variation for Automatic Speech Recognition
Rolduc, The Netherlands
When moving from read speech to spontaneous conversational speech, the recognition accuracy of today's ASR systems usually drops by about 20-40%, even with a huge amount of appropriate training data. We believe that this is to a large degree due to the variability of pronunciations observed in spontaneous speech. In this paper we propose the use of syllable-based hidden Markov models (HMMs) as a separate, explicit pronunciation model. The use of sub-word units makes it possible to predict pronunciations even for words that have not been observed in the training data. Since the output of a phone recognizer is used as input to the lexical model, no manually phone-labeled data is needed for training. It can be shown that the HMM-based model does indeed learn the variations observed in the data. Using these pronunciation models makes it possible to create better phone transcriptions of speech data and thus more specialized acoustic models. Utilizing these acoustic models for improved speech recognition, however, will require a close integration of the pronunciation models into the decoding process.
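To make the idea of an HMM used as a pronunciation model concrete, the following is a minimal sketch, not the authors' implementation. It assumes a discrete left-to-right HMM for a single syllable, with one state per canonical phone; each state emits an observed (recognized) phone, and an exit probability on the second state lets the final phone be deleted. The topology, the phone symbols, and every probability are hypothetical values chosen for illustration only. The forward algorithm then scores how likely an observed phone string is under that syllable model.

```python
def forward_probability(observed, initial, transitions, final, emissions):
    """Return P(observed | model) for a discrete HMM with explicit exit probabilities.

    observed    -- list of observed phone symbols (e.g. phone recognizer output)
    initial     -- initial[s]: probability of starting in state s
    transitions -- transitions[r][s]: probability of moving from state r to state s
    final       -- final[s]: probability of leaving the model from state s
    emissions   -- emissions[s]: dict mapping an observed phone to its probability
    """
    n_states = len(initial)
    if not observed:
        return 0.0  # this sketch requires at least one observed phone

    # alpha[s]: probability of the phones seen so far, ending in state s
    alpha = [initial[s] * emissions[s].get(observed[0], 0.0) for s in range(n_states)]
    for phone in observed[1:]:
        alpha = [
            sum(alpha[r] * transitions[r][s] for r in range(n_states))
            * emissions[s].get(phone, 0.0)
            for s in range(n_states)
        ]
    return sum(alpha[s] * final[s] for s in range(n_states))


# Hypothetical model for a syllable with canonical phones "ae n d".
# All numbers below are invented for illustration, not taken from the paper.
INITIAL = [1.0, 0.0, 0.0]
TRANSITIONS = [
    [0.0, 1.0, 0.0],  # "ae" state always moves on to the "n" state
    [0.0, 0.0, 0.6],  # "n" state moves on to the "d" state with probability 0.6
    [0.0, 0.0, 0.0],  # "d" state has no outgoing transitions
]
FINAL = [0.0, 0.4, 1.0]  # leaving from the "n" state models deletion of "d"
EMISSIONS = [
    {"ae": 0.7, "eh": 0.2, "ax": 0.1},  # vowel substitutions and reduction
    {"n": 1.0},
    {"d": 0.8, "t": 0.2},  # devoicing of the final stop
]

if __name__ == "__main__":
    for variant in (["ae", "n", "d"], ["ae", "n", "t"], ["ax", "n"]):
        p = forward_probability(variant, INITIAL, TRANSITIONS, FINAL, EMISSIONS)
        print(" ".join(variant), p)
```

In this toy model the canonical form receives the highest score, while the reduced variant with a deleted final phone still gets non-zero probability. Because the model is defined per syllable rather than per word, the same scores would apply to any word containing that syllable, which is one way to read the abstract's claim about unseen words.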
Bibliographic reference. Heine, Henrik / Evermann, Gunnar / Jost, Uwe (1998): "An HMM-based probabilistic lexicon", In MPV-1998, 57-62.