Modeling Pronunciation Variation for Automatic Speech Recognition

Rolduc, The Netherlands
May 4-6, 1998

An HMM-Based Probabilistic Lexicon

Henrik Heine, Gunnar Evermann, Uwe Jost

University of Hamburg, Germany

When moving from read speech to spontaneous conversational speech, the recognition accuracy of today's ASR systems usually decreases by about 20-40%, even with a huge amount of appropriate training data. We believe that this is to a large degree due to the variability of pronunciations observed in spontaneous speech. In this paper we propose the use of syllable-based hidden Markov models (HMMs) as a separate, explicit pronunciation model. The use of sub-word units makes it possible to predict pronunciations even for words that have not been observed in the training data. Since the output of a phone recognizer is used as input to the lexical model, no manually phone-labeled data is needed for training. It can be shown that the HMM-based model does indeed learn the variations observed in the data. Using these pronunciation models makes it possible to create better phone transcriptions of speech data and thus more specialized acoustic models. Utilizing these acoustic models for improved speech recognition, though, will require a close integration of the pronunciation models into the decoding process.

Bibliographic reference. Heine, Henrik / Evermann, Gunnar / Jost, Uwe (1998): "An HMM-based probabilistic lexicon", in MPV-1998, 57-62.