8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

A Statistical Lexicon for Non-Native Speech Recognition

Rainer Gruhn, Konstantin Markov, Satoshi Nakamura

ATR, Japan

Non-native speech is harder to recognize than native speech because non-native speakers pronounce words differently from native speakers. We propose a novel approach that covers non-native pronunciation variations statistically. Rather than representing those variations explicitly, we generate discrete HMMs that model the pronunciations of each word. The models are initialized from a baseline lexicon. The phoneme distributions and transition probabilities are estimated from the results of phoneme recognition on the training data. The pronunciation HMMs are evaluated by rescoring the n-best output of continuous word recognition. The task consists of hotel reservation dialogs spoken by non-native speakers from five accent groups. A pronunciation model is trained and evaluated separately for each group. The word error rate improves on average by 10.9%.
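
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a left-to-right discrete HMM with one state per baseline phoneme, and it covers only initialization from the lexicon and n-best rescoring, not the estimation of distributions from phoneme recognition output. The function names, the data layout, and the parameters `p_match`, `p_stay`, and `weight` are hypothetical choices made for illustration.

```python
import math


def init_word_hmm(baseline_phonemes, phoneme_set, p_match=0.8, p_stay=0.1):
    """Build a left-to-right discrete HMM with one state per baseline phoneme.

    Assumption: each state emits its baseline phoneme with probability
    p_match and spreads the remaining mass uniformly over the other phonemes.
    """
    n_other = len(phoneme_set) - 1
    emissions = []
    for ph in baseline_phonemes:
        dist = {q: (1.0 - p_match) / n_other for q in phoneme_set}
        dist[ph] = p_match
        emissions.append(dist)
    return {"emissions": emissions, "p_stay": p_stay}


def log_likelihood(hmm, observed):
    """Forward algorithm over a recognized phoneme sequence.

    Works in the probability domain, which is adequate for short words.
    This topology allows substitutions and insertions (self-loops) but
    no deletions; a path must start in the first state and end in the last.
    """
    emissions, p_stay = hmm["emissions"], hmm["p_stay"]
    n = len(emissions)
    if not observed:
        return float("-inf")
    alpha = [0.0] * n
    alpha[0] = emissions[0][observed[0]]
    for sym in observed[1:]:
        new = [0.0] * n
        for i in range(n):
            stay = alpha[i] * p_stay
            move = alpha[i - 1] * (1.0 - p_stay) if i > 0 else 0.0
            new[i] = (stay + move) * emissions[i][sym]
        alpha = new
    return math.log(alpha[-1]) if alpha[-1] > 0 else float("-inf")


def rescore_nbest(nbest, hmms, weight=1.0):
    """Pick the best hypothesis after adding pronunciation scores.

    nbest: list of (recognizer_log_score, [(word, observed_phonemes), ...]).
    The weight on the pronunciation score is a hypothetical tuning parameter.
    """
    def total(hyp):
        score, words = hyp
        pron = sum(log_likelihood(hmms[w], obs) for w, obs in words)
        return score + weight * pron

    return max(nbest, key=total)
```

In this sketch, a hypothesis whose words match the recognized phoneme sequence well gains score relative to its competitors, so a lower-ranked hypothesis can overtake the first-best one. Training on accented data would replace the initial emission and transition values with estimates that reflect each accent group's typical substitutions.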


Bibliographic reference.  Gruhn, Rainer / Markov, Konstantin / Nakamura, Satoshi (2004): "A statistical lexicon for non-native speech recognition", In INTERSPEECH-2004, 1497-1500.