INTERSPEECH 2004 - ICSLP
Non-native speech is harder to recognize than native speech, because non-native speakers pronounce words differently from native speakers. We propose a novel approach to cover non-native pronunciation variations statistically. Rather than explicitly representing those variations, discrete HMMs that model the pronunciations of each word are generated. The models are initialized from a baseline lexicon. The phoneme distributions and transition probabilities are estimated from the output of a phoneme recognizer run on the training data. The pronunciation HMMs are evaluated by rescoring the n-best hypotheses of a continuous word recognizer. The task consists of hotel reservation dialogs, spoken by non-native speakers of five accent groups. A pronunciation model is trained and evaluated separately for each group. The word error rate improves on average by 10.9%.
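The idea of a per-word discrete HMM used for n-best rescoring can be illustrated with a minimal sketch. It is not the authors' implementation. The toy phoneme inventory, the initial probabilities (`p_match`, `p_stay`), the strictly left-to-right topology without skip transitions, the score weight, and the assumption that the recognized phonemes are already segmented per word are all hypothetical choices made for illustration. Re-estimating the distributions from training data is omitted.

```python
import math

# Toy phoneme inventory (assumption, not from the paper).
PHONEMES = ["k", "ae", "t", "ah", "d", "o", "g"]


def init_word_hmm(baseline_pron, phonemes, p_match=0.8, p_stay=0.1):
    """Build a left-to-right discrete HMM with one state per baseline phoneme.

    Each state emits its baseline phoneme with probability p_match and
    spreads the remaining mass uniformly over all other phonemes.
    """
    p_other = (1.0 - p_match) / (len(phonemes) - 1)
    emit = [
        {q: (p_match if q == ph else p_other) for q in phonemes}
        for ph in baseline_pron
    ]
    return {"emit": emit, "stay": p_stay}


def log_likelihood(hmm, observed):
    """Forward algorithm: log P(observed phoneme sequence | word HMM).

    The path must start in the first state and leave from the last one.
    Without skip transitions, sequences shorter than the number of states
    get probability zero (a limitation of this sketch).
    """
    emit, stay = hmm["emit"], hmm["stay"]
    if not observed:
        return float("-inf")
    n_states = len(emit)
    alpha = [0.0] * n_states
    alpha[0] = emit[0][observed[0]]
    for obs in observed[1:]:
        new = [0.0] * n_states
        for s in range(n_states):
            from_self = alpha[s] * stay
            from_prev = alpha[s - 1] * (1.0 - stay) if s > 0 else 0.0
            new[s] = (from_self + from_prev) * emit[s][obs]
        alpha = new
    p = alpha[-1] * (1.0 - stay)  # exit from the final state
    return math.log(p) if p > 0.0 else float("-inf")


def rescore_nbest(nbest, lexicon, weight=1.0):
    """Return the index of the best hypothesis after adding pronunciation scores.

    nbest: list of (recognizer_log_score, [(word, observed_phonemes), ...]).
    lexicon: dict mapping each word to its pronunciation HMM.
    """
    totals = []
    for rec_score, words in nbest:
        pron = sum(log_likelihood(lexicon[w], obs) for w, obs in words)
        totals.append(rec_score + weight * pron)
    return max(range(len(totals)), key=totals.__getitem__)


hmm_cat = init_word_hmm(["k", "ae", "t"], PHONEMES)
ll_match = log_likelihood(hmm_cat, ["k", "ae", "t"])
ll_variant = log_likelihood(hmm_cat, ["k", "ah", "t"])

nbest = [
    (-10.0, [("cat", ["k", "ah", "d"])]),
    (-10.5, [("cat", ["k", "ae", "t"])]),
]
best_index = rescore_nbest(nbest, {"cat": hmm_cat})
```

In this toy setup the second hypothesis has the lower recognizer score but matches the word model better, so rescoring prefers it. A model trained on accented data would instead shift emission mass toward the phoneme substitutions typical of that accent group.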
Bibliographic reference. Gruhn, Rainer / Markov, Konstantin / Nakamura, Satoshi (2004): "A statistical lexicon for non-native speech recognition", In INTERSPEECH-2004, 1497-1500.