This paper presents a mispronunciation detection system that uses automatic speech recognition to support computer-aided pronunciation training (CAPT). Our methodology extends a model pronunciation lexicon with possible phonetic mispronunciations that may appear in learners' speech. These pronunciation variants were previously generated by phone-to-phone mapping rules derived from a cross-language phonological comparison between the primary language (L1, Cantonese) and the secondary language (L2, American English). This rule-based generation process produces many implausible mispronunciation candidates. We present a methodology that applies Viterbi decoding to learners' speech using an HMM-based recognizer and the fully extended pronunciation dictionary. Word boundaries are thus identified, and all pronunciation variants are scored and ranked by their Viterbi scores. Pruning then keeps the N-best pronunciation variants, which are deemed plausible candidates for mispronunciation detection. Experiments on speech recordings from 21 Cantonese learners of English show that the agreement between automatic mispronunciation detection and human judges is over 86%.
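The generate-then-prune idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`generate_variants`, `prune_n_best`), the phone symbols, the two substitution rules, and the numeric scores are all hypothetical and chosen only for illustration. In particular, the scores are hand-written stand-ins for the Viterbi scores that an HMM-based recognizer would produce, and only substitutions (no insertions or deletions) are modeled.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Pron = Tuple[str, ...]  # a pronunciation is a tuple of phone symbols


def generate_variants(canonical: Sequence[str],
                      rules: Dict[str, List[str]]) -> List[Pron]:
    """Expand a canonical pronunciation using phone-to-phone mapping rules.

    At each position the canonical phone may be kept or replaced by any
    phone the rules map it to. The canonical form is always the first
    variant returned.
    """
    options = [[p] + [q for q in rules.get(p, []) if q != p]
               for p in canonical]
    return [tuple(v) for v in product(*options)]


def prune_n_best(variants: Sequence[Pron],
                 score: Callable[[Pron], float],
                 n: int) -> List[Pron]:
    """Rank variants by score (higher is better) and keep the top n."""
    return sorted(variants, key=score, reverse=True)[:n]


# Hypothetical example: the word "three" with two illustrative rules.
canonical = ["th", "r", "iy"]
rules = {"th": ["f"], "r": ["l"]}  # assumed rules, not taken from the paper
variants = generate_variants(canonical, rules)

# Placeholder scores standing in for recognizer Viterbi log-likelihoods.
toy_scores = {
    ("th", "r", "iy"): -120.0,
    ("th", "l", "iy"): -140.2,
    ("f", "r", "iy"): -95.5,
    ("f", "l", "iy"): -133.0,
}
kept = prune_n_best(variants, toy_scores.__getitem__, n=2)
print(kept)
```

In a real system the `score` callable would wrap forced alignment of the learner's audio against each variant, and `n` would be tuned so that implausible rule-generated variants are dropped from the extended dictionary.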
Bibliographic reference. Wang, Lan / Feng, Xin / Meng, Helen M. (2008): "Automatic generation and pruning of phonetic mispronunciations to support computer-aided pronunciation training", In INTERSPEECH-2008, 1729-1732.