We address the automatic detection of phone-level mispronunciation for feedback in a computer-aided language learning task where the target language data (Indian English) is limited. Motivated by the recent success of DNN acoustic models on limited-resource recognition tasks, we compare different methods of using the limited target-language data to train acoustic models that are initialized with multilingual data. The frame-level DNN posteriors obtained by the different training methods are compared with a baseline GMM/HMM system in a phone classification task. Domain knowledge of L2 phonology and L1 interference, including its influence on phone quality and duration, is applied to the design of confidence scores for detecting mispronounced vowels of Indian English as spoken by Gujarati L1 learners. We also show that the pronunciation error detection system benefits from a more precise signal-based segmentation of the test speech vowels, as would be expected now that the frame-based confidence scores are more reliable.
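As an illustration of how frame-level posteriors can be turned into a segment-level confidence score, the following is a minimal sketch only. It assumes a common averaged log-posterior formulation, not necessarily the scores designed in this work; the function names, the threshold value, and the toy posteriors are all hypothetical.

```python
import math


def segment_confidence(posteriors, start, end, target, eps=1e-10):
    """Mean log-posterior of the expected phone over frames [start, end).

    posteriors: list of per-frame lists of phone posteriors (assumed input format).
    target: index of the expected (canonical) phone.
    eps: small constant to avoid log(0).
    """
    frames = posteriors[start:end]
    if not frames:
        raise ValueError("segment contains no frames")
    return sum(math.log(f[target] + eps) for f in frames) / len(frames)


def is_mispronounced(posteriors, start, end, target, threshold=-1.0):
    """Flag the segment when its confidence falls below a threshold.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out annotated data.
    """
    return segment_confidence(posteriors, start, end, target) < threshold


# Toy example: three frames, two phone classes.
toy_posteriors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
print(segment_confidence(toy_posteriors, 0, 2, target=0))
print(is_mispronounced(toy_posteriors, 0, 2, target=1))
```

Because the score is an average over the frames inside the segment, misplaced segment boundaries pull in frames from neighbouring phones and lower the score, which is one way to see why a more precise segmentation can help.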
Bibliographic reference. Joshi, Shrikant / Deo, Nachiket / Rao, Preeti (2015): "Vowel mispronunciation detection using DNN acoustic models with cross-lingual training", In INTERSPEECH-2015, 697-701.