A Computer-Assisted Pronunciation Training (CAPT) system can provide greater benefit to language learners if it offers not only scoring but also corrective feedback. However, deriving pronunciation error patterns usually requires either linguistic knowledge or large quantities of expensive annotated corpora from nonnative speakers. In this paper, we explore the possibility of deriving context-dependent error patterns with limited human annotations. A two-stage labeling mechanism is proposed, which first selects a set of templates for human annotation and then propagates the labels. To deal with the imbalance between correct and incorrect phone-level pronunciations in nonnative speech, pronunciation patterns are first summarized at the individual learner level, and corpus-level clustering is then performed for template selection. The concept of contextual similarity based on a phonemic broad class definition is also proposed for label propagation. For evaluation, we view the task as an information retrieval task and use metrics that consider both the importance and the ranking of an error type. Experimental results on a Chinese University of Hong Kong (CUHK) nonnative corpus show that the proposed framework can effectively discover prominent error patterns.
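The abstract does not spell out how contextual similarity is computed, so the following is only a minimal sketch of how label propagation over broad-class contexts might look. The broad-class table, the scoring weights (0.5 for an identical neighbor, 0.25 for a neighbor in the same broad class), and the names `BROAD_CLASS`, `contextual_similarity`, and `propagate_label` are all assumptions made for illustration, not the authors' definitions.

```python
# Hypothetical broad-class table over a few ARPAbet-style phones.
# This mapping is an assumption for illustration, not the paper's definition.
BROAD_CLASS = {
    "p": "stop", "b": "stop", "t": "stop", "d": "stop", "k": "stop", "g": "stop",
    "f": "fricative", "v": "fricative", "s": "fricative", "z": "fricative",
    "m": "nasal", "n": "nasal", "ng": "nasal",
    "l": "liquid", "r": "liquid",
    "iy": "vowel", "ih": "vowel", "eh": "vowel", "ae": "vowel", "aa": "vowel",
}


def broad_class(phone):
    """Return the broad class of a phone; unknown phones form their own class."""
    return BROAD_CLASS.get(phone, phone)


def contextual_similarity(ctx_a, ctx_b):
    """Score two (left, center, right) triphone contexts in [0, 1].

    Assumed scoring: contexts with different center phones score 0.
    Otherwise each neighbor contributes 0.5 if identical, 0.25 if it
    only shares a broad class, and 0 if neither.
    """
    left_a, center_a, right_a = ctx_a
    left_b, center_b, right_b = ctx_b
    if center_a != center_b:
        return 0.0
    score = 0.0
    for x, y in ((left_a, left_b), (right_a, right_b)):
        if x == y:
            score += 0.5
        elif broad_class(x) == broad_class(y):
            score += 0.25
    return score


def propagate_label(context, labeled_templates):
    """Copy the label of the most similar annotated template.

    labeled_templates maps (left, center, right) contexts to a label.
    Returns None when no template has a nonzero similarity.
    """
    best_label, best_score = None, 0.0
    for template_ctx, label in labeled_templates.items():
        score = contextual_similarity(context, template_ctx)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Example with made-up annotations: the unlabeled context shares broad
# classes (stop, nasal) with the first template only.
templates = {("d", "ih", "m"): "ih->iy", ("s", "ih", "l"): "correct"}
propagated = propagate_label(("t", "ih", "n"), templates)
```

In this sketch the unlabeled context `("t", "ih", "n")` inherits the label of `("d", "ih", "m")`, because both neighbors fall in the same assumed broad classes, while the second template shares none.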
Bibliographic reference. Lee, Ann / Glass, James R. (2014): "Context-dependent pronunciation error pattern discovery with limited annotations", In INTERSPEECH-2014, 2877-2881.