Conventional mispronunciation detection systems capable of providing corrective feedback typically require a set of common error patterns that are known beforehand, obtained either by consulting experts or from a human-annotated nonnative corpus. In this paper, we propose a mispronunciation detection framework that does not rely on nonnative training data. We first discover an individual learner's possible pronunciation error patterns by analyzing the acoustic similarities across their utterances. With the discovered error candidates, we iteratively compute forced alignments and decode learner-specific context-dependent error patterns in a greedy manner. We evaluate the framework on a Chinese University of Hong Kong (CUHK) corpus containing both Cantonese and Mandarin speakers reading English. Experimental results show that the proposed framework effectively detects mispronunciations and prioritizes feedback well.
Bibliographic reference. Lee, Ann / Glass, James (2015): "Mispronunciation detection without nonnative training data", In INTERSPEECH-2015, 643-647.
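The greedy, iterative selection of learner-specific error candidates described in the abstract can be illustrated with a simplified sketch. This is a toy stand-in, not the paper's implementation: the function names, the candidate rules, and the scoring function (a lookup simulating the forced-alignment likelihood gain of each rule, rather than an actual ASR decode) are all assumptions made for illustration.

```python
# Toy sketch of greedy error-pattern selection. All names and data are
# hypothetical; `align_score` simulates forced-alignment likelihood with
# a simple lookup instead of running a real recognizer.

def align_score(utterances, active_rules, rule_gain):
    """Simulated total forced-alignment log-likelihood: a base score plus
    the gain contributed by each currently active error rule."""
    base = -10.0 * len(utterances)
    return base + sum(rule_gain.get(r, 0.0) for r in active_rules)

def greedy_select(utterances, candidates, rule_gain, min_gain=0.5):
    """Iteratively add the candidate rule that most improves the alignment
    score; stop when no remaining rule improves it by at least `min_gain`."""
    selected = []
    current = align_score(utterances, selected, rule_gain)
    remaining = list(candidates)
    while remaining:
        gains = {r: align_score(utterances, selected + [r], rule_gain) - current
                 for r in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break  # no candidate explains the data well enough; terminate
        selected.append(best)
        remaining.remove(best)
        current += gains[best]
    return selected

# Hypothetical learner data: candidate substitution rules and their
# simulated per-rule likelihood gains over the learner's utterances.
utts = ["have", "this", "three"]
candidates = ["th->d", "v->w", "r->l"]
gains = {"th->d": 3.0, "v->w": 1.2, "r->l": 0.1}
print(greedy_select(utts, candidates, gains))  # → ['th->d', 'v->w']
```

In this toy run, the rules whose simulated gains exceed the threshold are selected in order of benefit, which also suggests how the selection order could be used to prioritize feedback: errors selected earlier account for more of the acoustic evidence.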