Language learners using computer-aided pronunciation training (CAPT) systems want instructive feedback more than scoring feedback, and instructive feedback requires detailed information about how a pronunciation is wrong, in addition to which phones are in error. A pronunciation erroneous tendency (PET) defines, for each phone, a set of incorrect articulation configurations in terms of the main articulators and the manner of articulation; detecting PETs robustly helps provide appropriate instructive feedback. In our previous work, we designed a set of PET labels for Japanese learners of Chinese as a second language (CSL) and conducted a preliminary detection study with GMM-HMM. This study aims at more robust PET detection through two approaches: using DNN-HMM for acoustic modeling, and comparing three kinds of acoustic features: MFCC, PLP, and filter-bank. Experimental results showed that DNN-HMM PET modeling detected PETs more accurately than the previous GMM-HMM, and that the three kinds of features behaved differently. A lattice combination of the results of the three feature systems gave the best PET results: FRR of 5.5%, FAR of 35.6%, and DA of 88.6%, which demonstrates the effectiveness of the combination.
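The abstract reports FRR, FAR, and DA without defining them. As a minimal sketch, assuming the definitions commonly used in mispronunciation detection (false rejection rate over correctly pronounced phones, false acceptance rate over erroneous phones, and overall accuracy over all phones), the three figures could be computed from a confusion count as follows. The class name, field names, and function name are hypothetical, and the paper's exact definitions may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Phone-level detection outcomes (hypothetical naming, not from the paper)."""
    true_accept: int   # correct pronunciation, accepted as correct
    false_reject: int  # correct pronunciation, flagged as a PET
    false_accept: int  # erroneous pronunciation, accepted as correct
    true_reject: int   # erroneous pronunciation, flagged as a PET


def detection_rates(c: DetectionCounts) -> dict:
    """Return FRR, FAR, and DA as fractions, under the assumed definitions."""
    correct_total = c.true_accept + c.false_reject
    error_total = c.false_accept + c.true_reject
    if correct_total == 0 or error_total == 0:
        raise ValueError("need at least one correct and one erroneous phone")
    total = correct_total + error_total
    return {
        # Share of correct pronunciations wrongly flagged as errors.
        "FRR": c.false_reject / correct_total,
        # Share of erroneous pronunciations that slipped through.
        "FAR": c.false_accept / error_total,
        # Share of all phones that were judged correctly.
        "DA": (c.true_accept + c.true_reject) / total,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not the paper's data.
    rates = detection_rates(DetectionCounts(90, 10, 20, 30))
    for name, value in rates.items():
        print(f"{name}: {value:.1%}")
```

Under these assumed definitions, a low FRR can coexist with a high FAR when correctly pronounced phones far outnumber erroneous ones, since DA is then dominated by the correct class.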
Bibliographic reference. Gao, Yingming / Xie, Yanlu / Cao, Wen / Zhang, Jinsong (2015): "A study on robust detection of pronunciation erroneous tendency based on deep neural network", In INTERSPEECH-2015, 693-696.