13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

The Use of DBN-HMMs for Mispronunciation Detection and Diagnosis in L2 English to Support Computer-Aided Pronunciation Training

Xiaojun Qian (1), Helen Meng (1), Frank K. Soong (1,2)

(1) Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, China
(2) Speech Group, Microsoft Research Asia, Beijing, China

This paper investigates acoustic modeling with the hybrid DBN-HMM framework for mispronunciation detection and diagnosis in L2 English. It is one of the first efforts to compare the performance of DBN-HMMs with that of best-tuned GMM-HMMs trained with the maximum likelihood (ML) and minimum word error (MWE) criteria on the same set of features. Previous work in ASR has shown that unsupervised pre-training is necessary for DBNs to work well. We further explore the effect of training our ASR engine in an unsupervised manner with additional unannotated L2 data from the test speakers, and compare it with the original ASR engine, which was trained on annotated data in a supervised manner. Experiments show that DBN-HMMs yield significant improvements (relative reductions of 13% to 18% in word pronunciation error rate) but are computationally more expensive.
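In a hybrid DBN-HMM, the network takes the place of the GMMs as the HMM emission model: for each frame it outputs posteriors p(s|x) over HMM states. The conventional hybrid recipe (assumed here; this abstract does not spell it out) turns these into scaled likelihoods by dividing by the state priors p(s), because p(x|s) = p(s|x)p(x)/p(s) and p(x) is the same for every state in a given frame. The minimal log-domain sketch below illustrates that conversion with hypothetical helper names and toy numbers; it is not the authors' implementation.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over the DBN output layer for one frame."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_norm for z in logits]


def scaled_log_likelihoods(logits: Sequence[float], priors: Sequence[float]) -> List[float]:
    """Hybrid emission scores log p(s|x) - log p(s).

    This equals log p(x|s) up to the per-frame constant log p(x),
    which does not change the ranking of paths during decoding.
    """
    if len(logits) != len(priors):
        raise ValueError("one prior per HMM state is required")
    return [lp - math.log(q) for lp, q in zip(log_softmax(logits), priors)]


def estimate_priors(state_alignment: Sequence[int], num_states: int) -> List[float]:
    """Hypothetical helper: state priors as relative frame counts from a
    forced alignment, with add-one smoothing so no prior is zero."""
    counts = [1] * num_states
    for s in state_alignment:
        counts[s] += 1
    total = sum(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    # Toy alignment over 3 states and one frame of DBN output logits.
    priors = estimate_priors([0, 0, 0, 1, 2, 2], num_states=3)
    scores = scaled_log_likelihoods([2.0, 0.5, 1.0], priors)
    print([round(s, 3) for s in scores])
```

Dividing by the prior boosts rarely seen states relative to their raw posteriors; estimating priors from alignment frame counts is one common choice, and the smoothing constant is an illustrative assumption.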

Index Terms: mispronunciation detection and diagnosis, restricted Boltzmann machine, deep belief network
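Unsupervised DBN pre-training is commonly done by greedily training a stack of restricted Boltzmann machines with one-step contrastive divergence (CD-1), each RBM modeling the activations of the layer below; the stacked weights then initialize a network that is fine-tuned on labeled HMM-state targets (fine-tuning not shown). Because this stage uses no labels, unannotated speech can in principle be fed to it. The toy, pure-Python sketch below assumes binary units throughout; for real-valued acoustic features the first layer is normally a Gaussian-Bernoulli RBM, which is not shown. The names `BernoulliRBM` and `pretrain_dbn`, the layer sizes, and the learning rate are hypothetical and are not the paper's settings.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class BernoulliRBM:
    """Toy binary-binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # Small random weights break the symmetry between hidden units.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible  # visible biases
        self.b_h = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v: Sequence[float]) -> List[float]:
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h: Sequence[float]) -> List[float]:
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def sample(self, probs: Sequence[float]) -> List[float]:
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0: Sequence[float], lr: float = 0.1) -> float:
        """One CD-1 step on a single example; returns squared reconstruction error."""
        h0_prob = self.hidden_probs(v0)       # positive phase statistics
        h0 = self.sample(h0_prob)             # sample binary hidden states
        v1_prob = self.visible_probs(h0)      # negative phase: reconstruction
        h1_prob = self.hidden_probs(v1_prob)  # hidden probs given reconstruction
        for i in range(len(v0)):
            for j in range(len(h0_prob)):
                self.W[i][j] += lr * (v0[i] * h0_prob[j] - v1_prob[i] * h1_prob[j])
        for i in range(len(v0)):
            self.b_v[i] += lr * (v0[i] - v1_prob[i])
        for j in range(len(h0_prob)):
            self.b_h[j] += lr * (h0_prob[j] - h1_prob[j])
        return sum((a - b) ** 2 for a, b in zip(v0, v1_prob))


def pretrain_dbn(data: List[List[float]], layer_sizes: Sequence[int],
                 epochs: int = 50, lr: float = 0.1, seed: int = 0) -> List[BernoulliRBM]:
    """Greedy layer-wise pre-training: each RBM is trained on the hidden
    activation probabilities produced by the previously trained RBM."""
    rbms: List[BernoulliRBM] = []
    inputs = data
    for k, n_hidden in enumerate(layer_sizes):
        rbm = BernoulliRBM(len(inputs[0]), n_hidden, seed=seed + k)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        rbms.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return rbms


if __name__ == "__main__":
    toy_data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
    stack = pretrain_dbn(toy_data, layer_sizes=[3, 2], epochs=200)
    first = stack[0]
    print([round(p, 2) for p in first.visible_probs(first.hidden_probs(toy_data[0]))])
```

Using hidden probabilities rather than sampled states as the next layer's input, and as the positive-phase statistics, is a common variance-reducing choice rather than a requirement of CD-1.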


Bibliographic reference.  Qian, Xiaojun / Meng, Helen / Soong, Frank K. (2012): "The use of DBN-HMMs for mispronunciation detection and diagnosis in L2 English to support computer-aided pronunciation training", In INTERSPEECH-2012, 775-778.