This paper investigates acoustic modeling using the hybrid DBN-HMM framework for mispronunciation detection and diagnosis in L2 English. It is one of the first efforts to compare the performance of a DBN-HMM with that of the best-tuned GMM-HMM, trained with the maximum likelihood (ML) and minimum word error (MWE) criteria, on the same set of features. Previous work in ASR has shown that unsupervised pre-training is necessary for DBNs to work well. We further explore the effect of training our ASR engine in an unsupervised manner with additional unannotated L2 data from the test speakers, and compare it with the original ASR engine trained on annotated data in a supervised manner. Experiments show that the DBN-HMM gives a significant improvement (a 13-18% relative reduction in word pronunciation error rate) but is computationally more expensive.
Index Terms: mispronunciation detection and diagnosis, restricted Boltzmann machine, deep belief network
Bibliographic reference. Qian, Xiaojun / Meng, Helen / Soong, Frank K. (2012): "The use of DBN-HMMs for mispronunciation detection and diagnosis in L2 English to support computer-aided pronunciation training", In INTERSPEECH-2012, 775-778.
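The abstract depends on unsupervised pre-training of a DBN, which is conventionally done by greedily stacking restricted Boltzmann machines, each trained with one-step contrastive divergence (CD-1). The sketch below illustrates that general technique only and is not the authors' implementation. The class and function names (`BernoulliRBM`, `pretrain_dbn`), the layer sizes, learning rate, epoch count, and the binary toy data are all hypothetical. Real acoustic features are real-valued and would typically require a Gaussian-Bernoulli RBM for the first layer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BernoulliRBM:
    """Hypothetical minimal binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Small random weights; visible and hidden biases start at zero.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible
        self.b_h = [0.0] * n_hidden

    def hidden_probs(self, v):
        # p(h_j = 1 | v) for every hidden unit j
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        # p(v_i = 1 | h) for every visible unit i
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 step on a single example; returns the squared reconstruction error."""
        h0 = self.hidden_probs(v0)
        v1 = self.visible_probs(self.sample(h0))  # reconstruction from sampled hiddens
        h1 = self.hidden_probs(v1)
        # Positive (data) statistics minus negative (reconstruction) statistics.
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(len(v0)):
            self.b_v[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_h[j] += lr * (h0[j] - h1[j])
        return sum((a - b) ** 2 for a, b in zip(v0, v1))


def pretrain_dbn(data, layer_sizes, epochs=20, lr=0.1, seed=0):
    """Greedy layer-wise pre-training: each RBM models the hidden activations below it."""
    rbms, inputs, n_in = [], data, len(data[0])
    for k, n_hidden in enumerate(layer_sizes):
        rbm = BernoulliRBM(n_in, n_hidden, seed=seed + k)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        rbms.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]  # feed probabilities upward
        n_in = n_hidden
    return rbms


# Usage on hypothetical binary toy patterns (no labels are used).
toy = [[1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]] * 5
dbn = pretrain_dbn(toy, [4, 3], epochs=10)
top = dbn[1].hidden_probs(dbn[0].hidden_probs(toy[0]))
```

Because no labels are used at this stage, extra unannotated speech can be added to the pre-training data. In a hybrid DBN-HMM, the pre-trained stack would then typically receive a softmax output layer over HMM states and be fine-tuned with backpropagation on labeled frames.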