Pronunciation error detection techniques for children's speech

Daniel Bolanos, Wayne Ward, Barbara Wise, Sarel van Vuuren

In this article we present a novel method for the automatic detection of pronunciation errors in children's speech. A phone graph is generated from the audio segment and, if necessary, augmented with alignments of phonetic transcriptions of the word to be scored. Phone-level features are then extracted from this graph using conventional HMM/GMM acoustic scores and Support Vector Machine (SVM) classifiers acting as probabilistic estimators. Finally, an SVM combines the phone-level features extracted for each word to produce a word-based pronunciation score. Experimental results show that the proposed method and features can be used effectively for pronunciation scoring. In particular, the detection of mispronunciations improves by more than 22% with respect to the baseline.
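
Because words contain different numbers of phones, a word-level SVM needs a fixed-length input built from the variable-length set of phone-level features. The abstract does not say how this is done, so the following is only a minimal sketch under assumptions: the function name `pool_phone_features`, the choice of mean and minimum as pooling statistics, and the two example features per phone (an acoustic score and an SVM posterior) are all hypothetical and are not taken from the article.

```python
from statistics import mean


def pool_phone_features(phone_features):
    """Collapse per-phone feature vectors into one fixed-length word vector.

    phone_features: non-empty list with one equal-length list of floats per phone.
    Returns [mean_1, min_1, mean_2, min_2, ..., phone_count].
    The pooling statistics are an illustrative assumption, not the article's.
    """
    if not phone_features:
        raise ValueError("a word needs at least one phone")
    dims = len(phone_features[0])
    if any(len(f) != dims for f in phone_features):
        raise ValueError("all phones must have the same number of features")

    pooled = []
    for d in range(dims):
        column = [f[d] for f in phone_features]
        pooled.append(mean(column))  # average quality of the phones in the word
        pooled.append(min(column))   # worst phone, where a mispronunciation would show
    pooled.append(float(len(phone_features)))  # word length in phones
    return pooled


# Hypothetical word with three phones; each phone has two made-up features:
# [acoustic score, SVM posterior of correct pronunciation].
word = [[-4.0, 0.75], [-6.0, 0.25], [-5.0, 0.5]]
word_vector = pool_phone_features(word)
```

The resulting vector has the same length for every word, so it could serve as input to a word-level classifier such as an SVM trained on words labeled as correctly pronounced or mispronounced.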

