In this paper, we propose a statistical method for evaluating pronunciation proficiency in English presentations. We statistically analyze the utterances to find a combination of acoustic features that correlates highly with an English teacher's score. We found that the ratio of the likelihoods obtained in free phoneme recognition with native English HMMs and with non-native English HMMs was the best single measure of pronunciation proficiency. The combination of four features is highly related to the English teacher's score: the likelihood ratio between native English HMMs and non-native English HMMs (concatenation of correct phone HMMs), the likelihood ratio between native English HMMs and non-native English HMMs (phoneme recognition), the phoneme recognition rate (correct rate), and the word recognition rate (correct rate). At the sentence level, we obtained correlation coefficients of 0.781 with closed data and 0.743 with speaker-open data. These coefficients are close to the correlation between human raters' scores, 0.691 to 0.791.
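As an illustration of the measure described above, the following minimal sketch computes a likelihood ratio per utterance and its Pearson correlation with teacher scores. It is not the authors' implementation. The function names, the use of log-likelihoods, the per-frame normalization, and all numbers in the toy data are assumptions made for the example.

```python
import math

def log_likelihood_ratio(native_loglik, nonnative_loglik, n_frames):
    """Log-likelihood ratio between native and non-native models.

    Dividing by the frame count is an assumption made here so that
    utterances of different lengths are comparable.
    """
    return (native_loglik - nonnative_loglik) / n_frames

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data, one tuple per utterance:
# (log-likelihood under native HMMs, log-likelihood under non-native HMMs, frames)
utterances = [
    (-5200.0, -5000.0, 100),
    (-4800.0, -4900.0, 100),
    (-6100.0, -6000.0, 120),
    (-4500.0, -4800.0, 90),
]
# Hypothetical teacher scores for the same utterances.
teacher_scores = [2.0, 4.0, 3.0, 5.0]

llr = [log_likelihood_ratio(*u) for u in utterances]
r = pearson(llr, teacher_scores)
print(f"correlation between likelihood ratio and teacher score: {r:.3f}")
```

A combination of several features, as in the abstract, would replace the single `llr` list with a fitted combination of the four measures before computing the correlation.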
Bibliographic reference. Nakagawa, Seiichi / Ohta, Kei (2007): "A statistical method of evaluating pronunciation proficiency for presentation in English", In INTERSPEECH-2007, 2317-2320.