We have developed an automated method that predicts the word accuracy of a speech recognition system on non-native speech, in the context of speaking proficiency scoring. A model was trained using features based on speech recognizer scores, function word distributions, prosody, background noise, and speaking fluency. Because the method targets non-native speech, fluency features, which have previously been used for scoring non-native speakers' proficiency, were implemented alongside several feature groups drawn from past research. The fluency features showed promising performance by themselves and improved overall performance when combined with the other, more traditional features. A model using stepwise regression achieved a correlation of 0.76 with word accuracy rates, compared to a baseline of 0.63 using only confidence scores. A binary classifier that places utterances in high or low word accuracy bins achieved an accuracy of 84%, compared to a majority class baseline of 64%.
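The two evaluation measures reported above, correlation with word accuracy and classification accuracy against a majority class baseline, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 bin threshold, and all numeric values are assumptions invented for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy per-utterance word accuracy values, invented for illustration only.
actual_wacc = [0.82, 0.45, 0.67, 0.30, 0.91, 0.58]
predicted_wacc = [0.78, 0.50, 0.60, 0.41, 0.85, 0.66]

# Hypothetical cut-off between the "high" and "low" word accuracy bins.
THRESHOLD = 0.5

actual_bins = ["high" if w >= THRESHOLD else "low" for w in actual_wacc]
predicted_bins = ["high" if w >= THRESHOLD else "low" for w in predicted_wacc]

print("correlation:", pearson(predicted_wacc, actual_wacc))
print("classifier accuracy:", accuracy(predicted_bins, actual_bins))
print("majority baseline:", majority_baseline_accuracy(actual_bins))
```

Comparing the classifier against the majority class baseline matters because the bins are imbalanced: a classifier that always outputs the more frequent bin already scores well above 50%.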
Bibliographic reference. Yoon, Su-Youn / Chen, Lei / Zechner, Klaus (2010): "Predicting word accuracy for the automatic speech recognition of non-native speech", In INTERSPEECH-2010, 773-776.