This study presents a method that increases the robustness of
automated speech scoring. Responses with sub-optimal characteristics,
such as background noise, volume problems, non-English speech,
whispered speech, and non-responses, make automated scoring more difficult.
For instance, loud background
noise distorts the spectral characteristics of speech, and the
performance of the prosody and pronunciation features is
significantly degraded. As a result, the automated scores of these
responses become less reliable.
To address this problem, the automated scoring system in this study first filters out non-scorable responses using a filtering model and then predicts the proficiency scores of the remaining responses using a scoring model. In addition to an automatic speech recognition (ASR)-based filter, which showed promising performance in previous studies, this study implemented a new filter based on acoustic features. The acoustic-based filter achieved performance comparable to that of the ASR-based filter, and combining the two models yielded further improvement. The combined filter was evaluated on two actual test products and achieved an accuracy of over 98% with an F-score of 86%.
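The abstract does not specify which acoustic features, model types, or combination method were used, so the following is only a minimal Python sketch of the two-stage "filter, then score" design under stated assumptions. Both filters are assumed to output a probability that a response is non-scorable; the acoustic filter uses just two simple cues (RMS energy and zero-crossing rate); the two filters are fused by a weighted average of their probabilities (a simple late-fusion choice, not necessarily the authors'); and all weights and thresholds are illustrative, not trained. Every name here (`Response`, `acoustic_filter`, `asr_filter`, `combined_filter`, `score_response`) is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Response:
    samples: Sequence[float]  # mono waveform, values in [-1, 1]
    asr_confidence: float     # hypothetical: mean word confidence from an ASR system
    asr_word_count: int       # hypothetical: number of recognized words


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def acoustic_features(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (RMS energy, zero-crossing rate) computed over the whole response."""
    n = len(samples)
    if n == 0:
        return 0.0, 0.0
    rms = math.sqrt(sum(s * s for s in samples) / n)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return rms, crossings / max(n - 1, 1)


def acoustic_filter(r: Response) -> float:
    """Hypothetical acoustic-based filter: P(non-scorable). Weights are illustrative."""
    rms, zcr = acoustic_features(r.samples)
    # Low energy (silence, whispering) or a high, noise-like ZCR raises the score.
    return sigmoid(-80.0 * (rms - 0.05) + 5.0 * (zcr - 0.3))


def asr_filter(r: Response) -> float:
    """Hypothetical ASR-based filter: low confidence or few words raises the score."""
    return sigmoid(-8.0 * (r.asr_confidence - 0.5) - 0.5 * (r.asr_word_count - 5))


def combined_filter(r: Response, w: float = 0.5) -> float:
    """Late fusion (an assumption): weighted average of the two filter probabilities."""
    return w * asr_filter(r) + (1.0 - w) * acoustic_filter(r)


def score_response(
    r: Response,
    scoring_model: Callable[[Response], float],
    threshold: float = 0.5,
) -> Optional[float]:
    """Stage 1: filter non-scorable responses. Stage 2: score only the rest."""
    if combined_filter(r) >= threshold:
        return None  # flagged as non-scorable; no automated score is produced
    return scoring_model(r)
```

For example, a silent response with low ASR confidence and no recognized words is flagged and returns `None`, while a clear, confidently recognized response is passed on to the scoring model. In a real system, the fusion weight `w` and the `threshold` would be tuned on held-out data to trade off missed non-scorable responses against wrongly rejected ones.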
Index Terms: automated speech scoring, speech recognition, acoustic features, filtering models, scorable responses
Bibliographic reference. Jeon, Je Hun / Yoon, Su-Youn (2012): "Acoustic feature-based non-scorable response detection for an automated speaking proficiency assessment", In INTERSPEECH-2012, 1275-1278.