16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Stacked Auto-Encoder for ASR Error Detection and Word Error Rate Prediction

Shahab Jalalvand, Daniele Falavigna

FBK, Italy

Recently, Stacked Auto-Encoders (SAE) have been successfully used for learning from imbalanced datasets. In this paper, for the first time, we propose to use a Neural Network classifier built on an SAE structure for detecting the errors made by a strong Automatic Speech Recognition (ASR) system. Error detection on an automatic transcription provided by a “strong” ASR system, i.e. one exhibiting a small word error rate, is difficult due to the limited number of “positive” examples (i.e. erroneously recognized words) available for training a binary classifier. We investigate and compare different types of classifiers for automatically detecting ASR errors, including one based on a stacked auto-encoder architecture. We show the effectiveness of the latter by measuring and comparing performance on the automatic transcriptions of an English corpus collected from TED talks. The performance of each investigated classifier is evaluated both via the receiver operating characteristic (ROC) curve and via a measure, called mean absolute error, related to the quality of the prediction of the corresponding word error rate. The results demonstrate that the classifier based on SAE detects ASR errors better than the other classification methods.
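
The mean absolute error measure mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition, so the sketch assumes that each utterance's error rate is approximated by the fraction of its words labelled as errors (deletions are ignored), and that the measure averages the absolute difference between predicted and reference rates over utterances. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def error_rate(labels: Sequence[int]) -> float:
    """Fraction of words labelled 1 (error) in one utterance.

    Assumption: this stands in for the utterance-level word error rate;
    deleted words have no label and are therefore not counted.
    """
    return sum(labels) / len(labels) if labels else 0.0


def mean_absolute_error(
    predicted: List[Sequence[int]], reference: List[Sequence[int]]
) -> float:
    """Average absolute gap between predicted and reference error rates."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty lists of equal length")
    gaps = [abs(error_rate(p) - error_rate(r)) for p, r in zip(predicted, reference)]
    return sum(gaps) / len(gaps)


# Toy data (hypothetical): one label per recognized word, 1 = error, 0 = correct.
predicted = [[0, 0, 1, 0], [0, 0, 0, 0, 0], [1, 1, 0, 0]]
reference = [[0, 1, 1, 0], [0, 0, 0, 0, 0], [1, 0, 0, 0]]

mae = mean_absolute_error(predicted, reference)
print(f"MAE of predicted error rate: {mae:.3f}")
```

A lower value indicates that the detector's labels yield error rate estimates closer to the reference, independently of which individual words were flagged.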

Bibliographic reference. Jalalvand, Shahab / Falavigna, Daniele (2015): "Stacked auto-encoder for ASR error detection and word error rate prediction", in INTERSPEECH-2015, 2142-2146.