Predicting possible speech recognition errors can be invaluable for a number of Automatic Speech Recognition (ASR) applications. In this study, we extend a Weighted Finite State Transducer (WFST) framework for error prediction to compare two approaches to predicting confusable words: examining recognition errors on the training set to learn phone confusions, and using distances between the phonetic acoustic models. We also extend the framework to handle continuous word recognition: using no acoustic information from the test data, we accurately predict 60% of the misrecognized sentences (averaging 15 words per sentence) and slightly over 70% of the total number of errors on unseen test data.
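As an illustrative sketch only (not the paper's actual WFST implementation), the confusion-learning idea can be approximated in plain Python: treat phone confusion probabilities, of the kind one might estimate from recognition errors on a training set, as substitution costs in a weighted edit distance between pronunciations, and flag word pairs whose total cost falls below a threshold. The lexicon, probabilities, and costs below are all hypothetical toy values.

```python
import math

# Hypothetical toy lexicon (word -> phone sequence). In practice this
# would be a real pronunciation lexicon.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "dog": ["d", "ao", "g"],
}

# Hypothetical p(hypothesized phone | reference phone); in the paper these
# confusions are learned from training-set errors or derived from
# acoustic-model distances.
CONFUSION = {
    ("k", "k"): 0.9, ("k", "b"): 0.1,
    ("b", "b"): 0.9, ("b", "k"): 0.1,
    ("t", "t"): 0.9, ("t", "p"): 0.1,
    ("p", "p"): 0.9, ("p", "t"): 0.1,
    ("ae", "ae"): 1.0,
    ("d", "d"): 1.0, ("ao", "ao"): 1.0, ("g", "g"): 1.0,
}

INS_DEL_COST = 5.0  # flat -log cost assumed for insertions/deletions


def sub_cost(ref, hyp):
    """Negative log confusion probability; unseen pairs are forbidden."""
    p = CONFUSION.get((ref, hyp), 0.0)
    return math.inf if p == 0.0 else -math.log(p)


def confusion_cost(ref, hyp):
    """Weighted edit distance between two phone strings, with
    substitution costs given by -log confusion probabilities."""
    n, m = len(ref), len(hyp)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i < n:  # deletion of ref[i]
                d[i + 1][j] = min(d[i + 1][j], d[i][j] + INS_DEL_COST)
            if j < m:  # insertion of hyp[j]
                d[i][j + 1] = min(d[i][j + 1], d[i][j] + INS_DEL_COST)
            if i < n and j < m:  # (possibly identity) substitution
                d[i + 1][j + 1] = min(d[i + 1][j + 1],
                                      d[i][j] + sub_cost(ref[i], hyp[j]))
    return d[n][m]


def confusable_words(word, threshold=3.0):
    """Words whose pronunciations are cheaply confusable with `word`."""
    ref = LEXICON[word]
    return sorted(
        w for w, hyp in LEXICON.items()
        if w != word and confusion_cost(ref, hyp) <= threshold
    )
```

For example, `confusable_words("cat")` returns `["bat", "cap"]` with these toy numbers, while `"dog"` has no cheap confusions. The WFST framework in the paper generalizes this pairwise computation to full sentences by composing transducers, which is how continuous word recognition errors are predicted.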
Bibliographic reference: Jyothi, Preethi / Fosler-Lussier, Eric (2009): "A comparison of audio-free speech recognition error prediction methods", in INTERSPEECH-2009, 1211-1214.