This paper addresses the problem of predicting speech recognition quality and filtering out poorly recognized utterances when no reference transcripts are available. In the proposed system, conditional random fields (CRF) predict the word error rate (WER) of each individual utterance, and the utterances are then classified against a given threshold. We propose using a boosting technique, which significantly increases recall at high precision values. We also apply recurrent neural networks (RNN) directly to the utterance classification task and obtain comparable results with a much simpler system. All experiments were carried out on Russian spontaneous conversational speech.
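As an illustration of the threshold-based filtering and the precision/recall evaluation mentioned above, here is a minimal sketch. It does not reproduce the paper's CRF, boosting, or RNN models: the predicted WER values are assumed to come from some external predictor, and the function names (`word_error_rate`, `accept_utterances`, `precision_recall`) and the example numbers are hypothetical.

```python
from typing import List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def accept_utterances(predicted_wer: List[float], threshold: float) -> List[bool]:
    """Accept an utterance when its predicted WER does not exceed the threshold."""
    return [p <= threshold for p in predicted_wer]


def precision_recall(accepted: List[bool], true_wer: List[float],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall of the accept decision against the true WER."""
    good = [w <= threshold for w in true_wer]
    tp = sum(1 for a, g in zip(accepted, good) if a and g)
    n_accepted, n_good = sum(accepted), sum(good)
    precision = tp / n_accepted if n_accepted else 0.0
    recall = tp / n_good if n_good else 0.0
    return precision, recall


# Hypothetical values: predictions from an assumed model, true WER from references.
predicted = [0.1, 0.4, 0.2, 0.8]
actual = [0.05, 0.2, 0.5, 0.9]
decisions = accept_utterances(predicted, threshold=0.3)
print(decisions, precision_recall(decisions, actual, threshold=0.3))
```

In this sketch the true WER is needed only to evaluate the filter; at run time, when no reference transcripts are available, the accept decision relies on the predicted values alone.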
Bibliographic reference. Korenevsky, Maxim L. / Smirnov, Andrey B. / Mendelev, Valentin S. (2015): "Prediction of speech recognition accuracy for utterance classification", In INTERSPEECH-2015, 1275-1279.