In this article, error detection for broadcast news transcription system is addressed in a post-processing stage. We investigate a logistic regression model based on features extracted from confusion networks. This model aims to estimate a confidence score for each confusion set and detect errors. Different kind of knowledge sources are explored such as the confusion set solely, statistical language model, and lexical properties. Impact of the different features are assessed and show the importance of those extracted from the confusion network solely. To enrich our modeling with information about the neighborhood, features of adjacent confusion sets are also added to the vector of features. Finally, a distinct processing of confusion sets is also explored depending on the value of their best posterior probability. To be compared with the standard ASR output, our best system yields to a significant improvement of the classification error rate from 17.2% to 12.3%.
Bibliographic reference. Allauzen, Alexandre (2007): "Error detection in confusion network", In INTERSPEECH-2007, 1749-1752.