8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Error Detection in Confusion Network

Alexandre Allauzen

LIMSI, France

In this article, error detection for a broadcast news transcription system is addressed in a post-processing stage. We investigate a logistic regression model based on features extracted from confusion networks. This model aims to estimate a confidence score for each confusion set and to detect errors. Different kinds of knowledge sources are explored, such as the confusion set alone, a statistical language model, and lexical properties. The impact of the different features is assessed, showing the importance of those extracted from the confusion network alone. To enrich our modeling with information about the neighborhood, features of adjacent confusion sets are also added to the feature vector. Finally, distinct processing of confusion sets depending on the value of their best posterior probability is also explored. Compared with the standard ASR output, our best system yields a significant improvement in the classification error rate, from 17.2% to 12.3%.
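The general idea of scoring confusion sets with a logistic model can be illustrated with a minimal sketch. It is not the paper's implementation: the three features (best posterior, entropy, number of alternatives), the function names, the weights, the bias, and the 0.5 decision threshold are all assumptions chosen for illustration, and a real system would learn the weights from annotated data.

```python
import math

def confusion_set_features(posteriors):
    """Illustrative features for one confusion set, given its word posteriors."""
    best = max(posteriors)
    entropy = -sum(p * math.log(p) for p in posteriors if p > 0)
    return [best, entropy, float(len(posteriors))]

def with_neighbors(feature_rows, i):
    """Concatenate the features of the previous, current, and next confusion set.

    Missing neighbors at the edges of the network are padded with zeros.
    """
    pad = [0.0] * len(feature_rows[i])
    prev = feature_rows[i - 1] if i > 0 else pad
    nxt = feature_rows[i + 1] if i + 1 < len(feature_rows) else pad
    return prev + feature_rows[i] + nxt

def confidence(features, weights, bias):
    """Logistic regression score in (0, 1): estimated probability of correctness."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Toy confusion network: each inner list holds the posteriors of one confusion set.
network = [[0.9, 0.1], [0.4, 0.35, 0.25], [1.0]]
rows = [confusion_set_features(p) for p in network]

# Hypothetical weights (previous set, current set, next set); not trained values.
weights = [0.0, 0.0, 0.0, 4.0, -2.0, -0.1, 0.0, 0.0, 0.0]
bias = -1.0

scores = [confidence(with_neighbors(rows, i), weights, bias) for i in range(len(rows))]
flagged = [s < 0.5 for s in scores]  # True marks a suspected recognition error
```

With these assumed weights, the flat middle confusion set receives a low confidence score and is flagged, while the two sets dominated by a single hypothesis are not.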

Bibliographic reference.  Allauzen, Alexandre (2007): "Error detection in confusion network", In INTERSPEECH-2007, 1749-1752.