Obtaining accurate confidence measures for automatic speech recognition (ASR) transcriptions is an important task which stands to benefit from the use of multiple information sources. This paper investigates the application of conditional random field (CRF) models as a principled technique for combining multiple features from such sources. A novel method for combining suitably defined features is presented, allowing for confidence annotation using lattice-based features of hypotheses other than the lattice 1-best. The resulting framework is applied to different stages of a state-of-the-art large vocabulary speech recognition pipeline, and consistent improvements are shown over a sophisticated baseline system.
Bibliographic reference. Seigel, M. S. / Woodland, P. C. (2011): "Combining information sources for confidence estimation with CRF models", In INTERSPEECH-2011, 905-908.