Word-level confidence estimation (CE) for automatic speech recognition (ASR) systems stands to benefit from combining suitably defined input features from multiple information sources. However, the information sources of interest do not necessarily operate at the same level of granularity as the underlying ASR system. The research described here builds on previous work on CE for ASR systems, which used features extracted from word-level recognition lattices, by incorporating information at the sub-word level. Furthermore, Conditional Random Fields (CRFs) with hidden states are investigated as a technique for combining information for word-level CE. Performance improvements are shown both when sub-word-level information is used in linear-chain CRFs with appropriately engineered feature functions and when the hidden-state CRF model is applied at the word level.
Index Terms: confidence estimation, hidden-state conditional random fields, speech recognition, sub-word-level information
Bibliographic reference. Seigel, Matthew S. / Woodland, Phillip C. (2012): "Using sub-word-level information for confidence estimation with conditional random field models", In INTERSPEECH-2012, 2338-2341.