Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

A High-Level Approach to Confidence Estimation in Speech Recognition

Stephen Cox, Srinandan Dasmahapatra

School of Information Systems, University of East Anglia, Norwich, UK

Errors in the output of a speech recogniser can be attributed to the interaction of inadequate phonetic and language modelling components. We investigate an approach to estimating confidence scores for the words output by a recogniser, in which language modelling and acoustic modelling are decoupled by running a phone recogniser in parallel with the word recogniser. An advantage of this approach is that it does not rely on side-information derived from the decoder: such information is not always available, and it may depend on the type and configuration of the decoder used. We have investigated two ways of using the additional information provided by the phone-loop recogniser. The first correlates the phone strings output by the two recognisers; the second uses the phone-loop recogniser output to construct hypotheses for the utterance and correlates these hypotheses with the word recogniser output.
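
As an illustration of the first idea, correlating the phone strings from the two recognisers, the following minimal sketch scores a word by how closely its phone sequence agrees with the phone-loop output over the same stretch of speech. The abstract does not specify the correlation measure, so the normalised edit distance used here is an assumption, and the function names `edit_distance` and `phone_agreement` and the example phone labels are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of labels)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_agreement(word_phones, loop_phones):
    """Agreement in [0, 1] between the word recogniser's phones for a word
    and the phone-loop recogniser's phones over the same time span.

    Assumption: 1 - (edit distance / longer length) stands in for the
    correlation measure, which the abstract leaves unspecified.
    """
    longest = max(len(word_phones), len(loop_phones))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(word_phones, loop_phones) / longest


if __name__ == "__main__":
    # Hypothetical phone labels for a single word.
    print(phone_agreement(["k", "ae", "t"], ["k", "ah", "t"]))
```

A score near 1 would suggest that the acoustic evidence alone supports the word, while a low score would suggest that the word owes more to the language model; how such a score is mapped to a confidence value is not covered by this sketch.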

