Third European Conference on Speech Communication and Technology

Berlin, Germany
September 22-25, 1993


Recognition Confidence Measures for Spontaneous Spoken Dialog

Sheryl R. Young, Wayne Ward

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

This paper reports on a new technique for evaluating confidence in word strings produced by a speech recognition system used for processing limited-domain spontaneous dialogs. The technique can also be used to produce confidence metrics for non-spontaneous speech. Spontaneous speech is especially difficult because unknown words, verbal noise, and speech repairs and edits are common phenomena that complicate the basic speaker-independent continuous speech recognition process. Our goal is to produce a confidence metric for spontaneously generated word strings that combines acoustic and higher-level knowledge sources through the use of Bayesian Updating. This confidence measure takes into account knowledge source reliability and ability to differentially discriminate misrecognitions. This work is part of our larger project on automatically detecting and acquiring the meaning of out-of-vocabulary words. In estimating acoustic confidence, we first normalize the word score produced by the recognizer. This is done by subtracting the log-probability score for an all-phone recognition from the log-probability word score and normalizing for length. The all-phone score is generated by running the speech recognizer on the utterance, allowing any triphone to follow any other triphone, with a trigram probability for triphone sequences. A triphone is a context-dependent phone model. Trigrams of the triphone sequences are computed from a large corpus of English language text. We use Bayesian Updating to turn the normalized word score into a confidence measure. For this, words are grouped into classes using alternate grouping methods. For each word class, we estimate how often a word in the class, when seen with a particular score, was correctly recognized. This estimate is made by running the recognition system on a training set of data. This gives us a direct measure of the confidence with which we can reject or accept a word based on acoustic measures.
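
The two acoustic steps described in the abstract (score normalization against an all-phone pass, then a per-class estimate of how often a word with a given score was correct) can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made here for illustration only: length normalization is taken to be division by the number of frames, scores are grouped into fixed bins, and the function and class names (`normalized_score`, `ClassConfidence`), the bin edges, and the small prior used to smooth sparsely populated bins are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def normalized_score(word_logprob, allphone_logprob, n_frames):
    """Word log-probability minus all-phone log-probability, per frame.

    Assumption: "normalizing for length" is modeled as division by the
    number of frames the word spans.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return (word_logprob - allphone_logprob) / n_frames


class ClassConfidence:
    """Estimates P(correct | word class, score bin) from training counts."""

    def __init__(self, bin_edges, prior_correct=0.5, prior_weight=1.0):
        # Hypothetical choices: fixed score bins and a weak prior that
        # keeps the estimate defined for bins with few or no observations.
        self.bin_edges = sorted(bin_edges)
        self.prior_correct = prior_correct
        self.prior_weight = prior_weight
        # (word_class, bin_index) -> [times correct, times seen]
        self.counts = defaultdict(lambda: [0, 0])

    def _bin(self, score):
        # Index of the score interval that contains `score`.
        return bisect_right(self.bin_edges, score)

    def observe(self, word_class, score, was_correct):
        """Record one recognized training word and whether it was correct."""
        cell = self.counts[(word_class, self._bin(score))]
        cell[0] += int(was_correct)
        cell[1] += 1

    def confidence(self, word_class, score):
        """Smoothed fraction of training words in this cell that were correct."""
        key = (word_class, self._bin(score))
        correct, total = self.counts.get(key, (0, 0))
        return (correct + self.prior_weight * self.prior_correct) / (
            total + self.prior_weight
        )


if __name__ == "__main__":
    model = ClassConfidence(bin_edges=[-1.0, 0.0])
    # Toy training data: (word class, normalized score, correctly recognized?)
    for outcome in (True, True, True, False):
        model.observe("city", -0.5, outcome)
    score = normalized_score(word_logprob=-120.0, allphone_logprob=-100.0, n_frames=40)
    print(model.confidence("city", score))
```

A word would then be accepted or rejected by comparing the returned confidence with a threshold chosen on held-out data. The abstract does not specify how scores are quantized or how the word classes are formed, so both are left as inputs here.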


Bibliographic reference.  Young, Sheryl R. / Ward, Wayne (1993): "Recognition confidence measures for spontaneous spoken dialog", In EUROSPEECH'93, 1177-1179.