ISCA Archive ASR 2000

Prosodically motivated features for confidence measures

Silke Goronzy, Krzysztof Marasek, Andreas Haag, Ralf Kompe

In this paper, new phone-duration-based features for confidence measures (CMs) using a classifier are proposed. In misrecognized utterances, the segmentation, and thus the phoneme durations, often deviate severely from what can be observed in the training data. Moreover, the segment found for one recognized phoneme often covers several ’real’ phonemes that have different spectral properties. Such phoneme durations therefore often indicate that a misrecognition took place, and we derive new features from them. In addition to these new features, we use features related to the acoustic scores of the N-best hypotheses. Using the full set of 46 features, we achieve a correct classification rate of 90% at a false rejection rate of 5.1% on an isolated-word command-and-control task with a rather simple neural network (NN) classifier. Simultaneously, we try to detect out-of-vocabulary (OOV) words with the same approach and succeed in 91% of the cases. We then combine this CM with unsupervised MAP and MLLR speaker adaptation. The adaptation is guided by the CM: the acoustic models are modified only if the utterance was recognized with high confidence.
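As a rough illustration of the idea described above, the following Python sketch computes a few duration-deviation statistics for a recognized phoneme segmentation and gates adaptation on a confidence score. It is not the authors' feature set or classifier: the function names, the three example features, the z-score normalization, the outlier cutoff of 2.0, and the confidence threshold of 0.5 are all assumptions made for illustration.

```python
from statistics import mean


def duration_features(segments, train_stats, outlier_z=2.0):
    """Hypothetical duration-based features for one recognized utterance.

    segments:    list of (phoneme, duration) pairs from the recognizer's segmentation
    train_stats: dict mapping phoneme -> (mean_duration, std_duration) from training data
    outlier_z:   assumed cutoff above which a duration counts as an outlier
    """
    # Normalized deviation of each observed duration from its training mean.
    z_scores = []
    for phoneme, duration in segments:
        mu, sigma = train_stats[phoneme]
        z_scores.append(abs(duration - mu) / sigma)

    return {
        "mean_abs_z": mean(z_scores),
        "max_abs_z": max(z_scores),
        "frac_outliers": sum(z > outlier_z for z in z_scores) / len(z_scores),
    }


def should_adapt(confidence, threshold=0.5):
    """Allow unsupervised adaptation only for confidently recognized utterances.

    The threshold value is an assumption, not taken from the paper.
    """
    return confidence >= threshold


# Toy example: the second phoneme is far longer than in training.
stats = {"a": (10.0, 2.0), "b": (10.0, 5.0)}
features = duration_features([("a", 10.0), ("b", 30.0)], stats)
```

In a setup like the one in the abstract, features of this kind would be passed to a classifier together with the N-best acoustic-score features, and the classifier's output would play the role of `confidence` in `should_adapt`.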

Cite as: Goronzy, S., Marasek, K., Haag, A., Kompe, R. (2000) Prosodically motivated features for confidence measures. Proc. ASR2000 - Automatic Speech Recognition: Challenges for the New Millenium, 207-212

  author={Silke Goronzy and Krzysztof Marasek and Andreas Haag and Ralf Kompe},
  title={{Prosodically motivated features for confidence measures}},
  booktitle={Proc. ASR2000 - Automatic Speech Recognition: Challenges for the New Millenium},