ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium

September 18-20, 2000
Paris, France

New Adaptation Techniques for Large Vocabulary Continuous Speech Recognition

Yuqing Gao, Bhuvana Ramabhadran, and Michael Picheny

Human Language Technologies, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA

This paper proposes several new speaker adaptation techniques to improve the accuracy of large vocabulary continuous speech recognition. These include discriminative adaptation, state-quality-measure-based adaptation, and N-best-hypothesis-based adaptation schemes. We propose to incorporate the maximum mutual information estimation (MMIE) criterion in the computation of the posterior counts from the adaptation data. We present a new measure, the state quality measure, to evaluate the quality of an HMM state, and subsequently use it for selecting good segments of speech during unsupervised adaptation and as a confidence measure during decoding/rescoring. The state quality measure is the confidence associated with the acoustic model’s ability to predict the HMM state correctly. It is estimated from the correct and decoded sets of transcriptions and is used in conjunction with N-best hypotheses for weighting the state occupancy counts during adaptation. In conjunction with the adaptation schemes, we also present the use of the Viterbi algorithm, instead of the Forward-Backward algorithm, to estimate the HMM state occupancy counts in order to obtain speedups without degradation in accuracy. Our results on an in-house spontaneous speech task show improvements in the range of 4% to 14% relative for each of the presented techniques.
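The contrast between Forward-Backward and Viterbi state occupancy counts mentioned in the abstract can be illustrated with a minimal sketch on a toy discrete HMM. This is not the authors' implementation: the function names, the two-state model, and the per-frame likelihood values below are all assumptions chosen for illustration. Forward-Backward accumulates a posterior probability for every state at every frame (soft counts), while Viterbi assigns each frame to the single state on the best path (hard counts), which needs only one pass plus a backtrace.

```python
import math


def forward_backward_counts(pi, A, B):
    """Soft occupancy counts: sum over frames of P(state | all frames).

    pi[s]   : initial probability of state s
    A[r][s] : transition probability from state r to state s
    B[t][s] : likelihood of frame t under state s (toy values, assumed)
    """
    T, N = len(B), len(pi)
    alpha = [[0.0] * N for _ in range(T)]
    scale = [0.0] * T
    # Forward pass, rescaled at every frame to avoid numerical underflow.
    for t in range(T):
        for s in range(N):
            if t == 0:
                alpha[t][s] = pi[s] * B[t][s]
            else:
                alpha[t][s] = B[t][s] * sum(
                    alpha[t - 1][r] * A[r][s] for r in range(N)
                )
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]
    # Backward pass, using the same scale factors.
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for s in range(N):
            beta[t][s] = sum(
                A[s][r] * B[t + 1][r] * beta[t + 1][r] for r in range(N)
            ) / scale[t + 1]
    # With this scaling, alpha * beta is the per-frame state posterior.
    counts = [0.0] * N
    for t in range(T):
        for s in range(N):
            counts[s] += alpha[t][s] * beta[t][s]
    return counts


def viterbi_counts(pi, A, B):
    """Hard occupancy counts: number of frames aligned to each state
    along the single most likely state sequence."""
    T, N = len(B), len(pi)

    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    delta = [log(pi[s]) + log(B[0][s]) for s in range(N)]
    backpointers = []
    for t in range(1, T):
        ptr, new_delta = [], []
        for s in range(N):
            best = max(range(N), key=lambda r: delta[r] + log(A[r][s]))
            ptr.append(best)
            new_delta.append(delta[best] + log(A[best][s]) + log(B[t][s]))
        backpointers.append(ptr)
        delta = new_delta
    # Backtrace from the best final state, counting one frame per step.
    state = max(range(N), key=lambda s: delta[s])
    counts = [0] * N
    counts[state] += 1
    for ptr in reversed(backpointers):
        state = ptr[state]
        counts[state] += 1
    return counts


# Toy two-state left-to-right HMM over four frames (illustrative values).
pi = [1.0, 0.0]
A = [[0.6, 0.4], [0.0, 1.0]]
B = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]

soft = forward_backward_counts(pi, A, B)
hard = viterbi_counts(pi, A, B)
print("Forward-Backward counts:", soft)
print("Viterbi counts:", hard)
```

In both cases the counts sum to the number of frames; the Viterbi counts are integers because each frame contributes to exactly one state. In an adaptation setting, either set of counts would then feed the model update, and a per-state weight such as the state quality measure could scale them, though how the paper does this is not specified in the abstract.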



Bibliographic reference.  Gao, Yuqing / Ramabhadran, Bhuvana / Picheny, Michael (2000): "New adaptation techniques for large vocabulary continuous speech recognition", In ASR-2000, 107-111.