Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Improvements on Speech Recognition for Fast Talkers

M. Richardson, M. Hwang, Alex Acero, Xuedong Huang

Speech Technology Group, Microsoft Research, Redmond, WA, USA

The accuracy of a speech recognition (SR) system depends on many factors, such as background noise, mismatches in microphone and language models, and variations in speaker, accent, and even speaking rate. Besides habitually fast speakers, even speakers with a normal rate tend to speak faster when using a speech recognition system in order to get higher throughput. Unfortunately, state-of-the-art SR systems perform significantly worse on fast speech. In this paper, we present our efforts to make our system more robust to fast speech. We propose cepstrum length normalization, applied to the incoming test utterances, which results in a 13% word error rate reduction on an independent evaluation corpus. Moreover, this improvement is additive to the contribution of Maximum Likelihood Linear Regression (MLLR) adaptation. Together with MLLR, a 23% error rate reduction was achieved.
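The abstract does not say how cepstrum length normalization is carried out, so the following is only an illustrative sketch under an assumption: that the normalization amounts to stretching the sequence of cepstral frames of a fast utterance along the time axis. The function name `stretch_frames`, the `factor` parameter, and the use of linear interpolation are hypothetical choices made for this example and are not taken from the paper.

```python
def stretch_frames(frames, factor):
    """Resample a sequence of feature vectors to round(len * factor) frames.

    Assumption for illustration only: frames are stretched in time by
    linear interpolation between neighbouring frames. The paper's actual
    procedure may differ.
    """
    if not frames:
        return []
    n_in = len(frames)
    n_out = max(1, round(n_in * factor))
    if n_in == 1 or n_out == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(n_out)]
    out = []
    for j in range(n_out):
        # Map output index j onto the input time axis [0, n_in - 1].
        pos = j * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo  # interpolation weight towards the later frame
        out.append([(1 - w) * a + w * b
                    for a, b in zip(frames[lo], frames[hi])])
    return out


# Three toy two-dimensional "cepstral" frames, stretched to five frames.
fast = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]]
normalized = stretch_frames(fast, 1.6)
```

In this sketch a factor greater than 1 lengthens the utterance, which is the direction one would expect for fast speech; how the factor would be chosen per utterance is left open here.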


Bibliographic reference. Richardson, M. / Hwang, M. / Acero, Alex / Huang, Xuedong (1999): "Improvements on speech recognition for fast talkers", in EUROSPEECH'99, 411-414.