5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Speaker Normalization and Speaker Adaptation - A Combination for Conversational Speech Recognition

Puming Zhan, Martin Westphal, Michael Finke, Alex Waibel

Interactive Systems Laboratories, Carnegie Mellon University, USA, and University of Karlsruhe, Germany

Speaker normalization and speaker adaptation are two strategies for handling variation due to speaker, channel, and environment. Vocal tract length normalization (VTLN) is an effective speaker normalization approach that compensates for variation in vocal tract shape. Maximum Likelihood Linear Regression (MLLR) is a recently proposed method for speaker adaptation. In this paper, we propose a speaker-specific Bark scale VTLN method, investigate the combination of VTLN with MLLR, and present an iterative procedure for decoding with the combined VTLN and MLLR system. The results show that: (1) the new VTLN method is very effective, reducing the word error rate by up to 11%; (2) the combination of VTLN and MLLR provides up to 15% word error reduction; (3) both VTLN and MLLR are more effective for push-to-talk data than for cross-talk data.

Bibliographic reference. Zhan, Puming / Westphal, Martin / Finke, Michael / Waibel, Alex (1997): "Speaker normalization and speaker adaptation - a combination for conversational speech recognition", in EUROSPEECH-1997, 2087-2090.