Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Speaker Normalization and Pronunciation Variant Modeling: Helpful Methods for Improving Recognition of Fast Speech

Thilo Pfau, Robert Faltlhauser, Günther Ruske

Institute for Human-Machine-Communication, Technical University of Munich, Germany

This paper addresses the problem of creating hidden Markov models for fast speech. The major issues discussed are robust parameter estimation and the reduction of within-model variation. For the first issue, maximum a posteriori parameter estimation is discussed. To reduce within-model variation, a maximum-likelihood-based vocal tract length normalization procedure and a statistical approach to modeling pronunciation variants are applied. Experiments with a large-vocabulary continuous speech recognition system were carried out on the German spontaneous scheduling task (Verbmobil) to evaluate the effectiveness of the investigated methods. The results show that combining pronunciation variant modeling with vocal tract length normalization is most effective: on fast speech, it achieved a relative improvement of 16.3% over the baseline models. Pronunciation variant modeling combined with maximum a posteriori reestimation proved to be the second-best method, yielding a 14.9% relative improvement. In addition, this latter combination does not cause any additional computational load during recognition.

Bibliographic reference.  Pfau, Thilo / Faltlhauser, Robert / Ruske, Günther (1999): "Speaker normalization and pronunciation variant modeling: helpful methods for improving recognition of fast speech", In EUROSPEECH'99, 299-302.