Sixth European Conference on Speech Communication and Technology
This paper addresses the problem of creating hidden Markov models for fast speech. The two major issues discussed are robust parameter estimation and the reduction of within-model variations. Regarding the first issue, the use of maximum a posteriori (MAP) parameter estimation is discussed. To reduce within-model variations, a maximum-likelihood-based vocal tract length normalization (VTLN) procedure and a statistical approach to modeling pronunciation variants are applied. Experiments with a large-vocabulary continuous speech recognition system were carried out on the German spontaneous scheduling task (Verbmobil) to evaluate the effectiveness of the investigated methods. The results show that the combination of pronunciation variant modeling and vocal tract length normalization is the most effective. On fast speech, it achieved a relative improvement of 16.3% over the baseline models. Pronunciation variant modeling combined with MAP reestimation was the second-best method, yielding a 14.9% relative improvement. Moreover, this combination adds no computational load during recognition.
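The abstract does not give the paper's exact MAP formulation. As a minimal sketch, assuming the textbook MAP update of a single Gaussian mean, the snippet below shows why MAP is robust when little fast-speech data is available: the new mean interpolates between the prior (baseline) mean and the data, weighted by the prior weight `tau` and the total occupation count. The function name `map_mean_update` and the parameter `tau` are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Sequence


def map_mean_update(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    posteriors: Sequence[float],
    tau: float,
) -> List[float]:
    """Textbook MAP reestimate of one Gaussian mean (illustrative sketch).

    prior_mean: mean of the baseline model (one value per feature dimension)
    frames:     feature vectors x_t from the adaptation data (e.g. fast speech)
    posteriors: occupation probability gamma_t of this Gaussian for each frame
    tau:        hypothetical prior weight; larger values trust the prior more
    """
    if len(frames) != len(posteriors):
        raise ValueError("frames and posteriors must have the same length")
    dim = len(prior_mean)
    occupancy = sum(posteriors)  # sum_t gamma_t
    if tau + occupancy <= 0:
        raise ValueError("tau + occupancy must be positive")

    # Accumulate the posterior-weighted sum of frames, sum_t gamma_t * x_t.
    weighted_sum = [0.0] * dim
    for x, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * x[d]

    # mu_map = (tau * mu_prior + sum_t gamma_t x_t) / (tau + sum_t gamma_t)
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy) for d in range(dim)]


if __name__ == "__main__":
    prior = [0.0, 0.0]
    data = [[1.0, 2.0], [3.0, 4.0]]
    gammas = [1.0, 1.0]
    print(map_mean_update(prior, data, gammas, tau=2.0))  # moves halfway toward the data mean
    print(map_mean_update(prior, data, gammas, tau=0.0))  # reduces to the ML mean
```

With `tau=0` the update reduces to the maximum likelihood mean, while a Gaussian that sees no adaptation frames keeps its prior mean unchanged; this is the property that makes MAP attractive when only sparse fast-speech data is available.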
Bibliographic reference. Pfau, Thilo / Faltlhauser, Robert / Ruske, Günther (1999): "Speaker normalization and pronunciation variant modeling: helpful methods for improving recognition of fast speech", In EUROSPEECH'99, 299-302.