This paper addresses the problem of building hidden Markov models for fast speech. The two major issues discussed are robust parameter estimation and the reduction of within-model variation. For the first issue, the use of maximum a posteriori parameter estimation is discussed. To reduce within-model variation, a maximum likelihood based vocal tract length normalization procedure and a statistical approach to modeling pronunciation variants are applied. Experiments with a large vocabulary continuous speech recognition system were carried out on the German spontaneous scheduling task (Verbmobil) to assess the effectiveness of the investigated methods. The results show that the combination of pronunciation variant modeling and vocal tract length normalization is the most effective: on fast speech, it achieved a relative improvement of 16.3% over the baseline models. Pronunciation variant modeling combined with maximum a posteriori reestimation was the second best method, with a relative improvement of 14.9%. In addition, this second combination causes no additional computational load during recognition.
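To make the idea of maximum a posteriori reestimation concrete, the following is a minimal sketch of the standard MAP update for a single Gaussian mean, in which a prior mean (for example, taken from a baseline model) is interpolated with the adaptation data. The abstract does not give the formula or the prior weight used in the paper, so the function name `map_mean`, the parameter `tau`, and its default value are assumptions made for illustration only.

```python
def map_mean(prior_mean, frames, tau=10.0):
    """MAP estimate of a Gaussian mean (illustrative sketch, not the paper's code).

    prior_mean: list of floats, the mean of the prior (e.g. a baseline model mean).
    frames:     list of feature vectors (lists of floats) assigned to this Gaussian.
    tau:        hypothetical prior weight; larger values keep the estimate
                closer to prior_mean when little data is available.
    """
    n = len(frames)
    dim = len(prior_mean)

    # Accumulate the per-dimension sum of the observed frames.
    sums = [0.0] * dim
    for x in frames:
        for d in range(dim):
            sums[d] += x[d]

    # Interpolate between the prior mean and the data, weighted by tau and n.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


if __name__ == "__main__":
    baseline = [0.0, 0.0]
    fast_speech_frames = [[1.0, 2.0], [1.0, 2.0]]
    print(map_mean(baseline, fast_speech_frames, tau=2.0))
```

With few frames the estimate stays near the prior mean, and with many frames it approaches the sample mean, which is why this kind of update is considered robust when fast-speech training data is scarce.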
Cite as: Pfau, T., Faltlhauser, R., Ruske, G. (1999) Speaker normalization and pronunciation variant modeling: helpful methods for improving recognition of fast speech. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 299-302, doi: 10.21437/Eurospeech.1999-78
@inproceedings{pfau99_eurospeech,
  author={Thilo Pfau and Robert Faltlhauser and Günther Ruske},
  title={{Speaker normalization and pronunciation variant modeling: helpful methods for improving recognition of fast speech}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={299--302},
  doi={10.21437/Eurospeech.1999-78}
}