ISCA Archive Eurospeech 1999

Speaker normalization and pronunciation variant modeling: helpful methods for improving recognition of fast speech

Thilo Pfau, Robert Faltlhauser, Günther Ruske

This paper addresses the problem of creating hidden Markov models for fast speech. The major issues discussed are robust parameter estimation and the reduction of within-model variations. Regarding the first issue, the use of maximum a posteriori (MAP) parameter estimation is discussed. To reduce within-model variations, a maximum-likelihood-based vocal tract length normalization procedure and a statistical approach to modeling pronunciation variants are applied. Experiments with a large vocabulary continuous speech recognition system were carried out on the German spontaneous scheduling task (Verbmobil) to evaluate the effectiveness of the investigated methods. The results show that a combination of pronunciation variant modeling and vocal tract length normalization is most effective: on fast speech, it achieved a relative improvement of 16.3% over the baseline models. Pronunciation variant modeling combined with MAP reestimation proved to be the second-best method, resulting in a 14.9% relative improvement. In addition, this combination does not cause any additional computational load during recognition.
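As a reading aid for the relative-improvement figures quoted above, the following minimal sketch shows how such a percentage is typically computed. It assumes the improvements are relative reductions in an error rate (for example, word error rate), which the abstract does not state explicitly. The function name `relative_improvement` and the numeric values are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the relative error reduction in percent.

    Assumes both arguments are error rates on the same scale
    (e.g. word error rate in percent); this is an assumption,
    not something stated in the abstract.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error rates chosen only for illustration (not from the paper).
baseline = 40.0
improved = 33.48
print(f"{relative_improvement(baseline, improved):.1f}% relative improvement")
```

A relative figure of this kind depends on the baseline: the same absolute reduction yields a larger relative improvement when the baseline error rate is lower.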


doi: 10.21437/Eurospeech.1999-78

Cite as: Pfau, T., Faltlhauser, R., Ruske, G. (1999) Speaker normalization and pronunciation variant modeling: helpful methods for improving recognition of fast speech. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 299-302, doi: 10.21437/Eurospeech.1999-78

@inproceedings{pfau99_eurospeech,
  author={Thilo Pfau and Robert Faltlhauser and Günther Ruske},
  title={{Speaker normalization and pronunciation variant modeling: helpful methods for improving recognition of fast speech}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={299--302},
  doi={10.21437/Eurospeech.1999-78}
}