This paper deals with the problem of building hidden Markov models (HMMs) suitable for fast speech. First, an automatic procedure is presented that splits speech material into categories according to speaking rate. Then the problem of the sparse data available for estimating HMMs for fast speech is discussed, followed by a comparison of different methods to overcome it. The main emphasis is on robust reestimation techniques such as maximum a posteriori (MAP) estimation, and on methods that reduce the variability of the speech signal and thereby allow the number of HMM parameters to be reduced. Vocal tract length normalization (VTLN) is chosen for that purpose. Finally, various combinations of these methods are compared based on word error rates for fast speech. The best method (MAPVTLN) reduces the word error rate by 10% relative to the baseline system.
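To illustrate why MAP reestimation helps with sparse fast-speech data, here is a minimal sketch of the textbook MAP update for a single Gaussian mean. The abstract does not give the paper's exact formulation; this sketch assumes the common form in which the prior mean (e.g. from a model trained on normal-rate speech) is interpolated with the occupancy-weighted mean of the adaptation frames. The function name `map_mean_update` and the prior weight `tau` are hypothetical illustration choices, not taken from the paper.

```python
def map_mean_update(prior_mean, frames, occupancies, tau=10.0):
    """Hypothetical sketch: MAP reestimate of one Gaussian mean vector.

    prior_mean  -- mean of the prior model (assumed: trained on normal-rate speech)
    frames      -- adaptation feature vectors (assumed: fast-speech frames)
    occupancies -- gamma_t, posterior probability that frame t belongs to this Gaussian
    tau         -- prior weight; larger values keep the estimate closer to the prior
    """
    if len(frames) != len(occupancies):
        raise ValueError("need one occupancy value per frame")
    if tau <= 0:
        raise ValueError("tau must be positive")

    dim = len(prior_mean)
    occ_total = sum(occupancies)

    # Accumulate the occupancy-weighted sum of the adaptation frames.
    weighted_sum = [0.0] * dim
    for x, gamma in zip(frames, occupancies):
        if len(x) != dim:
            raise ValueError("frame dimension does not match prior mean")
        for d in range(dim):
            weighted_sum[d] += gamma * x[d]

    # mu_MAP = (tau * mu_prior + sum_t gamma_t x_t) / (tau + sum_t gamma_t)
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occ_total)
            for d in range(dim)]


# Two adaptation frames with full occupancy and a small prior weight.
print(map_mean_update([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], tau=2.0))
```

With little adaptation data (small total occupancy), the estimate stays close to the prior mean; as the occupancy grows relative to `tau`, it approaches the maximum likelihood mean of the adaptation frames. This is the robustness property that makes MAP attractive when only a small amount of fast speech is available.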
Cite as: Pfau, T., Ruske, G. (1998) Creating hidden Markov models for fast speech. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0255, doi: 10.21437/ICSLP.1998-231
@inproceedings{pfau98_icslp, author={Thilo Pfau and Guenther Ruske}, title={{Creating hidden Markov models for fast speech}}, year=1998, booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)}, pages={paper 0255}, doi={10.21437/ICSLP.1998-231} }