Previous studies have shown that recognition accuracy often degrades severely at higher speech rates. The causes lie in two main dimensions: the phonemic level (e.g. elisions) and the acoustic level, where the spectral characteristics change with increasing speech rate. A main obstacle in this context is the training data, of which only a small fraction of samples can be labeled as 'fast'. As a result, the effects caused by an increased speech rate often cannot be fully covered. To address this problem, this paper presents an optimized clustering process that makes efficient use of the available data. Our modified mixture splitting algorithm, which incorporates a cross-validation step, aims to improve the generalization of Hidden Markov Models, especially with respect to fast speech. Experimental results showed a relative decrease in word error rate of 7.6% for fast speech.
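The abstract does not spell out the algorithm. As a rough illustration of the general idea of mixture splitting guarded by cross-validation, here is a minimal sketch under several assumptions that are not taken from the paper: one-dimensional Gaussian mixtures instead of HMM state output distributions, a split rule that replaces the highest-weight component by two copies whose means are offset by ±0.2 standard deviations, a fixed number of EM iterations after each split, and a stopping rule that rejects a split as soon as the held-out log-likelihood no longer improves. All function names (`grow_mixture`, `split_heaviest`, `em_step`, etc.) are hypothetical.

```python
import math
import random

# Hypothetical sketch (not the authors' algorithm): a mixture is a list of
# (weight, mean, variance) tuples for a 1-D Gaussian mixture model.
VAR_FLOOR = 1e-3


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def component_logs(x, mix):
    """Per-component log(weight * density) for one sample."""
    return [math.log(w) + log_gauss(x, m, v) for w, m, v in mix]


def log_likelihood(data, mix):
    """Total log-likelihood of data under the mixture (log-sum-exp per sample)."""
    total = 0.0
    for x in data:
        logs = component_logs(x, mix)
        top = max(logs)
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total


def em_step(data, mix):
    """One EM re-estimation pass over the training data."""
    k = len(mix)
    occ = [0.0] * k     # soft counts (occupancies)
    first = [0.0] * k   # responsibility-weighted sums of x
    second = [0.0] * k  # responsibility-weighted sums of x^2
    for x in data:
        logs = component_logs(x, mix)
        top = max(logs)
        probs = [math.exp(l - top) for l in logs]
        norm = sum(probs)
        for j in range(k):
            r = probs[j] / norm
            occ[j] += r
            first[j] += r * x
            second[j] += r * x * x
    new_mix = []
    for j in range(k):
        c = max(occ[j], 1e-10)  # guard against components with no data
        mean = first[j] / c
        var = max(second[j] / c - mean * mean, VAR_FLOOR)
        new_mix.append((c / len(data), mean, var))
    return new_mix


def split_heaviest(mix, eps=0.2):
    """Replace the highest-weight component by two copies offset by +/- eps std devs."""
    j = max(range(len(mix)), key=lambda i: mix[i][0])
    w, m, v = mix[j]
    d = eps * math.sqrt(v)
    return mix[:j] + mix[j + 1:] + [(w / 2, m - d, v), (w / 2, m + d, v)]


def grow_mixture(train, held_out, max_components=8, em_iters=10):
    """Split and retrain on `train`; keep a split only if held-out likelihood improves."""
    n = len(train)
    mean = sum(train) / n
    var = max(sum((x - mean) ** 2 for x in train) / n, VAR_FLOOR)
    mix = [(1.0, mean, var)]
    best = log_likelihood(held_out, mix)
    while len(mix) < max_components:
        candidate = split_heaviest(mix)
        for _ in range(em_iters):
            candidate = em_step(train, candidate)
        score = log_likelihood(held_out, candidate)
        if score <= best:
            break  # cross-validation rejects this split: stop growing
        mix, best = candidate, score
    return mix


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(-3.0, 1.0) for _ in range(400)]
    data += [random.gauss(3.0, 1.0) for _ in range(400)]
    random.shuffle(data)
    model = grow_mixture(data[:600], data[600:])
    for w, m, v in model:
        print(f"weight={w:.2f} mean={m:+.2f} var={v:.2f}")
```

In this sketch, the held-out check is what prevents the model from adding components that only fit the training partition. That concern is especially relevant when, as the abstract notes, only a small fraction of the training data represents fast speech. How the paper itself partitions the data and decides which components to split may differ.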
Cite as: Faltlhauser, R., Pfau, T., Ruske, G. (1999) Creating hidden Markov models for fast speech by optimized clustering. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 407-410, doi: 10.21437/Eurospeech.1999-105
@inproceedings{faltlhauser99_eurospeech, author={Robert Faltlhauser and Thilo Pfau and Günther Ruske}, title={{Creating hidden Markov models for fast speech by optimized clustering}}, year=1999, booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)}, pages={407--410}, doi={10.21437/Eurospeech.1999-105} }