Sixth European Conference on Speech Communication and Technology
(EUROSPEECH'99)

Budapest, Hungary
September 5-9, 1999

Creating Hidden Markov Models for Fast Speech by Optimized Clustering

Robert Faltlhauser, Thilo Pfau, Günther Ruske

Inst. for Human-Machine-Communication, Munich Univ. of Technology (TUM), Munich, Germany

Previous studies have shown that recognition accuracy often degrades severely at higher speech rates. The causes can be traced back to two main dimensions: the phonemic level (e.g. elisions) and the acoustic level, where the spectral characteristics change as the rate of speech increases. A main obstacle in this context is the training data, of which only a small fraction of samples can be labeled as 'fast'. Therefore, the effects caused by an increased speech rate often cannot be fully covered. To address this problem, this paper presents an optimized clustering process that makes efficient use of the available data. Our modified mixture splitting algorithm, which incorporates a cross-validation step, aims at improving the generalization of Hidden Markov Models, especially with respect to fast speech. Experimental results showed a relative decrease in word error rate of 7.6% for fast speech.
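The abstract does not spell out the modified splitting algorithm, so the following is only a generic, minimal sketch of the underlying idea: mixture splitting combined with a cross-validation check, shown for a single one-dimensional Gaussian mixture rather than full HMM state output densities. It is not the authors' method. The function names, the perturbation of ±0.5 standard deviations (`eps`), the policy of splitting every component at once, and the rule of stopping at the first split that does not improve held-out likelihood are all illustrative assumptions.

```python
import math
import random

# A mixture is a list of (weight, mean, variance) tuples for 1-D Gaussians.


def log_gauss(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(terms):
    """Numerically stable log(sum(exp(t)))."""
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def avg_log_likelihood(data, mix):
    """Average per-sample log-likelihood of data under the mixture."""
    total = 0.0
    for x in data:
        total += log_sum_exp([math.log(w) + log_gauss(x, m, v) for w, m, v in mix])
    return total / len(data)


def em_step(data, mix, var_floor=1e-3):
    """One EM re-estimation pass (weights, means, variances) on the training data."""
    k = len(mix)
    occ, s1, s2 = [0.0] * k, [0.0] * k, [0.0] * k
    for x in data:
        terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in mix]
        norm = log_sum_exp(terms)
        for j in range(k):
            r = math.exp(terms[j] - norm)  # posterior of component j given x
            occ[j] += r
            s1[j] += r * x
            s2[j] += r * x * x
    new_mix = []
    for j in range(k):
        n_j = max(occ[j], 1e-10)  # guard against an empty component
        mean = s1[j] / n_j
        var = max(s2[j] / n_j - mean * mean, var_floor)  # variance floor
        new_mix.append((n_j, mean, var))
    total = sum(n for n, _, _ in new_mix)
    return [(n / total, m, v) for n, m, v in new_mix]


def split_all(mix, eps=0.5):
    """Split every component in two by shifting its mean by +/- eps std devs.

    eps = 0.5 and splitting all components at once are assumptions.
    """
    out = []
    for w, m, v in mix:
        d = eps * math.sqrt(v)
        out += [(w / 2, m - d, v), (w / 2, m + d, v)]
    return out


def grow_mixture(train, heldout, max_components=8, em_iters=20):
    """Grow a mixture by splitting; keep a split only if held-out likelihood improves."""
    mean = sum(train) / len(train)
    var = sum((x - mean) ** 2 for x in train) / len(train)
    mix = [(1.0, mean, var)]
    best = avg_log_likelihood(heldout, mix)
    while 2 * len(mix) <= max_components:
        candidate = split_all(mix)
        for _ in range(em_iters):
            candidate = em_step(train, candidate)  # re-estimate on training data only
        score = avg_log_likelihood(heldout, candidate)
        if score <= best:  # cross-validation: split does not generalize, stop growing
            break
        mix, best = candidate, score
    return mix, best


if __name__ == "__main__":
    rng = random.Random(0)

    def sample(n):
        # Synthetic bimodal data: two well-separated clusters.
        return [rng.gauss(rng.choice([-5.0, 5.0]), 1.0) for _ in range(n)]

    train, heldout = sample(400), sample(200)
    mix, score = grow_mixture(train, heldout)
    print(len(mix), round(score, 3))
```

The held-out set plays the role of the cross-validation data: components are only re-estimated on `train`, and a split is accepted only when it also raises the likelihood of unseen samples, which is one simple way to trade model size against generalization when data are scarce.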



Bibliographic reference.  Faltlhauser, Robert / Pfau, Thilo / Ruske, Günther (1999): "Creating hidden Markov models for fast speech by optimized clustering", In EUROSPEECH'99, 407-410.