INTERSPEECH 2004 - ICSLP
8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Effective Acoustic Modeling for Rate-of-Speech Variation in Large Vocabulary Conversational Speech Recognition

Jing Zheng, Horacio Franco, Andreas Stolcke

SRI International, USA

We investigate several variants of speech-rate-dependent acoustic models for large-vocabulary conversational speech recognition, in the framework of combining rate-specific models in decoding to compensate for speech rate variation. We study two basic approaches to combining rate-specific models: one combines models at the pronunciation level and the other at the HMM state level. Furthermore, we investigate the influence of different numbers of rate-of-speech classes and different parameter tying schemes. Experiments on the Switchboard database, using SRI's DECIPHER recognition system, show that rate-dependent acoustic modeling yields a 2% relative reduction in word error rate over a rate-independent baseline, and that the pronunciation-level constraint, Gaussian sharing between rate-specific models, and a well-chosen number of rate-of-speech classes are all important for best performance.

Bibliographic reference. Zheng, Jing / Franco, Horacio / Stolcke, Andreas (2004): "Effective acoustic modeling for rate-of-speech variation in large vocabulary conversational speech recognition", in INTERSPEECH-2004, 401-404.