INTERSPEECH 2004 - ICSLP
We investigate several variants of speech-rate-dependent acoustic models for large-vocabulary conversational speech recognition. Our framework combines rate-specific models during decoding to compensate for speech rate variation. We study two basic approaches to combining rate-specific models: one combines them at the pronunciation level and the other at the HMM state level. We also investigate the influence of the number of rate-of-speech classes and of different parameter tying schemes. Experiments on the Switchboard database with SRI's DECIPHER recognition system show that rate-dependent acoustic modeling yields a 2% relative word error rate reduction over a rate-independent baseline. Three factors are all important for best performance: the pronunciation-level constraint, Gaussian sharing between rate-specific models, and a well-chosen number of rate-of-speech classes.
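The two combination levels named above can be contrasted with a toy scoring sketch. This is a generic illustration under stated assumptions, not the authors' DECIPHER implementation. It assumes a fixed frame-to-state alignment, precomputed per-frame log-likelihoods under each rate class's model (`frame_ll`, hypothetical), and hypothetical mixture weights. At the pronunciation level, one rate class scores the whole pronunciation. At the state level, each state's likelihood mixes the rate-specific densities, so the effective rate can vary from frame to frame.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def pronunciation_level_score(frame_ll):
    """frame_ll[t][r] is the (hypothetical) log-likelihood of frame t under
    rate class r. The whole pronunciation is constrained to a single rate
    class, and the best-scoring class is kept."""
    n_rates = len(frame_ll[0])
    return max(sum(frame[r] for frame in frame_ll) for r in range(n_rates))


def state_level_score(frame_ll, weights):
    """Each frame's state likelihood is a weighted mixture of the
    rate-specific likelihoods (weights are hypothetical), so no single
    rate class is imposed on the pronunciation."""
    log_w = [math.log(w) for w in weights]
    return sum(
        logsumexp([ll + lw for ll, lw in zip(frame, log_w)])
        for frame in frame_ll
    )


# Toy example: two frames, two rate classes (e.g. "slow", "fast").
frame_ll = [[-1.0, -2.0], [-1.0, -2.0]]
print(pronunciation_level_score(frame_ll))
print(state_level_score(frame_ll, [0.5, 0.5]))
```

With this toy input, the pronunciation-level score is the total for the better rate class (-2.0). The state-level score instead adds a per-frame mixture likelihood for every frame.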
Bibliographic reference. Zheng, Jing / Franco, Horacio / Stolcke, Andreas (2004): "Effective acoustic modeling for rate-of-speech variation in large vocabulary conversational speech recognition", In INTERSPEECH-2004, 401-404.