Sixth European Conference on Speech Communication and Technology
It is well known that variations in speaking rate account for a significant percentage of errors in practical speech recognition tasks. This results from the dynamic nature of speech, which the standard HMM structure does not model properly. This paper proposes an extension to the standard HMM that exploits the information about the rate of speech contained in inter-frame transitions. The new model can be seen as a combination of Moore- and Mealy-type HMMs: it attaches output probabilities to the transitions between states in addition to the conventional output probabilities attached to the states. In this model, fast and slow transitions are associated with additional hidden parameters. The output probabilities of the transitions are modelled with gamma distributions.
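As a rough illustration of the idea, the sketch below scores a frame sequence with a forward pass in which every transition contributes a gamma density over the distance between consecutive frames, on top of the usual state output density. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: scalar frames, the absolute difference as the between-frame distance, Gaussian state outputs, one gamma (shape, scale) pair per transition, and the function names. The additional hidden parameters for fast and slow transitions mentioned in the abstract are left out.

```python
import math

def log_gamma_pdf(x, shape, scale):
    """Log density of a gamma distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def log_gauss_pdf(x, mean, var):
    """Log density of a 1-D Gaussian (assumed state output model)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_score(frames, log_init, log_trans, state_params, trans_params):
    """Forward pass with state outputs (Moore) and transition outputs (Mealy).

    frames        list of scalar features (assumption: 1-D for brevity)
    log_init      log initial state probabilities, length N
    log_trans     N x N log transition probabilities
    state_params  per-state (mean, var) of the Gaussian output
    trans_params  N x N (shape, scale) of the gamma over frame distance
    """
    n = len(log_init)
    alpha = [log_init[j] + log_gauss_pdf(frames[0], *state_params[j])
             for j in range(n)]
    for t in range(1, len(frames)):
        # Assumed between-frame distance; floored to keep log() finite.
        dist = max(abs(frames[t] - frames[t - 1]), 1e-6)
        new_alpha = []
        for j in range(n):
            incoming = [alpha[i] + log_trans[i][j]
                        + log_gamma_pdf(dist, *trans_params[i][j])
                        for i in range(n)]
            new_alpha.append(logsumexp(incoming)
                             + log_gauss_pdf(frames[t], *state_params[j]))
        alpha = new_alpha
    return logsumexp(alpha)
```

Giving a self-loop and a forward transition different gamma parameters lets the score prefer staying in a state when consecutive frames are close and moving on when they are far apart, which is one way to read the role of the transition outputs described above.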
Bibliographic reference. Tuerk, Andreas / Young, Steve (1999): "Modelling speaking rate using a between frame distance metric", In EUROSPEECH'99, 419-422.