We present a new stochastic model for the energy and duration of phone segments which takes account of the speech rate, the loudness of the signal, and the effects of stress and pre-pausal lengthening. We show how the block Viterbi decoding algorithm can be used to integrate it with phone-based HMM speech recognizers. The model has been implemented on an isolated-word database, and a preliminary experiment gives a modest improvement in word recognition accuracy.
Cite as: Kenny, P., Parthasarathy, S., Gupta, V.N., Lennig, M., Mermelstein, P., O'Shaughnessy, D. (1991) Energy, duration and Markov models. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 655-658, doi: 10.21437/Eurospeech.1991-161
@inproceedings{kenny91_eurospeech,
  author={P. Kenny and S. Parthasarathy and V. N. Gupta and Matthew Lennig and Paul Mermelstein and Douglas O'Shaughnessy},
  title={{Energy, duration and Markov models}},
  year=1991,
  booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)},
  pages={655--658},
  doi={10.21437/Eurospeech.1991-161}
}