September 22-25, 1997
In this paper, we describe a rate-of-speech estimator that is derived directly from the acoustic signal. This measure has been developed as an alternative to lexical measures of speaking rate such as phones or syllables per second, which, in previous work, we estimated using a first recognition pass; the accuracy of that earlier lexical rate estimate therefore depended on the quality of recognition. Here we show that our new measure is a good predictor of word error rate and, in addition, correlates moderately well with lexical speech rate. We also show that a simple modification of the model transition probabilities based on this measure can reduce the error rate almost as much as using lexical phones per second calculated from manually transcribed data. When we categorized test utterances using speaking rate thresholds computed from the training set, we observed that a different transition probability value was required to minimize the error rate in each speaking rate bin. However, the error reduction provided by this approach is still small in comparison with the increases in error observed for unusually fast or slow speech.
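The binning step described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the thresholds are placed at the training-set mean plus or minus one standard deviation, there are three bins, and the per-bin exit probabilities are made-up numbers. The function names (`rate_thresholds`, `rate_bin`, `exit_prob_for`) are hypothetical. The only general fact relied on is that a state with exit probability p has an expected duration of 1/p frames, so a larger exit probability suits faster speech.

```python
from statistics import mean, stdev

def rate_thresholds(train_rates, k=1.0):
    """Return (slow_cutoff, fast_cutoff) from training-set rate estimates.

    Assumption: cutoffs sit k standard deviations below and above the mean.
    The paper does not state how its thresholds were chosen.
    """
    m, s = mean(train_rates), stdev(train_rates)
    return m - k * s, m + k * s

def rate_bin(rate, thresholds):
    """Assign one utterance-level rate estimate to a speaking rate bin."""
    slow_cutoff, fast_cutoff = thresholds
    if rate < slow_cutoff:
        return "slow"
    if rate > fast_cutoff:
        return "fast"
    return "medium"

# Hypothetical per-bin exit probabilities (probability of leaving a state).
# In practice each value would be tuned to minimize error rate in its bin.
EXIT_PROB = {"slow": 0.3, "medium": 0.5, "fast": 0.7}

def exit_prob_for(rate, thresholds):
    """Look up the transition probability to use when decoding an utterance."""
    return EXIT_PROB[rate_bin(rate, thresholds)]

if __name__ == "__main__":
    train = [2.0, 4.0, 6.0]           # toy rate estimates, arbitrary units
    cutoffs = rate_thresholds(train)  # computed once from training data
    for r in (1.0, 4.0, 7.0):
        print(r, rate_bin(r, cutoffs), exit_prob_for(r, cutoffs))
```

Computing the cutoffs once from training data and only looking them up at test time mirrors the train/test separation mentioned in the abstract; the rate estimate itself could come from either the acoustic measure or a lexical one.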
Bibliographic reference. Morgan, Nelson / Fosler, Eric / Mirghafori, Nikki (1997): "Speech recognition using on-line estimation of speaking rate", In EUROSPEECH-1997, 2079-2082.