5th International Conference on Spoken Language Processing
This investigation focuses on deriving local speech rate, which differs from both syllable rate and phone rate, directly from the speech signal. Because local speech rate modifies acoustic cues (e.g. transitions), phones, and even words, it is one of the most important prosodic cues. Our local speech rate estimation method is based on a linear combination of the syllable rate and the phone rate, since this investigation strongly suggests that neither the syllable rate nor the phone rate on its own represents the speech rate sufficiently. Our results show (a) that perceptual local speech rate correlates better with local syllable rate than with local phone rate (r = 0.81 vs. r = 0.73), (b) that the linear combination of both is well correlated with perceptual local speech rate (r = 0.88), and (c) that it is now possible to calculate the perceptual local speech rate directly from the speech signal with the aid of automatic phone boundary detectors and syllable nuclei detectors.
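The linear combination described in the abstract can be illustrated with a minimal sketch. The abstract does not state the weights, whether an intercept is used, or how the combination was fitted, so the sketch below assumes ordinary least squares with an intercept. The function names (`pearson_r`, `fit_linear_combination`, `combine`) are hypothetical and not taken from the paper. Inputs are assumed to be per-window local syllable rates, local phone rates, and perceptual rate judgments of equal length.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_linear_combination(syl_rate, phone_rate, perceived):
    """Least-squares fit of perceived ~ w_syl*syl_rate + w_phone*phone_rate + bias.

    Assumption: ordinary least squares with an intercept; the paper's
    actual fitting procedure is not given in the abstract.
    """
    n = len(perceived)
    ms, mp, my = sum(syl_rate) / n, sum(phone_rate) / n, sum(perceived) / n
    # Center all variables so the intercept can be recovered afterwards.
    s = [v - ms for v in syl_rate]
    p = [v - mp for v in phone_rate]
    y = [v - my for v in perceived]
    ss = sum(a * a for a in s)
    pp = sum(b * b for b in p)
    sp = sum(a * b for a, b in zip(s, p))
    sy = sum(a * c for a, c in zip(s, y))
    py = sum(b * c for b, c in zip(p, y))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = ss * pp - sp * sp
    if det == 0:
        raise ValueError("syllable rate and phone rate are collinear")
    w_syl = (sy * pp - py * sp) / det
    w_phone = (py * ss - sy * sp) / det
    bias = my - w_syl * ms - w_phone * mp
    return w_syl, w_phone, bias


def combine(syl_rate, phone_rate, w_syl, w_phone, bias=0.0):
    """Apply the fitted linear combination to obtain a local speech rate estimate."""
    return [w_syl * a + w_phone * b + bias for a, b in zip(syl_rate, phone_rate)]
```

Under these assumptions, the weights returned by `fit_linear_combination` can be passed to `combine`, and `pearson_r` then compares the resulting estimate with the perceptual judgments, in the same way the abstract compares r values for syllable rate, phone rate, and their combination.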
Bibliographic reference. Pfitzinger, Hartmut R. (1998): "Local speech rate as a combination of syllable and phone rate", In ICSLP-1998, paper 0523.