Third International Conference on Spoken Language Processing (ICSLP 94)
A conspicuous difference between spontaneous and read speech is that the former has a lively, rhythmical sound, while the latter sounds thin and flat. Musical rhythms are created by intensifying certain beats to varying degrees and grouping the beats together to form global patterns. It was hypothesized that the rhythms of spontaneous speech are created in a similar way. To test this hypothesis, two 1.5-minute monologue sections were chosen from recorded conversations of two male speakers. One is an English speaker with a slow, soothing rhythm, while the other is a Japanese speaker with a fast, crisp rhythm. Raw amplitude plots reveal a lower-level structure built on energy clusters and beats. Most characteristics of this structure are shared by the two speakers, though some language-specific features are seen. Plots of vowel peak amplitudes reveal a higher-level structure. Both speakers use accented beats to delimit rhythm intervals of approximately 1 second, 2.5 seconds and 4.5 seconds. The accented beats form waves of continuously varying amplitude, which last up to 30 seconds. Further, these waves correspond to different discourse sections and reflect the speaker's emotional state. The exact shape of the waves depends on the nature of the speaker's rhythm.
Bibliographic reference. Umeda, Noriko / Wedmore, Toby (1994): "A rhythm theory for spontaneous speech: the role of vowel amplitude in the rhythmic hierarchy", In ICSLP-1994, 1095-1098.