8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Identifying Emotion in Speech Prosody Using Acoustical Cues of Harmony

Takashi Fujisawa, Norman D. Cook

Kansai University, Japan

We have studied the prosody of emotional speech using a psychoacoustical model of musical harmony, designed to explain the basic facts of the perception of pitch combinations: interval consonance/dissonance and chordal harmony/tension. For any voiced utterance, the model provides three quasi-musical measures of the pitches used: dissonance, tension, and harmonic modality. Modality is the most interesting, as it relates to the major and minor modes of traditional harmony theory and their characteristic positive and negative affect. In a study of emotional speech using 216 utterances, factor analysis showed that these measures are distinct from those obtained from basic statistics on the fundamental frequency of the voice (mean F0, range, rate of change, etc.). Moreover, there was a significant correlation between the major/minor modality measure and the positive/negative affect of the utterance. We argue that, in addition to the traditional acoustical measures, a measure of multiple-pitch combinations, i.e., harmony, is essential for determining the affective character of the tone of voice in speech.
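For readers unfamiliar with the "basic statistics" that the harmony measures are compared against, the following is a minimal sketch of how mean F0, F0 range, and rate of change might be computed from a frame-wise pitch contour. It is not the authors' procedure: the function name `f0_statistics`, the 10 ms frame step, the convention that unvoiced frames are marked `0` or `None`, and the choice of mean absolute slope as the rate-of-change measure are all assumptions made for illustration.

```python
import math


def f0_statistics(f0_hz, frame_step_s=0.01):
    """Basic F0 statistics for one utterance (illustrative sketch only).

    f0_hz: frame-wise F0 values in Hz; unvoiced frames are 0 or None (assumed convention).
    frame_step_s: time between frames in seconds (assumed to be 10 ms).
    """
    # Keep only voiced frames (positive F0 values).
    voiced = [f for f in f0_hz if f]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")

    mean_f0 = sum(voiced) / len(voiced)
    range_hz = max(voiced) - min(voiced)
    # The same range expressed in semitones, a perceptually motivated scale.
    range_semitones = 12.0 * math.log2(max(voiced) / min(voiced))

    # Rate of change: absolute slope between adjacent frames that are both voiced,
    # so jumps across unvoiced gaps are not counted.
    slopes = [
        abs(cur - prev) / frame_step_s
        for prev, cur in zip(f0_hz, f0_hz[1:])
        if prev and cur
    ]
    mean_abs_slope = sum(slopes) / len(slopes) if slopes else 0.0

    return {
        "mean_f0_hz": mean_f0,
        "range_hz": range_hz,
        "range_semitones": range_semitones,
        "mean_abs_slope_hz_per_s": mean_abs_slope,
    }


if __name__ == "__main__":
    # Hypothetical contour with one unvoiced gap.
    contour = [0, 100.0, 110.0, 120.0, None, 200.0]
    print(f0_statistics(contour))
```

Statistics of this kind summarize the contour one pitch at a time, which is the contrast the abstract draws with measures of multiple-pitch combinations.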


Bibliographic reference. Fujisawa, Takashi / Cook, Norman D. (2004): "Identifying emotion in speech prosody using acoustical cues of harmony", in INTERSPEECH-2004, 1333-1336.