Interspeech'2005 - Eurospeech
Whereas experimental studies of emotional speech often control for neutral semantics, speech in naturalistic corpora is characterized by contextual cues and non-neutral semantic content. Moreover, the target emotion of an utterance is generally unknown and must be inferred by the listener. With child-directed expressive text-to-speech synthesis as the goal, we describe a perceptual study based on an expressive spoken corpus of children's stories with unknown emotional targets, and report on interannotator agreement in a forced-choice discrimination task. A high-agreement threshold was used to establish subsets of confident exemplar utterances for the emotional classes, comprising 35% of the initial corpus. The exemplars were clustered on their differences from the default neutral mean for 11 global acoustic features. The resulting clusters cut across emotion boundaries; some reflected arousal levels, and the neutral exemplars showed particularly complex distributions. In addition, the mean features of four emotional exemplar categories were contrasted against the default, revealing tendencies that both agree with and contradict previous reports. The results indicate that semantic and prosodic cues collaborate to express and reinforce emotional content, while emotional sequencing is likely another factor contributing to emotional perception in this domain.
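The exemplar selection and the difference-from-neutral representation described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' procedure: the function names `select_exemplars` and `deltas_from_neutral`, the 0.75 threshold, the four annotators, the two toy features, and the reading of the "default" as the mean over neutral exemplars are all hypothetical, and the data are made up.

```python
from collections import Counter
from statistics import mean


def select_exemplars(annotations, threshold):
    """Keep utterances whose most frequent label reaches the agreement threshold.

    annotations: dict mapping utterance id -> list of labels, one per annotator.
    threshold: minimum fraction of annotators that must share the top label
               (hypothetical value; the abstract does not state one).
    """
    exemplars = {}
    for utt_id, labels in annotations.items():
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= threshold:
            exemplars[utt_id] = label
    return exemplars


def deltas_from_neutral(features, exemplars, neutral_label="neutral"):
    """Express each exemplar's features as differences from the neutral mean.

    Assumes the baseline is the per-feature mean over neutral exemplars,
    and that at least one neutral exemplar exists.
    """
    neutral_ids = [u for u, lab in exemplars.items() if lab == neutral_label]
    n_dims = len(next(iter(features.values())))
    baseline = [mean(features[u][d] for u in neutral_ids) for d in range(n_dims)]
    return {
        u: [features[u][d] - baseline[d] for d in range(n_dims)]
        for u in exemplars
    }


# Toy data: four annotators, two made-up global features per utterance.
annotations = {
    "u1": ["neutral", "neutral", "neutral", "neutral"],
    "u2": ["happy", "happy", "happy", "sad"],
    "u3": ["angry", "sad", "happy", "neutral"],
    "u4": ["neutral", "neutral", "neutral", "neutral"],
}
features = {
    "u1": [200.0, 60.0],
    "u2": [260.0, 68.0],
    "u3": [180.0, 55.0],
    "u4": [220.0, 64.0],
}

exemplars = select_exemplars(annotations, threshold=0.75)
deltas = deltas_from_neutral(features, exemplars)
```

In this toy run, `u3` is dropped because no label reaches the threshold, and the remaining difference vectors could then be passed to any clustering routine. The abstract does not specify which clustering method was used, so none is shown here.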
Bibliographic reference. Alm, Cecilia Ovesdotter / Sproat, Richard (2005): "Perceptions of emotions in expressive storytelling", In INTERSPEECH-2005, 533-536.