The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis

Kyoto, Japan
September 22-24, 2010

Toward Naturally Expressive Speech Synthesis: Data-Driven Emotion Detection Using Latent Affective Analysis

Jerome R. Bellegarda

Speech & Language Technologies, Apple Inc., Cupertino, California 95014, USA

A necessary step in the generation of expressive speech synthesis is the automatic detection and classification of the emotions most likely to be present in textual input. Though increasingly data-driven, emotion analysis still relies on critical expert knowledge to isolate the emotional keywords or keysets necessary for constructing affective categories. This makes it vulnerable to any discrepancy between affective states and the domain of discourse. This paper proposes a more general strategy, which leverages two separate semantic levels: one encapsulates the foundations of the domain considered, while the other specifically accounts for the overall affective fabric of the language. Exposing the emergent relationship between these two levels advantageously informs the emotion classification process. Empirical evidence suggests that this approach is effective for automatic emotion analysis in text. This bodes well for its deployability toward naturally expressive speech synthesis.
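
The abstract's core idea, relating a latent semantic representation of the domain to affective categories, can be illustrated with a minimal sketch. The corpus, anchor words, and classification rule below are hypothetical stand-ins, not the paper's actual method: a word-document matrix is factored with a truncated SVD (the standard latent semantic analysis construction), and input text is then scored by cosine similarity to emotion "anchor" words embedded in the same latent space.

```python
# Hypothetical sketch of latent-semantic emotion scoring (toy data, not the
# paper's implementation): embed words via truncated SVD of a word-document
# count matrix, then classify text by proximity to emotion anchor words.
import numpy as np

# Toy corpus; each document is a bag of words loosely tied to one emotion.
docs = [
    "happy smile joy bright happy",
    "sad tears gloom dark sad sad",
    "angry shout rage storm",
]
vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Word-by-document count matrix W.
W = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        W[idx[w], j] += 1.0

# Truncated SVD: W ~ U S V^T; rows of U*S embed words in the latent space.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
k = 2  # keep the two dominant latent dimensions
word_vecs = U[:, :k] * S[:k]

def embed(text):
    """Fold a text into the latent space by averaging its word vectors."""
    vecs = [word_vecs[idx[w]] for w in text.split() if w in idx]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Emotion anchors: single keywords standing in for affective categories.
anchors = {"joy": "happy", "sadness": "sad", "anger": "angry"}

def classify(text):
    scores = {e: cosine(embed(text), embed(w)) for e, w in anchors.items()}
    return max(scores, key=scores.get)

print(classify("bright smile"))
```

On this toy corpus the query "bright smile" lands nearest the "joy" anchor, since its words co-occur with "happy" in the same document. The paper's contribution lies precisely in going beyond such fixed keyword anchors, by relating a domain-level latent space to a separate affective-level one.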

Index Terms: expressive speech synthesis, affective congruence, detection and classification of emotional states, latent semantic analysis.


Bibliographic reference.  Bellegarda, Jerome R. (2010): "Toward naturally expressive speech synthesis: data-driven emotion detection using latent affective analysis", In SSW7-2010, 200-205.