INTERSPEECH 2008
9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Emotion Recognition in Spontaneous Emotional Speech for Anonymity-Protected Voice Chat Systems

Yoshiko Arimoto (1), Hiromi Kawatsu (2), Sumio Ohno (1), Hitoshi Iida (1)

(1) Tokyo University of Technology, Japan; (2) NIJLA, Japan

To investigate emotion recognition from acoustic information, we recorded natural dialogs between two or three online-game players to construct an emotional speech database. Two evaluators categorized the recorded utterances into emotion classes defined with reference to the eight primary emotions of Plutchik's three-dimensional circumplex model. In addition, 14 evaluators rated the utterances on a 5-point subjective scale to obtain reference degrees of emotion. Eleven acoustic features were extracted from the utterances, and analysis of variance (ANOVA) was conducted to assess significant differences between emotions. Based on the ANOVA results, discriminant analysis was conducted to discriminate each emotion from the others. Moreover, multiple linear regression analysis was applied to estimate the emotional degree of each utterance. The discriminant analysis achieved high correctness of 79.12% for Surprise and 70.11% for Sadness, and over 60% correctness for most of the other emotions. For emotional degree estimation, adjusted R-squared (R²) values for each emotion ranged from 0.05 (Disgust) to 0.55 (Surprise) for closed sets, and root-mean-square (RMS) residuals for open sets ranged from 0.39 (Acceptance) to 0.59 (Anger).
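The pipeline described above (per-feature ANOVA, one-vs-rest discriminant analysis, and multiple linear regression for degree estimation) can be sketched as follows. This is not the authors' code: the data, the 30% Surprise prior, and the feature shift are synthetic stand-ins, and scikit-learn/SciPy are assumptions about tooling; only the 11-feature dimensionality and the analysis steps come from the abstract.

```python
# Illustrative sketch (synthetic data, NOT the paper's corpus or code):
# one-vs-rest discrimination of a single emotion from acoustic features,
# plus emotional-degree estimation by multiple linear regression.
import numpy as np
from scipy import stats
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, n_features = 200, 11                       # 11 acoustic features, as in the paper
X = rng.normal(size=(n, n_features))
is_surprise = rng.random(n) < 0.3             # hypothetical binary labels
X[is_surprise, 0] += 1.5                      # pretend feature 0 (e.g. F0) rises for Surprise
# Hypothetical 5-point-scale degree, driven partly by feature 0:
degree = 0.8 * X[:, 0] + rng.normal(scale=0.5, size=n)

# One-way ANOVA per feature: does the feature differ between the
# target emotion and all other emotions?
f_stat, p_val = stats.f_oneway(X[is_surprise, 0], X[~is_surprise, 0])

# Discriminant analysis: one emotion vs. the rest.
lda = LinearDiscriminantAnalysis().fit(X, is_surprise)
correctness = lda.score(X, is_surprise)       # closed-set correctness rate

# Multiple linear regression for emotional-degree estimation.
reg = LinearRegression().fit(X, degree)
r2 = reg.score(X, degree)                     # closed-set R² (unadjusted here)
rms_residual = np.sqrt(np.mean((degree - reg.predict(X)) ** 2))
```

In the paper, an open-set evaluation (RMS residual on held-out utterances) would replace the closed-set scores computed here on the training data.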


Bibliographic reference.  Arimoto, Yoshiko / Kawatsu, Hiromi / Ohno, Sumio / Iida, Hitoshi (2008): "Emotion recognition in spontaneous emotional speech for anonymity-protected voice chat systems", In INTERSPEECH-2008, 322-325.