To investigate emotion recognition from acoustic information, we recorded natural dialogs among two or three players of online games and constructed an emotional speech database. Two evaluators assigned each recorded utterance to an emotion category defined with reference to the eight primary emotions of Plutchik's three-dimensional circumplex model. In addition, 14 evaluators graded the utterances on a 5-point subjective scale to obtain reference degrees of emotion. Eleven acoustic features were extracted from the utterances, and analysis of variance (ANOVA) was conducted to test for significant differences between emotions. Based on the ANOVA results, we conducted discriminant analysis to discriminate each emotion from the others. We also estimated the emotional degree of each utterance with multiple linear regression analysis. In the discriminant analysis, high correctness values of 79.12% for Surprise and 70.11% for Sadness were obtained, and correctness above 60% was obtained for most of the other emotions. In the emotional degree estimation, the adjusted R² for each emotion ranged from 0.05 (Disgust) to 0.55 (Surprise) for closed sets, and the root mean square (RMS) of the residuals for open sets ranged from 0.39 (Acceptance) to 0.59 (Anger).
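The two measures reported for emotional degree estimation can be illustrated with a minimal sketch. It assumes the standard definitions of adjusted R² and of the RMS residual (mean of squared residuals over all utterances, then the square root); the abstract does not state which exact formulas the authors used. The function names, the toy ratings, and the number of predictors are hypothetical and are not taken from the paper.

```python
import math

def adjusted_r2(y_true, y_pred, n_predictors):
    """Adjusted R-squared: R-squared penalized for the number of predictors."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

def rms_residual(y_true, y_pred):
    """Root mean square of the residuals (assumed here to divide by n)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical 5-point ratings and regression outputs for six utterances.
rated = [1.0, 2.0, 3.0, 4.0, 5.0, 3.0]
predicted = [1.5, 2.0, 2.5, 4.0, 4.5, 3.0]

print(adjusted_r2(rated, predicted, n_predictors=2))
print(rms_residual(rated, predicted))
```

In a closed-set evaluation both lists would come from the utterances used to fit the regression; in an open-set evaluation the predictions would come from utterances held out of the fit.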
Bibliographic reference. Arimoto, Yoshiko / Kawatsu, Hiromi / Ohno, Sumio / Iida, Hitoshi (2008): "Emotion recognition in spontaneous emotional speech for anonymity-protected voice chat systems", In INTERSPEECH-2008, 322-325.