In this paper we propose a novel emotion recognition method that models interaction and transition in dialogue. Conventional emotion recognition exploits intra-utterance features such as MFCCs and F0 within each individual utterance. However, humans perceive emotion not only from individual utterances but also from contextual information. The proposed method takes into account the contextual effect of utterances in dialogue, which conventional methods fail to capture. It introduces Emotion Interaction and Transition (EIT) models, which are constructed with end-to-end LSTMs. The inputs of the EIT model are the previous emotions of both the target and opponent speakers, as estimated by a state-of-the-art utterance-level emotion recognition model. Experimental results show that the proposed method improves overall accuracy and average precision by relative error reductions of 18.8% and 22.6%, respectively.
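To make the architecture described above concrete, the following is a minimal PyTorch sketch of the kind of EIT model the abstract outlines: an LSTM that consumes the previous emotion posteriors of both the target and opponent speakers at each dialogue turn and predicts the target speaker's current emotion. The class name, layer sizes, number of emotion classes, and the concatenation scheme are all illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class EITModel(nn.Module):
    """Sketch of an Emotion Interaction and Transition (EIT) model.

    Assumed setup: per-turn emotion posteriors for the target and
    opponent speakers come from an utterance-level recognizer; the
    LSTM models transitions across turns.
    """

    def __init__(self, num_emotions: int = 4, hidden_size: int = 64):
        super().__init__()
        # Input at each turn: target posterior concatenated with
        # opponent posterior (interaction between the two speakers).
        self.lstm = nn.LSTM(input_size=2 * num_emotions,
                            hidden_size=hidden_size,
                            batch_first=True)
        self.out = nn.Linear(hidden_size, num_emotions)

    def forward(self, target_post: torch.Tensor,
                opponent_post: torch.Tensor) -> torch.Tensor:
        # target_post, opponent_post: (batch, turns, num_emotions)
        x = torch.cat([target_post, opponent_post], dim=-1)
        h, _ = self.lstm(x)          # hidden state carries transition context
        return self.out(h)           # per-turn emotion logits for the target

# Usage example: 8 dialogues, 10 turns each, 4 emotion classes.
tgt = torch.softmax(torch.randn(8, 10, 4), dim=-1)
opp = torch.softmax(torch.randn(8, 10, 4), dim=-1)
logits = EITModel()(tgt, opp)        # shape: (8, 10, 4)
```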
Cite as: Zhang, R., Ando, A., Kobashikawa, S., Aono, Y. (2017) Interaction and Transition Model for Speech Emotion Recognition in Dialogue. Proc. Interspeech 2017, 1094-1097, doi: 10.21437/Interspeech.2017-713
@inproceedings{zhang17b_interspeech,
  author={Ruo Zhang and Atsushi Ando and Satoshi Kobashikawa and Yushi Aono},
  title={{Interaction and Transition Model for Speech Emotion Recognition in Dialogue}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1094--1097},
  doi={10.21437/Interspeech.2017-713}
}