In today’s affective databases, speech turns are often labelled on a continuous scale for emotional dimensions such as valence or arousal, to better express the diversity of human affect. However, applications such as virtual agents usually map the user's detected emotional state to rough classes in order to reduce the number of emotion-dependent system responses. Since these classes often do not optimally reflect the emotions that typically occur in a given application, this paper investigates data-driven clustering of emotional space to find class divisions that better match the training data and the area of application. To this end, we consider the Belfast Sensitive Artificial Listener database and TV talk show data from the VAM corpus. We show that a discriminatively trained Long Short-Term Memory (LSTM) recurrent neural net that explicitly learns clusters in emotional space and additionally models context information outperforms both Support Vector Machines and a Regression-LSTM net.
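To illustrate the idea of replacing fixed, hand-drawn classes with data-driven ones, the sketch below clusters continuous (valence, arousal) labels and uses the cluster index of each speech turn as its discrete class target. The abstract does not name the clustering algorithm, so plain k-means (Lloyd's algorithm with a deterministic farthest-first initialisation) is an assumption here. The names `kmeans`, `nearest`, `assign_classes` and `Point` are hypothetical and do not describe the authors' implementation. A classifier, such as the discriminatively trained LSTM the paper describes, would then be trained on these class indices instead of the raw continuous values.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation (assumption): one (valence, arousal) label per speech turn.
Point = Tuple[float, float]


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[Point]:
    """Lloyd's k-means with deterministic farthest-first initialisation (assumed algorithm)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        # Assignment step: group each point with its nearest centroid.
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Update step: move each centroid to its group mean (keep it if the group is empty).
        updated = [
            (sum(v for v, _ in g) / len(g), sum(a for _, a in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return centroids


def assign_classes(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Turn continuous emotion labels into discrete class targets (cluster indices)."""
    return [nearest(p, centroids) for p in points]
```

For example, `assign_classes(labels, kmeans(labels, 4))` would yield four data-driven classes in place of, say, the four fixed quadrants of the valence-arousal plane. Whether such clusters suit a given application depends on how the labels are distributed in its training data.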
Bibliographic reference. Wöllmer, Martin / Eyben, Florian / Schuller, Björn / Douglas-Cowie, Ellen / Cowie, Roddy (2009): "Data-driven clustering in emotional space for affect recognition using discriminatively trained LSTM networks", In INTERSPEECH-2009, 1595-1598.