Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Recognition of Emotion in a Realistic Dialogue Scenario

Richard Huber, Anton Batliner, Jan Buckow, Elmar Nöth, Volker Warnke, Heinrich Niemann

Chair for Pattern Recognition, University of Erlangen-Nuremberg, Germany

Modern automatic dialogue systems can understand complex sentences rather than only a few commands such as Stop or No. In a call center, such a system should be able to determine, in a critical phase of the dialogue, whether the call should be passed to a human operator. Such a critical phase can be indicated by the customer's vocal expression. Other studies have shown that it is possible to distinguish between anger and neutral speech with prosodic features alone. Subjects in these studies were mostly people acting or simulating emotions such as anger. In this paper we use data from a so-called Wizard of Oz (WoZ) scenario to obtain more realistic data instead of simulated anger. As we show, the classification rate for the two classes "emotion" (class E) and "neutral" (class ¬E) is significantly worse for these more realistic data. Furthermore, the classification results are heavily speaker dependent. Prosody alone might thus not be sufficient and has to be supplemented by other knowledge sources such as the detection of repetitions, reformulations, swear words, and dialogue acts.

Bibliographic reference.  Huber, Richard / Batliner, Anton / Buckow, Jan / Nöth, Elmar / Warnke, Volker / Niemann, Heinrich (2000): "Recognition of emotion in a realistic dialogue scenario", In ICSLP-2000, vol.1, 665-668.