Sixth International Conference on Spoken Language Processing
October 16-20, 2000
Recognition of Emotion in a Realistic Dialogue Scenario
Richard Huber, Anton Batliner, Jan Buckow, Elmar Nöth, Volker Warnke, Heinrich Niemann
Chair for Pattern Recognition,
University of Erlangen-Nuremberg, Germany
Modern automatic dialogue systems are able to
understand complex sentences rather than only a few
commands like Stop or No. In a call center, such a system
should be able to determine in a critical phase of the
dialogue if the call should be passed over to a human operator.
Such a critical phase can be indicated by the customer's
vocal expression. Other studies have shown that it is possible
to distinguish between anger and neutral speech with
prosodic features alone. Subjects in these studies were
mostly people acting or simulating emotions like anger.
In this paper we use data from a so-called Wizard of Oz
(WoZ) scenario to get more realistic data instead of
simulated anger. As shown below, the classification rate for the
two classes "emotion" (class E) and "neutral" (class ¬E) is
significantly worse for these more realistic data. Furthermore,
the classification results are heavily speaker dependent.
Prosody alone might thus not be sufficient and has
to be supplemented by the use of other knowledge sources
such as the detection of repetitions, reformulations, swear
words, and dialogue acts.
Huber, Richard / Batliner, Anton / Buckow, Jan / Nöth, Elmar / Warnke, Volker / Niemann, Heinrich (2000):
"Recognition of emotion in a realistic dialogue scenario",
In ICSLP-2000, vol.1, 665-668.