ITRW on Speech and Emotion, September 5-7, 2000
Automatic dialogue systems used in call centers, for instance, should be able to determine, in a critical phase of the dialogue indicated by the customer's vocal expression of anger or irritation, when it is better to hand over to a human operator. At first glance, this does not seem to be a complicated task: it is reported in the literature that emotions can be told apart quite reliably on the basis of prosodic features. However, these results are mostly achieved in a laboratory setting, with experienced speakers (actors) and with elicited, controlled speech. We report classification results obtained within different experimental settings for the two-class problem "neutral vs. anger" using a vector of prosodic features, and discuss the impact of single features on the classification rate. Recognition rates for these settings are best for a speaker-specific classifier (one experienced speaker, acting), worse for a speaker-independent classifier (several less experienced speakers, reading), and worse still for a speaker-independent classifier with naive subjects performing the task of appointment scheduling in a Wizard-of-Oz scenario, where a malfunctioning system is simulated in order to evoke anger. The first situation mirrors most of the settings reported in the literature; the third is closest to the "real-life" task. It thus turns out that prosody alone becomes less reliable as an indicator of the speaker's emotional state the closer we get to a realistic scenario. As a consequence, the prosodic classifier was combined with other knowledge sources in the module Monitoring Of User State [especially of] Emotion (MoUSE).
Bibliographic reference. Batliner, Anton / Fischer, K. / Huber, R. / Spilker, Jörg / Nöth, Elmar (2000): "Desperately seeking emotions or: Actors, wizards, and human beings", In SpeechEmotion-2000, 195-200.