7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Combining Acoustic and Language Information for Emotion Recognition

Chul Min Lee (1), Shrikanth S. Narayanan (1), Roberto Pieraccini (2)

(1) University of Southern California, USA; (2) SpeechWorks International, USA

This paper reports on emotion recognition using both acoustic and language information in spoken utterances. Most previous efforts have focused on emotion recognition from acoustic correlates alone, although it is well known that language information also conveys emotion. To capture emotional information at the language level, we introduce the information-theoretic notion of ‘emotional salience’. For acoustic information, linear discriminant classifiers and k-nearest neighbor classifiers were used for emotion classification. The combination of acoustic and linguistic information is posed as a data fusion problem to obtain the combined decision. Results on spoken dialog data from a telephone-based human-machine interaction application show that combining acoustic and language information improves negative emotion classification by 45.7% over using acoustic information alone (with a linear discriminant classifier) and by 32.9% over using language information alone.
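To make the language-level idea concrete, here is a minimal sketch of one plausible reading of ‘emotional salience’, assuming a word's salience is its mutual information with the emotion classes, and assuming the final decision fuses the two classifiers' posteriors by a simple weighted average. The function names (emotional_salience, fuse), the fusion weight, and the toy data are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2

def emotional_salience(utterances, labels):
    """Estimate each word's 'emotional salience' as its mutual
    information with the emotion classes (an assumed reading):
        sal(w) = sum_k P(e_k | w) * log2( P(e_k | w) / P(e_k) )
    utterances: list of token lists; labels: emotion class per
    utterance (e.g. 'negative' / 'non-negative')."""
    class_counts = Counter(labels)
    n = len(labels)
    p_class = {c: class_counts[c] / n for c in class_counts}

    # Co-occurrence counts of (word, class); each word is counted
    # once per utterance so repetitions do not dominate the estimate.
    word_class = defaultdict(Counter)
    for tokens, label in zip(utterances, labels):
        for w in set(tokens):
            word_class[w][label] += 1

    salience = {}
    for w, counts in word_class.items():
        total = sum(counts.values())
        s = 0.0
        for c, p_c in p_class.items():
            p_c_given_w = counts[c] / total
            if p_c_given_w > 0:
                s += p_c_given_w * log2(p_c_given_w / p_c)
        salience[w] = s
    return salience

def fuse(p_acoustic, p_language, w=0.5):
    """Late fusion of the two classifiers' posterior estimates for
    the 'negative' class by a weighted average; the weight w is an
    assumption, standing in for the paper's data-fusion step."""
    return w * p_acoustic + (1 - w) * p_language

# Toy usage with hypothetical dialog turns:
utts = [["no", "stop"], ["yes", "please"], ["no", "wrong", "stop"]]
labs = ["negative", "non-negative", "negative"]
print(sorted(emotional_salience(utts, labs).items(),
             key=lambda kv: -kv[1])[:3])
print(fuse(0.8, 0.6))
```

With real dialog data, the class posteriors P(e_k | w) would be sparse for rare words, so some smoothing of the counts would be needed before computing the salience scores.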


Bibliographic reference. Lee, Chul Min / Narayanan, Shrikanth S. / Pieraccini, Roberto (2002): "Combining acoustic and language information for emotion recognition", in ICSLP-2002, 873-876.