Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Emotion Recognition in Speech Signal: Experimental Study, Development, and Application

Valery A. Petrushin

Center for Strategic Technology Research (CSTaR), Andersen Consulting, Northbrook, IL, USA

The paper describes an experimental study on vocal emotion expression and recognition and the development of a computer agent for emotion recognition. The study deals with a corpus of 700 short utterances expressing five emotions: happiness, anger, sadness, fear, and normal (unemotional) state, which were portrayed by thirty subjects. The utterances were evaluated by twenty-three subjects, twenty of whom had participated in the recording. The evaluators recognized emotions in speech with the following accuracy: happiness - 61.4%, anger - 72.2%, sadness - 68.3%, fear - 49.5%, and normal - 66.3%. The human ability to portray emotions is at approximately the same level (happiness - 59.8%, anger - 71.7%, sadness - 68.1%, fear - 49.7%, and normal - 65.1%), but the standard deviation is much larger. The ability of people to recognize their own emotions was also evaluated. It turned out that people are good at recognizing anger (98.1%), sadness (80%), and fear (78.8%), but are less confident for the normal state (71.9%) and happiness (71.2%). A part of the corpus was used for extracting features and training computer-based recognizers. Some statistics of the pitch, the first and second formants, energy, and the speaking rate were selected, and several types of recognizers were created and compared. The best results were obtained using ensembles of neural network recognizers, which demonstrated the following accuracy: normal state - 55-75%, happiness - 60-70%, anger - 70-80%, sadness - 75-85%, and fear - 35-55%. The total average accuracy is about 70%. An emotion recognition agent was created that is able to analyze telephone-quality speech signals and distinguish between two emotional states, "agitation" and "calm", with an accuracy of 77%. The agent was used as a part of a decision support system for prioritizing voice messages and assigning an appropriate human agent to respond to the message in a call center environment. The architecture of the system is presented and discussed.
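
As a quick check on the human-performance figures quoted in the abstract, the following minimal sketch computes the unweighted mean of each set of per-emotion percentages. The dictionary and function names are illustrative only and do not come from the paper. An unweighted mean equals the overall accuracy only if every emotion is represented by the same number of utterances, which the abstract does not state, so the results should be read as rough summaries rather than figures reported by the author.

```python
# Per-emotion accuracies (in percent) copied from the abstract.
# All names below are hypothetical and chosen for illustration.
HUMAN_RECOGNITION = {
    "happiness": 61.4, "anger": 72.2, "sadness": 68.3,
    "fear": 49.5, "normal": 66.3,
}
HUMAN_PORTRAYAL = {
    "happiness": 59.8, "anger": 71.7, "sadness": 68.1,
    "fear": 49.7, "normal": 65.1,
}
SELF_RECOGNITION = {
    "happiness": 71.2, "anger": 98.1, "sadness": 80.0,
    "fear": 78.8, "normal": 71.9,
}


def unweighted_mean(accuracies):
    """Average the per-emotion percentages, giving each emotion equal weight.

    Assumption: equal weighting stands in for equal class sizes,
    which the abstract does not confirm.
    """
    return sum(accuracies.values()) / len(accuracies)


if __name__ == "__main__":
    for label, table in [
        ("recognition by others", HUMAN_RECOGNITION),
        ("portrayal", HUMAN_PORTRAYAL),
        ("recognition of own emotions", SELF_RECOGNITION),
    ]:
        print(f"{label}: {unweighted_mean(table):.1f}%")
```

Under the equal-weight assumption, recognition by others and portrayal both average in the low 60s, while recognition of one's own emotions averages markedly higher, which is consistent with the abstract's qualitative description.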


Bibliographic reference. Petrushin, Valery A. (2000): "Emotion recognition in speech signal: experimental study, development, and application", in ICSLP-2000, vol. 2, 222-225.