Traditionally, the emotional state of a speaker is classified, alongside the content of the utterance, by a single recognition process that applies the same feature set to all emotions. However, an analysis of the classification performance for every pair of emotions shows that different features have distinct discriminative abilities for different emotions. We therefore propose an efficient emotion recognition process called cascade bisection (CB-process), which carries out emotion recognition through several bisecting steps and applies a different feature set at each step. The process is built on the features' differing abilities to discriminate between emotions; in this way, the information extracted from the features is fully utilized and better recognition performance is achieved. Five discrete emotional states, namely neutral, anger, fear, joy, and sadness, are distinguished from the input Mandarin speech. After extracting the acoustic features that capture short-time energy, signal amplitude, and pitch, we derive the representative feature set for further use in the CB-process, which achieves better emotion recognition as demonstrated by the experimental results.
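To make the cascade bisection idea concrete, below is a minimal sketch, not the authors' implementation: each stage is a binary classifier that splits the remaining emotion classes into two groups using its own subset of feature columns, and prediction walks down the cascade until a single emotion remains. The stage ordering, feature indices, and the SVM back-end are illustrative assumptions, and the data in the demo is random placeholder input.

import numpy as np
from sklearn.svm import SVC


class CascadeBisection:
    """Sketch of a cascade-bisection classifier (hypothetical stage layout)."""

    def __init__(self, stages):
        # stages: list of (left_labels, right_labels, feature_idx) tuples,
        # e.g. ({"sadness"}, {"neutral", "anger", "fear", "joy"}, [0, 1])
        self.stages = [(set(l), set(r), idx, SVC(kernel="rbf"))
                       for l, r, idx in stages]

    def fit(self, X, y):
        y = np.asarray(y)
        for left, right, idx, clf in self.stages:
            # train each stage only on samples belonging to its two groups,
            # using only that stage's feature columns
            mask = np.isin(y, list(left | right))
            target = np.isin(y[mask], list(right)).astype(int)  # 0=left, 1=right
            clf.fit(X[mask][:, idx], target)
        return self

    def predict_one(self, x):
        # start from the full emotion set of the first stage and bisect it
        left0, right0, _, _ = self.stages[0]
        candidates = left0 | right0
        for left, right, idx, clf in self.stages:
            if candidates != left | right:
                continue  # this stage is not on the current cascade path
            side = clf.predict(x[idx].reshape(1, -1))[0]
            candidates = right if side == 1 else left
            if len(candidates) == 1:
                break
        return next(iter(candidates))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emotions = ["neutral", "anger", "fear", "joy", "sadness"]
    X = rng.normal(size=(200, 6))        # placeholder acoustic features
    y = rng.choice(emotions, size=200)   # placeholder emotion labels
    cb = CascadeBisection([
        ({"sadness"}, {"neutral", "anger", "fear", "joy"}, [0, 1]),
        ({"neutral", "fear"}, {"anger", "joy"}, [2, 3]),
        ({"neutral"}, {"fear"}, [4]),
        ({"anger"}, {"joy"}, [5]),
    ]).fit(X, y)
    print(cb.predict_one(X[0]))

The binary tree of bisections shown in the demo is only one plausible ordering; the point of the CB-process is that each split can use whichever features best separate that particular pair of emotion groups.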
Cite as: Zhang, S., Ching, P.C., Kong, F. (2006) Automatic emotion recognition of speech signal in Mandarin. Proc. Interspeech 2006, paper 1128-Wed2BuP.6, doi: 10.21437/Interspeech.2006-500
@inproceedings{zhang06g_interspeech,
  author={Sheng Zhang and P. C. Ching and Fanrang Kong},
  title={{Automatic emotion recognition of speech signal in Mandarin}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1128-Wed2BuP.6},
  doi={10.21437/Interspeech.2006-500}
}