This paper focuses on speech-based emotion classification using acoustic data. The most commonly used acoustic features are pitch and energy, along with prosodic information such as the rate of speech. In addition to these established features, we propose a novel feature based on the phase response of an all-pole model of the vocal tract, obtained from linear prediction coefficients (LPCs). We compare this feature to other commonly used acoustic features in terms of classification accuracy. The back-end of our system employs a classifier based on a probabilistic neural network. Evaluations conducted on the LDC Emotional Prosody speech corpus indicate that the proposed features are well suited to the task of emotion classification. When combined with established features to form a larger feature vector, the proposed features provide a relative increase in classification accuracy of about 14%.
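The feature extraction described above can be illustrated with a minimal sketch (not the authors' implementation; the frame length, model order, autocorrelation LPC method, and synthetic test signal below are all assumptions for demonstration): fit an all-pole vocal-tract model A(z) via the Levinson-Durbin recursion, then evaluate the group delay (negative derivative of the phase response) of the filter 1/A(z) with `scipy.signal.group_delay`.

```python
import numpy as np
from scipy.signal import lfilter, group_delay

def lpc(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns all-pole coefficients a = [1, a1, ..., a_order] of A(z)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.array([1.0])
    err = r[0]
    for i in range(1, order + 1):
        acc = np.dot(a, r[i:0:-1])        # r[i] + sum_j a[j] * r[i-j]
        k = -acc / err                    # reflection coefficient
        ext = np.concatenate([a, [0.0]])
        a = ext + k * ext[::-1]           # order-update of the coefficients
        err *= 1.0 - k * k                # updated prediction error
    return a

# Synthetic demo signal: white noise through a known all-pole filter
# (stands in for one analysis frame of speech; settings are illustrative).
rng = np.random.default_rng(0)
frame = lfilter([1.0], [1.0, -0.9], rng.standard_normal(4096))

a = lpc(frame, order=2)                   # estimated all-pole model A(z)
w, gd = group_delay((np.array([1.0]), a)) # group delay of 1 / A(z)
```

The group delay vector `gd` (in samples, over frequencies `w`) would then serve as the per-frame phase-based feature alongside pitch, energy, and speaking-rate features.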
Bibliographic reference. Sethu, Vidhyasaharan / Ambikairajah, Eliathamby / Epps, Julien (2007): "Group delay features for emotion detection", in Proceedings of INTERSPEECH 2007, pp. 2273-2276.