INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Emotional Speech: A Spectral Analysis

Pouria Fewzee, Fakhri Karray

Centre for Pattern Analysis and Machine Intelligence, University of Waterloo, Waterloo, ON, Canada

Feature extraction and dimensionality reduction are arguably the most critical parts of the emotional speech recognition problem. In this work, we propose a new set of speech features based on the distribution of energy in the frequency domain. To investigate the applicability of the proposed model, we use the first international Audio/Visual Emotion Challenge (AVEC 2011) as the benchmark. For modeling and dimensionality reduction, we employ the lasso. We show that 15 explicit spectral energy features, as suggested in this work, can lead to a more accurate model than those of all the participants in the audio sub-challenge, even though this number of features is less than ten percent of the smallest feature set submitted to the challenge.
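To illustrate how the lasso can serve as both a model and a dimensionality reduction step, the following is a minimal sketch, not the authors' implementation. It assumes the common objective (1/2n)·||y − Xw||² + alpha·||w||₁ solved by cyclic coordinate descent. The function names, the value of `alpha`, and the synthetic data are all hypothetical; no AVEC 2011 data or spectral energy features from the paper are used.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero feature carries no information
            # Correlation of feature j with the residual once its own
            # current contribution has been added back.
            rho = sum(X[i][j] * (residual[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_w
    return w


def make_toy_data(n=200, p=10, seed=0):
    """Synthetic regression data (assumption): only features 0 and 2 matter."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    true_w = [2.0, 0.0, -1.5] + [0.0] * (p - 3)
    y = [sum(wj * xj for wj, xj in zip(true_w, row)) + rng.gauss(0, 0.1) for row in X]
    return X, y, true_w


X, y, true_w = make_toy_data()
w = lasso_coordinate_descent(X, y, alpha=0.2)
# Features with a non-zero weight are the ones the lasso retains.
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("selected feature indices:", selected)
```

The L1 penalty drives the weights of uninformative features to exactly zero, so the retained indices form a reduced feature set. Larger values of `alpha` retain fewer features, at the cost of more shrinkage on the weights that remain.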

Index Terms: emotional speech recognition, feature extraction, dimensionality reduction

Bibliographic reference. Fewzee, Pouria / Karray, Fakhri (2012): "Emotional speech: a spectral analysis", In INTERSPEECH-2012, 2238-2241.