ISCA Archive Interspeech 2013

Amplitude modulation features for emotion recognition from speech

Md. Jahangir Alam, Yazid Attabi, Pierre Dumouchel, Patrick Kenny, Douglas O'Shaughnessy

The goal of speech emotion recognition (SER) is to identify the emotional or physical state of a human being from his or her voice. One of the most important steps in an SER task is to extract and select relevant speech features with which most emotions can be recognized. In this paper, we present smoothed nonlinear energy operator (SNEO)-based amplitude modulation cepstral coefficient (AMCC) features for recognizing emotions from speech signals. The SNEO estimates the energy required to produce an AM-FM signal, and the estimated energy is then separated into its amplitude and frequency components using an energy separation algorithm (ESA). AMCC features are obtained by first decomposing a speech signal with a C-channel gammatone filterbank, computing the AM power spectrum, and taking the discrete cosine transform (DCT) of the root-compressed AM power spectrum. Conventional MFCC (Mel-frequency cepstral coefficient) features and Mel-warped DFT (discrete Fourier transform) spectrum-based cepstral coefficient (MWDCC) features are used as baselines for comparison. Emotion recognition experiments are conducted on the FAU AIBO spontaneous emotion corpus. The experimental results show that the AMCC features provide a relative improvement of approximately 3.5% over the baseline MFCC.
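The SNEO-plus-ESA front end described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the discrete Teager-Kaiser operator stands in for the nonlinear energy operator, the 3-point binomial smoothing window is an assumption (the paper's smoothing window may differ), and DESA-2 is used as one common choice of energy separation algorithm. A toy AM tone verifies that the recovered envelope and instantaneous frequency track the true ones.

```python
import numpy as np

def teager(x):
    """Discrete Teager-Kaiser energy operator: Psi[x](n) = x(n)^2 - x(n-1)*x(n+1)."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def sneo(x, w=np.array([1.0, 2.0, 1.0]) / 4.0):
    """Smoothed NEO: Psi[x] convolved with a smoothing window.
    The 3-point binomial window is an illustrative assumption."""
    return np.convolve(teager(x), w, mode="same")

def desa2(x, eps=1e-12):
    """DESA-2 energy separation applied to the smoothed energy.
    Returns the AM envelope |a(n)| and instantaneous frequency
    Omega(n) in rad/sample, valid for n = 2 .. N-3."""
    psi_x = sneo(x)                 # Psi[x(n)],              n = 1 .. N-2
    z = x[2:] - x[:-2]              # z(n) = x(n+1) - x(n-1), n = 1 .. N-2
    psi_z = sneo(z)                 # Psi[z(n)],              n = 2 .. N-3
    px = np.maximum(psi_x[1:-1], eps)   # align Psi[x] with Psi[z]
    pz = np.maximum(psi_z, eps)
    amp = 2.0 * px / np.sqrt(pz)
    omega = 0.5 * np.arccos(np.clip(1.0 - pz / (2.0 * px), -1.0, 1.0))
    return amp, omega

# Toy AM signal: (1 + 0.5 cos(2*pi*5*t)) * cos(2*pi*400*t) at fs = 8 kHz.
fs, fc, fm = 8000, 400.0, 5.0
n = np.arange(4000)
env_true = 1.0 + 0.5 * np.cos(2 * np.pi * fm * n / fs)
x = env_true * np.cos(2 * np.pi * fc * n / fs)
amp, omega = desa2(x)   # amp tracks env_true; omega is about 2*pi*fc/fs
```

In the full AMCC pipeline, this separation would be applied per gammatone channel, and the per-channel AM envelopes would then be used to build the AM power spectrum whose root-compressed DCT yields the cepstral coefficients.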


doi: 10.21437/Interspeech.2013-563

Cite as: Alam, M.J., Attabi, Y., Dumouchel, P., Kenny, P., O'Shaughnessy, D. (2013) Amplitude modulation features for emotion recognition from speech. Proc. Interspeech 2013, 2420-2424, doi: 10.21437/Interspeech.2013-563

@inproceedings{alam13c_interspeech,
  author={Md. Jahangir Alam and Yazid Attabi and Pierre Dumouchel and Patrick Kenny and Douglas O'Shaughnessy},
  title={{Amplitude modulation features for emotion recognition from speech}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={2420--2424},
  doi={10.21437/Interspeech.2013-563}
}