ISCA Archive SpeechProsody 2010

Classification of affective speech using normalized time-frequency cepstra

D. Neiberg, P. Laukka, G. Ananthakrishnan

Subtle temporal and spectral differences between categorical realizations of paralinguistic phenomena (e.g., affective vocal expressions) are hard to capture and describe. In this paper we present a signal representation based on Time Varying Constant-Q Cepstral Coefficients (TVCQCC), derived for this purpose. We describe a method that exploits the special properties of the constant-Q transform for mean F0 estimation and normalization. The coefficients are invariant to segment length, and as a special case, a representation for prosody is considered. We report speaker-independent classification results using ν-SVM on the Berlin EMO-DB and on two closed sets of emotions recorded by forty professional actors from two English dialect areas: one set of basic emotions (anger, disgust, fear, happiness, sadness, neutral) and one set combining basic and social/interpersonal emotions (affection, pride, shame). The accuracy is 71.2% for the Berlin EMO-DB, 44.6% for the set of basic emotions, and 31.7% for the set of basic and social emotions. F0 normalization improves performance, and a combined feature set performs best.
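The abstract does not spell out how the constant-Q properties are used, so what follows is only a minimal sketch of the general idea under stated assumptions. In a constant-Q spectrum with b bins per octave, multiplying every frequency by a constant factor moves the spectral pattern by a fixed number of bins, so the harmonics of any F0 form the same comb of bin offsets. Here the mean F0 is found by sliding such a comb over the time-averaged spectrum (a swapped-in template-matching technique, not necessarily the authors' estimator). Normalization is a translation along the log-frequency axis, and the time-frequency cepstrum is a low-order 2D DCT-II scaled by 1/(TK), so that coefficient scale does not depend on segment length T. The function names and the parameters `fmin`, `bins_per_octave`, `n_harmonics`, `k_ref` and `fill` are hypothetical, and the constant-Q spectrogram is assumed to be computed elsewhere.

```python
import math

# Hypothetical sketch: names, parameters and the comb-template F0 estimator are
# assumptions for illustration, not the authors' published implementation.


def estimate_mean_f0(cq_frames, fmin, bins_per_octave, n_harmonics=5):
    """Estimate a segment's mean F0 from a constant-Q magnitude spectrogram.

    cq_frames: list of T frames, each a list of K non-negative magnitudes;
    bin k has centre frequency fmin * 2 ** (k / bins_per_octave).
    Harmonic h of any F0 lies round(b * log2(h)) bins above the F0 bin, so a
    single fixed comb is slid along the time-averaged spectrum.
    """
    n_bins = len(cq_frames[0])
    avg = [sum(frame[k] for frame in cq_frames) / len(cq_frames)
           for k in range(n_bins)]
    offsets = [round(bins_per_octave * math.log2(h))
               for h in range(1, n_harmonics + 1)]
    best_k, best_score = 0, -math.inf
    for k0 in range(n_bins - offsets[-1]):
        score = sum(avg[k0 + o] for o in offsets)
        if score > best_score:
            best_k, best_score = k0, score
    return best_k, fmin * 2 ** (best_k / bins_per_octave)


def normalize_f0(cq_frames, k_est, k_ref, fill=0.0):
    """Translate every frame along the log-frequency axis so that bin k_est
    lands on bin k_ref; bins shifted in from outside the range get `fill`."""
    shift = k_est - k_ref
    n_bins = len(cq_frames[0])
    return [[frame[k + shift] if 0 <= k + shift < n_bins else fill
             for k in range(n_bins)]
            for frame in cq_frames]


def time_frequency_cepstrum(log_frames, n_time, n_freq):
    """Low-order 2D DCT-II of a T x K log-magnitude matrix.

    Dividing by T * K turns each coefficient into an average, so values stay
    on the same scale for segments of different length T (an assumed way of
    obtaining length invariance).
    """
    n_t, n_k = len(log_frames), len(log_frames[0])
    coeffs = []
    for p in range(n_time):
        row = []
        for q in range(n_freq):
            acc = 0.0
            for t in range(n_t):
                ct = math.cos(math.pi * p * (t + 0.5) / n_t)
                for k in range(n_k):
                    acc += (log_frames[t][k] * ct
                            * math.cos(math.pi * q * (k + 0.5) / n_k))
            row.append(acc / (n_t * n_k))
        coeffs.append(row)
    return coeffs


# Synthetic example: 12 bins/octave from 100 Hz, five harmonics of a source at bin 10.
b, fmin, K = 12, 100.0, 60
peaks = {10 + round(b * math.log2(h)) for h in range(1, 6)}
frames = [[1.0 if k in peaks else 0.1 for k in range(K)] for _ in range(3)]
k_est, f0 = estimate_mean_f0(frames, fmin, b)  # bin 10, about 178 Hz
norm = normalize_f0(frames, k_est, k_ref=5, fill=0.1)
cc = time_frequency_cepstrum([[math.log(v) for v in fr] for fr in norm], 3, 8)
```

A real front end would use a proper constant-Q transform and a vectorized DCT (for example `scipy.fft.dctn`); the nested loops here only keep the example dependency-free.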

Index Terms: Emotion Classification, Constant-Q, 2D-DCT, supra-segmental, mean pitch estimation, prosody

Cite as: Neiberg, D., Laukka, P., Ananthakrishnan, G. (2010) Classification of affective speech using normalized time-frequency cepstra. Proc. Speech Prosody 2010, paper 071

  author={D. Neiberg and P. Laukka and G. Ananthakrishnan},
  title={{Classification of affective speech using normalized time-frequency cepstra}},
  booktitle={Proc. Speech Prosody 2010},
  pages={paper 071}