ISCA Archive Interspeech 2009

Emotion classification in children's speech using fusion of acoustic and linguistic features

Tim Polzehl, Shiva Sundaram, Hamed Ketabdar, Michael Wagner, Florian Metze

This paper describes a system to detect angry vs. non-angry utterances of children who are engaged in dialog with an Aibo robot dog. The system was submitted to the Interspeech 2009 Emotion Challenge evaluation. The speech data consist of short utterances of the children's speech, and the proposed system is designed to detect anger in each given chunk. Frame-based cepstral features, prosodic and acoustic features, as well as glottal excitation features are extracted automatically, reduced in dimensionality, and classified by means of an artificial neural network and a support vector machine. An automatic speech recognizer transcribes the words in an utterance and yields a separate classification based on the degree of emotional salience of the words. Late fusion is applied to make a final decision on anger vs. non-anger of the utterance. Preliminary results show 75.9% unweighted average recall on the training data and 67.6% on the test set.
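The unweighted average recall (UAR) figures cited above are the official Emotion Challenge metric: the mean of per-class recalls, so the minority anger class counts as much as the majority non-anger class. A minimal sketch of how this metric is computed (the example labels are hypothetical, not taken from the paper's data):

```python
# Unweighted average recall (UAR): mean of per-class recalls,
# so each class is weighted equally regardless of its frequency.
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    correct = defaultdict(int)  # correctly classified utterances per class
    total = defaultdict(int)    # total utterances per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Hypothetical two-class example (anger vs. non-anger):
y_true = ["anger", "anger", "non-anger", "non-anger", "non-anger", "non-anger"]
y_pred = ["anger", "non-anger", "non-anger", "non-anger", "non-anger", "anger"]
print(unweighted_average_recall(y_true, y_pred))  # (1/2 + 3/4) / 2 = 0.625
```

Note that plain accuracy on the same example would be 4/6 ≈ 0.667; UAR is lower here because the rarer anger class is recalled less well, which is exactly why the challenge used it for this imbalanced task.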


doi: 10.21437/Interspeech.2009-110

Cite as: Polzehl, T., Sundaram, S., Ketabdar, H., Wagner, M., Metze, F. (2009) Emotion classification in children's speech using fusion of acoustic and linguistic features. Proc. Interspeech 2009, 340-343, doi: 10.21437/Interspeech.2009-110

@inproceedings{polzehl09_interspeech,
  author={Tim Polzehl and Shiva Sundaram and Hamed Ketabdar and Michael Wagner and Florian Metze},
  title={{Emotion classification in children's speech using fusion of acoustic and linguistic features}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={340--343},
  doi={10.21437/Interspeech.2009-110}
}