16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Using Representation Learning and Out-of-Domain Data for a Paralinguistic Speech Task

Benjamin Milde, Chris Biemann

Technische Universität Darmstadt, Germany

In this work, we study the paralinguistic speech task of eating condition classification and present the classification system we submitted to the INTERSPEECH 2015 Computational Paralinguistics Challenge. We build upon a deep learning language identification system, which we repurpose for general audio sequence classification. The main idea is to train local convolutional neural network classifiers that automatically learn representations on smaller windows of the full sequence's spectrum, and to aggregate multiple local classifications into a classification of the full sequence. A particular challenge of the task is the scarcity of training data and the resulting overfitting of neural network methods, which we tackle with dropout, synthetic data augmentation, and transfer learning with out-of-domain data from a language identification task. Our final submitted system achieved an unweighted average recall (UAR) of 75.9% for 7-way eating condition classification, a relative improvement of 15% over the baseline.
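The idea of aggregating local window classifications into a sequence label, together with the UAR metric reported above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how local decisions are combined, so averaging the per-window class posteriors is an assumption here. `toy_classifier` is a hypothetical stand-in for a trained CNN, and the window width and hop are illustrative parameters.

```python
from typing import Callable, List, Sequence

Frame = List[float]          # one spectral frame (e.g. filterbank energies)
Window = List[Frame]         # a fixed-width chunk of consecutive frames


def sliding_windows(frames: Sequence[Frame], width: int, hop: int) -> List[Window]:
    """Cut a (time x frequency) spectrogram into overlapping fixed-width windows."""
    if len(frames) < width:
        return [list(frames)]  # sequence shorter than one window: use it whole
    return [list(frames[start:start + width])
            for start in range(0, len(frames) - width + 1, hop)]


def classify_sequence(frames: Sequence[Frame],
                      local_classifier: Callable[[Window], List[float]],
                      width: int, hop: int) -> int:
    """Classify each window locally, average the posteriors (assumed rule), take argmax."""
    posteriors = [local_classifier(w) for w in sliding_windows(frames, width, hop)]
    n_classes = len(posteriors[0])
    mean = [sum(p[c] for p in posteriors) / len(posteriors) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


def unweighted_average_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """UAR: mean of per-class recalls, so each class counts equally regardless of size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def toy_classifier(window: Window) -> List[float]:
    """Hypothetical local classifier: high mean energy -> class 1, low -> class 0."""
    energy = sum(sum(f) for f in window) / (len(window) * len(window[0]))
    p = min(max(energy, 0.0), 1.0)
    return [1.0 - p, p]


if __name__ == "__main__":
    # 10 high-energy frames followed by 4 low-energy frames, 2 bins per frame
    frames = [[0.9, 0.8]] * 10 + [[0.1, 0.2]] * 4
    print(classify_sequence(frames, toy_classifier, width=4, hop=2))
    print(unweighted_average_recall([0, 0, 0, 1], [0, 0, 1, 1]))
```

In the usage example, UAR is 5/6 (about 0.83) while plain accuracy would be 3/4, which shows why UAR is preferred when class sizes are unbalanced: the single example of class 1 weighs as much as the three of class 0.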


Bibliographic reference. Milde, Benjamin / Biemann, Chris (2015): "Using representation learning and out-of-domain data for a paralinguistic speech task", In INTERSPEECH-2015, 904-908.