In this work, we study the paralinguistic speech task of eating condition classification and present our submitted classification system for the INTERSPEECH 2015 Computational Paralinguistics challenge. We build upon a deep learning language identification system, which we repurpose for general audio sequence classification. The main idea is that we train local convolutional neural network classifiers that automatically learn representations on smaller windows of the full sequence's spectrum and to aggregate multiple local classifications towards a full sequence classification. A particular challenge of the task is training data scarcity and the resulting overfitting of neural network methods, which we tackle with dropout, synthetic data augmentation and transfer learning with out-of-domain data from a language identification task. Our final submitted system achieved an UAR score of 75.9% for 7-way eating condition classification, which is a relative improvement of 15% over the baseline.
Bibliographic reference. Milde, Benjamin / Biemann, Chris (2015): "Using representation learning and out-of-domain data for a paralinguistic speech task", In INTERSPEECH-2015, 904-908.