This paper reports experiments on Eating Condition (EC) classification in the context of the INTERSPEECH 2015 Paralinguistic EC sub-challenge. Three techniques were compared: Support Vector Machines (SVMs), Softmax classifiers, and single-hidden-layer neural networks with the ReLU activation function. Although eating noise and speech overlap in the recordings most of the time, all the tested techniques improved when the baseline features were augmented with the same features extracted on low-energy audio frames only, for a total of 12K features. With the Softmax classifier, for instance, the unweighted average recall (UAR) increased from 58.3% to 64.3% in the Leave-One-Speaker-Out (LOSO) cross-validation configuration. As expected, the "Biscuit" and "Crisp" categories benefited the most from the low-energy frames, with UAR improvements between 10% and 15% absolute. Indeed, the noises produced by these foods are high-frequency and low-energy. SVMs and Softmax classifiers performed similarly, with Softmax slightly ahead. Our best performance of 68.4% UAR on the test set was obtained by averaging the scores of several neural networks trained in the LOSO configuration. We also report a performance comparison of three weight update rules used with batch gradient descent: the sgd, momentum, and rmsprop rules.
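All results above are reported as UAR, the unweighted average recall: the mean of the per-class recalls, so that each eating condition counts equally regardless of how many recordings it has. The following is a minimal sketch of that metric, not the challenge's official scoring script; the function name `unweighted_average_recall` and the toy labels are assumptions made for illustration only.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (hypothetical helper, not from the paper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled example is required")

    total = defaultdict(int)    # number of reference examples per class
    correct = defaultdict(int)  # number of correctly predicted examples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall is computed per class, then averaged with equal class weights.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy labels invented for illustration; they are not data from the paper.
y_true = ["Crisp", "Crisp", "Crisp", "Crisp", "Biscuit", "Biscuit"]
y_pred = ["Crisp", "Crisp", "Crisp", "Biscuit", "Biscuit", "Crisp"]
uar = unweighted_average_recall(y_true, y_pred)
```

In this toy case the recall is 3/4 for "Crisp" and 1/2 for "Biscuit", so the UAR is 0.625 even though plain accuracy is 4/6. With imbalanced classes the two measures diverge, since UAR gives the smaller class the same weight as the larger one.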
Bibliographic reference. Pellegrini, Thomas (2015): "Comparing SVM, softmax, and shallow neural networks for eating condition classification", In INTERSPEECH-2015, 899-903.