In this paper, we propose a method for automatically classifying various types of snore sounds using descriptors extracted from audio spectrograms by image classification convolutional neural networks (CNNs). These descriptors, denoted as deep spectrum features, are derived by forwarding spectrograms through very deep, task-independent, pre-trained CNNs. Specifically, the activations of fully connected layers from two common image classification CNNs, AlexNet and VGG19, are used as feature vectors. Moreover, we investigate the impact of different spectrogram colour maps and the two CNN architectures on system performance. The results indicate that deep spectrum features extracted from the activations of the second fully connected layer of AlexNet, using a viridis colour map, are well suited to the task. Combined with a support vector classifier, this feature space outperforms the more conventional knowledge-based feature set of 6 373 acoustic functionals used in the INTERSPEECH ComParE 2017 Snoring sub-challenge baseline system. Compared to this baseline, unweighted average recall is increased from 40.6% to 44.8% on the development partition, and from 58.5% to 67.0% on the test partition.
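The following is a minimal, illustrative sketch of the deep spectrum pipeline the abstract describes: a spectrogram is rendered with the viridis colour map, forwarded through a pre-trained AlexNet, and the activations of the second fully connected layer are used as features for a support vector classifier. It assumes torchvision's ImageNet-pretrained AlexNet as a stand-in for the exact model used by the authors, a mel-scaled log spectrogram as one plausible spectrogram variant, and hypothetical file names, labels, and SVM settings; it is not the authors' implementation.

import numpy as np
import librosa
import torch
from matplotlib import cm
from torchvision import models
from sklearn.svm import LinearSVC

# ImageNet normalisation constants expected by the pre-trained CNN.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def deep_spectrum_features(wav_path, model, sr=16000):
    """Render a viridis spectrogram and return fc7 activations of AlexNet."""
    y, sr = librosa.load(wav_path, sr=sr)
    # Log-power mel spectrogram of the whole audio file (an assumption;
    # the paper plots spectrograms, exact parameters may differ).
    spec = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr), ref=np.max)
    # Scale to [0, 1] and apply the viridis colour map, mimicking the
    # coloured spectrogram images that are fed to the CNN.
    norm = (spec - spec.min()) / (spec.max() - spec.min() + 1e-9)
    rgb = cm.viridis(norm)[..., :3]                  # (H, W, 3) in [0, 1]
    img = torch.from_numpy(rgb).float().permute(2, 0, 1)
    img = torch.nn.functional.interpolate(
        img.unsqueeze(0), size=(224, 224), mode="bilinear",
        align_corners=False)
    img = (img - IMAGENET_MEAN) / IMAGENET_STD
    with torch.no_grad():
        x = model.features(img)
        x = model.avgpool(x)
        x = torch.flatten(x, 1)
        # classifier[:6] stops after the second fully connected layer
        # (fc7) and its ReLU, yielding a 4096-dimensional feature vector.
        feat = model.classifier[:6](x)
    return feat.squeeze(0).numpy()

alexnet = models.alexnet(weights="IMAGENET1K_V1").eval()

# Hypothetical training data: wav paths and snore-type labels.
train_files = ["snore_001.wav", "snore_002.wav"]
train_labels = ["V", "O"]
X = np.stack([deep_spectrum_features(f, alexnet) for f in train_files])

# Linear support vector classifier on the deep spectrum features.
clf = LinearSVC(C=1.0)
clf.fit(X, train_labels)

Extracting VGG19 features instead would follow the same pattern, with the truncation point chosen at the corresponding fully connected layer of that architecture.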
Cite as: Amiriparian, S., Gerczuk, M., Ottl, S., Cummins, N., Freitag, M., Pugachevskiy, S., Baird, A., Schuller, B. (2017) Snore Sound Classification Using Image-Based Deep Spectrum Features. Proc. Interspeech 2017, 3512-3516, doi: 10.21437/Interspeech.2017-434
@inproceedings{amiriparian17_interspeech,
  author={Shahin Amiriparian and Maurice Gerczuk and Sandra Ottl and Nicholas Cummins and Michael Freitag and Sergey Pugachevskiy and Alice Baird and Björn Schuller},
  title={{Snore Sound Classification Using Image-Based Deep Spectrum Features}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3512--3516},
  doi={10.21437/Interspeech.2017-434}
}