Employing Bottleneck and Convolutional Features for Speech-Based Physical Load Detection on Limited Data Amounts

Olga Egorow, Tarik Mrech, Norman Weißkirchen, Andreas Wendemuth


The detection of different levels of physical load from speech has many applications: besides telemedicine, non-contact detection of certain heart rate ranges can be useful for sports and other leisure devices. Existing approaches mainly rely on a large number of spectral and prosodic features. Because the available data sets, such as the Talk & Run data set and the Munich Biovoice Corpus, are typically small, these high-dimensional feature spaces are only sparsely populated. We therefore aim to reduce the number of features using modern neural-network-inspired features: bottleneck layer features, obtained from standard low-level descriptors via a feed-forward neural network, and activation map features, obtained from spectrograms via a convolutional neural network. We use these features for an SVM classification of high and low physical load and compare their performance. We also discuss the possibility of transferring the hyperparameters of the extracting networks between different data sets. We show that even for limited amounts of data, deep-learning-based methods can bring a substantial improvement over “conventional” features.
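
The bottleneck part of the pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the 88-dimensional input, the layer sizes (64, 8, 64), the RBF kernel, and the helper name `bottleneck_features` are all assumptions chosen for illustration, and scikit-learn stands in for whatever toolkit was actually used. A feed-forward network with a narrow middle layer is trained on the class labels, the activations of that narrow layer are taken as low-dimensional features, and an SVM is trained on them.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for per-utterance low-level descriptor vectors
# (assumption: 200 utterances, 88 features, binary low/high load label).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 88))
y = (X[:, :5].sum(axis=1) > 0).astype(int)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Standardise using statistics from the training part only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Feed-forward network whose second hidden layer is the 8-unit bottleneck.
net = MLPClassifier(hidden_layer_sizes=(64, 8, 64), activation="relu",
                    max_iter=300, random_state=0)
net.fit(X_train, y_train)


def bottleneck_features(model, data, layer_index):
    """Return the activations of hidden layer `layer_index` (0-based)."""
    a = data
    weights = model.coefs_[: layer_index + 1]
    biases = model.intercepts_[: layer_index + 1]
    for W, b in zip(weights, biases):
        a = np.maximum(a @ W + b, 0.0)  # ReLU, matching activation="relu"
    return a


# Hidden layer 1 is the bottleneck: 88 inputs are reduced to 8 features.
Z_train = bottleneck_features(net, X_train, layer_index=1)
Z_test = bottleneck_features(net, X_test, layer_index=1)

# SVM trained on the bottleneck features instead of the raw descriptors.
svm = SVC(kernel="rbf").fit(Z_train, y_train)
pred = svm.predict(Z_test)
accuracy = float((pred == y_test).mean())
```

The activation map features mentioned in the abstract would follow the same pattern, with a convolutional network applied to spectrograms and one of its intermediate activation maps flattened into the feature vector passed to the SVM.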


DOI: 10.21437/Interspeech.2019-2502

Cite as: Egorow, O., Mrech, T., Weißkirchen, N., Wendemuth, A. (2019) Employing Bottleneck and Convolutional Features for Speech-Based Physical Load Detection on Limited Data Amounts. Proc. Interspeech 2019, 1666-1670, DOI: 10.21437/Interspeech.2019-2502.


@inproceedings{Egorow2019,
  author={Olga Egorow and Tarik Mrech and Norman Weißkirchen and Andreas Wendemuth},
  title={{Employing Bottleneck and Convolutional Features for Speech-Based Physical Load Detection on Limited Data Amounts}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={1666--1670},
  doi={10.21437/Interspeech.2019-2502},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2502}
}