Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant?

Johannes Wagner, Dominik Schiller, Andreas Seiderer, Elisabeth André


In the past, the performance of machine learning algorithms depended heavily on the representation of the data. Well-designed features therefore played a key role in speech and paralinguistic recognition tasks, and engineers have put a great deal of work into manually designing large and complex acoustic feature sets. With the emergence of Deep Neural Networks (DNNs), however, it has become possible to automatically infer higher-level abstractions from simple spectral representations, or even to learn directly from raw waveforms. This raises the question of whether (complex) hand-crafted features will still be needed in the future. We take this year's INTERSPEECH Computational Paralinguistics Challenge as an opportunity to approach this issue by means of two corpora - Atypical Affect and Crying. First, we train a Recurrent Neural Network (RNN) to evaluate the performance of several hand-crafted feature sets of varying complexity. Afterwards, we let the network do the feature engineering entirely on its own by prefixing a stack of convolutional layers. Our results show that there is no clear winner (yet), which leaves room to discuss the chances and limits of either approach.
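The end-to-end alternative described above can be sketched in miniature: a stack of convolutional filters slides over the raw waveform to produce learned feature sequences, which a recurrent layer then summarises into an utterance-level representation. The sketch below is purely illustrative and assumed, not the authors' actual architecture; the kernel size (16), stride (8), filter count (4), hidden size (8), and random weights are all placeholders for trained parameters.

```python
# Minimal sketch (assumed architecture, NOT the paper's exact model):
# a 1-D convolution front-end replaces hand-crafted features,
# and a vanilla RNN consumes the resulting feature sequence.
import math
import random

random.seed(0)

def conv1d(signal, kernels, stride):
    """Valid 1-D convolution: one feature sequence per kernel."""
    k = len(kernels[0])
    steps = (len(signal) - k) // stride + 1
    return [[sum(w * signal[t * stride + i] for i, w in enumerate(kern))
             for t in range(steps)]
            for kern in kernels]

def rnn_last_state(features, w_in, w_rec):
    """Vanilla tanh RNN over time; features is [channels][time]."""
    channels, steps = len(features), len(features[0])
    h = [0.0] * len(w_rec)
    for t in range(steps):
        x = [features[c][t] for c in range(channels)]
        h = [math.tanh(sum(w_in[j][c] * x[c] for c in range(channels))
                       + sum(w_rec[j][k] * h[k] for k in range(len(h))))
             for j in range(len(h))]
    return h

# Toy "waveform" and randomly initialised weights (illustration only).
wave = [math.sin(0.05 * t) for t in range(200)]
kernels = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(4)]  # 4 conv filters
w_in = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]      # 4 -> 8 hidden units
w_rec = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(8)]

feats = conv1d(wave, kernels, stride=8)     # learned features instead of hand-crafted ones
state = rnn_last_state(feats, w_in, w_rec)  # fixed-size utterance representation
print(len(feats), len(feats[0]), len(state))  # -> 4 24 8
```

In the paper's comparison, this learned front-end competes against the hand-crafted route, where the `conv1d` stage would be replaced by precomputed acoustic feature sets fed to the same recurrent layer.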


DOI: 10.21437/Interspeech.2018-1238

Cite as: Wagner, J., Schiller, D., Seiderer, A., André, E. (2018) Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant? Proc. Interspeech 2018, 147-151, DOI: 10.21437/Interspeech.2018-1238.


@inproceedings{Wagner2018,
  author={Johannes Wagner and Dominik Schiller and Andreas Seiderer and Elisabeth André},
  title={Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant?},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={147--151},
  doi={10.21437/Interspeech.2018-1238},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1238}
}