State of Mind Sensing from Speech: State of Matters and What Matters

Björn Schuller

The sensing of a plethora of speaker states and traits has entered a new era of "in the wild" processing, largely empowered by deep learning. This includes the self-learning of feature representations by convolutional neural networks directly from the raw audio signal. In addition, examples of feature vectors, or even raw audio material on the signal level itself, can increasingly be generated by generative adversarial networks, allowing additional training material to be self-synthesised. Combined with memory-enhanced recurrent neural networks, such as those with long short-term memory or gated recurrent units, powerful architectures can be built to model a broad range of speaker characteristics, ideally side by side in a multi-target training framework that allows mutual dependencies to be exploited. This brings a new question in optimal modelling to the fore: how should the topologies of such networks best be shaped so that they serve different tasks in coupled ways, self-learn representations, and even imagine learning examples? Automated machine learning offers solutions to this end, enabling the self-optimisation of such network topologies by controller nets that iteratively shape child nets, reinforced by the task performance. In this overview of state of mind sensing from speech, examples from the Interspeech Computational Paralinguistics Challenge (ComParE) series serve to illustrate the richness of speaker characteristics that can be assessed automatically at present. Further, these are used to exemplify the latest developments in the field sketched above from a technical viewpoint in terms of "what matters". This is elaborated upon in an outlook on the crucial next steps to be taken to allow for human or super-human performance in challenging real-world conditions.
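As a rough illustration of the controller and child net idea mentioned in the abstract, the following minimal sketch lets a controller sample a child topology, scores it, and reinforces the choices that scored well. Everything in it is an assumption made for illustration and is not taken from the paper: the search space, the names `SEARCH_SPACE`, `toy_task_performance`, and `search`, the learning rate, and above all the reward, which is a made-up stand-in for actually training a child net and measuring its task performance.

```python
import math
import random

# Hypothetical search space: each entry is one topology decision for the child net.
SEARCH_SPACE = {
    "conv_layers": [1, 2, 3],
    "recurrent_cell": ["lstm", "gru"],
    "hidden_units": [32, 64, 128],
}


def softmax(logits):
    """Turn controller logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_task_performance(topology):
    """Made-up reward. A real system would train the child net with this
    topology and return its validation score on the target task(s)."""
    score = 0.5 + 0.1 * topology["conv_layers"]
    score += 0.05 if topology["recurrent_cell"] == "gru" else 0.0
    score += 0.1 if topology["hidden_units"] == 64 else 0.0
    return score


def search(steps=500, learning_rate=0.5, seed=0):
    """Policy-gradient (REINFORCE-style) search over SEARCH_SPACE."""
    rng = random.Random(seed)
    logits = {name: [0.0] * len(opts) for name, opts in SEARCH_SPACE.items()}
    baseline = None  # running average of rewards, used to reduce variance
    for _ in range(steps):
        # The controller samples one option per topology decision.
        probs = {name: softmax(values) for name, values in logits.items()}
        picks = {
            name: rng.choices(range(len(p)), weights=p)[0]
            for name, p in probs.items()
        }
        topology = {name: SEARCH_SPACE[name][i] for name, i in picks.items()}

        # The child net's (here: simulated) task performance is the reward.
        reward = toy_task_performance(topology)
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward

        # Gradient of log p(chosen) w.r.t. the logits is (one-hot - probs).
        for name, chosen in picks.items():
            for i, p in enumerate(probs[name]):
                grad = (1.0 if i == chosen else 0.0) - p
                logits[name][i] += learning_rate * advantage * grad

    # Report the option the controller currently considers most likely.
    return {
        name: SEARCH_SPACE[name][max(range(len(v)), key=v.__getitem__)]
        for name, v in logits.items()
    }


if __name__ == "__main__":
    print(search())
```

Because the controller samples stochastically, the topology it settles on can vary with the seed and number of steps; in practice the expensive part is the reward evaluation, which this sketch replaces with a cheap toy function.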

Cite as: Schuller, B. (2018) State of Mind Sensing from Speech: State of Matters and What Matters. Proc. Workshop on Speech, Music and Mind 2018.

  author={Björn Schuller},
  title={State of Mind Sensing from Speech: State of Matters and What Matters},
  booktitle={Proc. Workshop on Speech, Music and Mind 2018}