Spotting Social Signals in Conversational Speech over IP: A Deep Learning Perspective

Raymond Brueckner, Maximilian Schmitt, Maja Pantic, Björn Schuller


The automatic detection and classification of social signals is an important task, given the fundamental role nonverbal behavioral cues play in human communication. We present the first cross-lingual study on the detection of laughter and fillers in conversational and spontaneous speech collected ‘in the wild’ over IP (Internet Protocol). Further, we present the first comparison of LSTM and GRU networks on this task, shedding light on their performance differences. We report frame-based results in terms of the unweighted-average area-under-the-curve (UAAUC) measure and briefly discuss its suitability for this task. In the mono-lingual setup our best deep BLSTM system achieves 87.0% and 86.3% UAAUC for English and German, respectively. Interestingly, the cross-lingual results are only slightly lower: 83.7% for a system trained on English but tested on German, and 85.0% in the opposite case. We show that LSTM and GRU architectures are valid alternatives for, e.g., on-line and compute-sensitive applications, since they incur a relative UAAUC decrease of only approximately 5% with respect to our best systems. Finally, we apply additional smoothing to correct for erroneous spikes and drops in the posterior trajectories, obtaining a further gain in all setups.
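To make the evaluation measure concrete, the sketch below computes UAAUC as the unweighted mean of per-class one-vs-rest ROC AUCs over frame-level posteriors, via the rank-sum (Mann-Whitney U) formulation of AUC. The class names, the two-class setup, and the median-filter smoothing of posterior trajectories are illustrative assumptions; the abstract does not specify the paper's exact evaluation code or smoothing method.

```python
# Sketch of frame-based UAAUC scoring and posterior smoothing.
# Assumptions (not stated in the paper's abstract): two target classes
# ("laughter", "filler") scored one-vs-rest, and a median filter as the
# smoothing operation on posterior trajectories.

def roc_auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney U) statistic.

    labels: 1 for frames of the target class, 0 otherwise.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Assign the average 1-based rank to each group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def uaauc(posteriors, labels, n_classes=2):
    """Unweighted average of one-vs-rest AUCs over the target classes."""
    aucs = []
    for c in range(n_classes):
        scores = [p[c] for p in posteriors]          # posterior of class c per frame
        binary = [1 if y == c else 0 for y in labels]
        aucs.append(roc_auc(scores, binary))
    return sum(aucs) / len(aucs)

def median_smooth(trajectory, width=5):
    """Median filter to suppress short spikes/drops in a posterior track."""
    half = width // 2
    out = []
    for t in range(len(trajectory)):
        window = trajectory[max(0, t - half):t + half + 1]
        out.append(sorted(window)[len(window) // 2])
    return out
```

Because AUC is rank-based, it needs no decision threshold, which is one reason it suits frame-level spotting; smoothing the posterior trajectory before scoring removes isolated single-frame spikes that a frame-wise classifier tends to produce.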


DOI: 10.21437/Interspeech.2017-635

Cite as: Brueckner, R., Schmitt, M., Pantic, M., Schuller, B. (2017) Spotting Social Signals in Conversational Speech over IP: A Deep Learning Perspective. Proc. Interspeech 2017, 2371-2375, DOI: 10.21437/Interspeech.2017-635.


@inproceedings{Brueckner2017,
  author={Raymond Brueckner and Maximilian Schmitt and Maja Pantic and Björn Schuller},
  title={Spotting Social Signals in Conversational Speech over IP: A Deep Learning Perspective},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={2371--2375},
  doi={10.21437/Interspeech.2017-635},
  url={http://dx.doi.org/10.21437/Interspeech.2017-635}
}