Audio-Visual Prediction of Head-Nod and Turn-Taking Events in Dyadic Interactions

Bekir Berker Türker, Engin Erzin, Yücel Yemez, Metin Sezgin


Head-nods and turn-taking both contribute significantly to conversational dynamics in dyadic interactions. Timely prediction and use of these events are highly valuable for dialog management systems in human-robot interaction. In this study, we present an audio-visual prediction framework for head-nod and turn-taking events that can also be utilized in real-time systems. Prediction systems based on Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are trained on human-human conversational data. Unimodal and multimodal classification performances for head-nod and turn-taking events are reported on the IEMOCAP dataset.
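As an illustration of the kind of LSTM-RNN event predictor the abstract describes, the following is a minimal sketch in PyTorch: a single-layer LSTM over a window of fused audio-visual feature frames with a binary event/no-event output. The feature dimensions, window length, and model sizes here are assumptions for illustration only, not the authors' actual pipeline or hyperparameters.

```python
# Hypothetical sketch of an LSTM-RNN binary event predictor over fused
# audio-visual feature frames. Dimensions and names are illustrative
# assumptions, not taken from the paper.
import torch
import torch.nn as nn

class EventPredictor(nn.Module):
    def __init__(self, feat_dim=50, hidden_dim=64):
        super().__init__()
        # Single-layer LSTM over a sliding window of feature frames.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Binary output: event (e.g., head-nod or turn-taking) vs. no event.
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        # x: (batch, time, feat_dim), e.g., concatenated audio and head-motion features.
        _, (h_n, _) = self.lstm(x)
        return self.out(h_n[-1]).squeeze(-1)  # one logit per window

# Usage with dummy data: 8 windows of 100 frames, 50-dim fused features.
model = EventPredictor()
logits = model(torch.randn(8, 100, 50))
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (8,)).float())
```

An SVM baseline, as also mentioned in the abstract, could be sketched analogously by pooling each window's features into a fixed-length vector and feeding it to a standard SVM classifier.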


DOI: 10.21437/Interspeech.2018-2215

Cite as: Türker, B.B., Erzin, E., Yemez, Y., Sezgin, M. (2018) Audio-Visual Prediction of Head-Nod and Turn-Taking Events in Dyadic Interactions. Proc. Interspeech 2018, 1741-1745, DOI: 10.21437/Interspeech.2018-2215.


@inproceedings{Türker2018,
  author={Bekir Berker Türker and Engin Erzin and Yücel Yemez and Metin Sezgin},
  title={Audio-Visual Prediction of Head-Nod and Turn-Taking Events in Dyadic Interactions},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={1741--1745},
  doi={10.21437/Interspeech.2018-2215},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2215}
}