ISCA Archive Interspeech 2017

Speaker Dependency Analysis, Audiovisual Fusion Cues and a Multimodal BLSTM for Conversational Engagement Recognition

Yuyun Huang, Emer Gilmartin, Nick Campbell

Conversational engagement is a multimodal phenomenon and an essential cue for assessing both human-human and human-robot communication. Our engagement study addressed both speaker-dependent and speaker-independent scenarios using handcrafted audio-visual features. Fixed window sizes for the feature fusion method were analysed, and novel dynamic window size selection and multimodal bi-directional long short-term memory (Multimodal BLSTM) approaches were proposed and evaluated for engagement level recognition.
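
A minimal sketch of the general idea behind a multimodal BLSTM for engagement-level classification is shown below. The feature dimensions, hidden size, fusion strategy, and number of engagement classes are illustrative assumptions, not the authors' exact configuration, which the abstract does not specify.

# Sketch of a multimodal BLSTM classifier (PyTorch); all hyperparameters
# here are placeholder assumptions, not the paper's reported setup.
import torch
import torch.nn as nn

class MultimodalBLSTM(nn.Module):
    def __init__(self, audio_dim=39, visual_dim=20, hidden=64, n_classes=4):
        super().__init__()
        # One bidirectional LSTM per modality.
        self.audio_blstm = nn.LSTM(audio_dim, hidden, batch_first=True,
                                   bidirectional=True)
        self.visual_blstm = nn.LSTM(visual_dim, hidden, batch_first=True,
                                    bidirectional=True)
        # Late fusion: concatenate the last time step of both modalities.
        self.classifier = nn.Linear(4 * hidden, n_classes)

    def forward(self, audio_seq, visual_seq):
        # audio_seq:  (batch, frames, audio_dim)
        # visual_seq: (batch, frames, visual_dim)
        a_out, _ = self.audio_blstm(audio_seq)
        v_out, _ = self.visual_blstm(visual_seq)
        # Last time step holds concatenated forward/backward hidden states.
        fused = torch.cat([a_out[:, -1], v_out[:, -1]], dim=-1)
        return self.classifier(fused)

# Example: a batch of 8 feature windows, 100 frames per modality.
model = MultimodalBLSTM()
logits = model(torch.randn(8, 100, 39), torch.randn(8, 100, 20))
print(logits.shape)  # torch.Size([8, 4])

The window length fed to such a model corresponds to the fixed or dynamically selected fusion window sizes discussed in the paper.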


doi: 10.21437/Interspeech.2017-1496

Cite as: Huang, Y., Gilmartin, E., Campbell, N. (2017) Speaker Dependency Analysis, Audiovisual Fusion Cues and a Multimodal BLSTM for Conversational Engagement Recognition. Proc. Interspeech 2017, 3359-3363, doi: 10.21437/Interspeech.2017-1496

@inproceedings{huang17f_interspeech,
  author={Yuyun Huang and Emer Gilmartin and Nick Campbell},
  title={{Speaker Dependency Analysis, Audiovisual Fusion Cues and a Multimodal BLSTM for Conversational Engagement Recognition}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3359--3363},
  doi={10.21437/Interspeech.2017-1496}
}