ISCA Archive Interspeech 2015

Enhanced speaker diarization with detection of backchannels using eye-gaze information in poster conversations

Koji Inoue, Yukoh Wakabayashi, Hiromasa Yoshimoto, Katsuya Takanashi, Tatsuya Kawahara

We propose multi-modal speaker diarization that uses acoustic and eye-gaze information in poster conversations. Eye-gaze information plays an important role in turn-taking and is therefore useful for predicting speech activity. In this paper, a variety of eye-gaze features are elaborated and combined with the acoustic information by a multi-modal integration model. Moreover, we introduce another model to detect backchannels, which involve different eye-gaze behaviors. This enhances the diarization result by filtering out backchannels and keeping meaningful utterances such as questions and comments. Experimental evaluations in real poster sessions demonstrate that eye-gaze information contributes to improvement of diarization accuracy under noisy environments, and that its weight is automatically determined according to the signal-to-noise ratio (SNR).
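
As a rough illustration of SNR-dependent integration of the two modalities, the sketch below linearly interpolates an acoustic speech-activity posterior and an eye-gaze posterior. It is not the model from the paper: the interpolation form, the logistic weight mapping, and the names `gaze_weight`, `fuse`, `midpoint_db`, and `slope` are all assumptions made for this example.

```python
import math


def gaze_weight(snr_db, midpoint_db=10.0, slope=0.3):
    """Hypothetical mapping from SNR (dB) to an eye-gaze weight in (0, 1).

    Assumption: the weight grows as the SNR drops, so eye-gaze is
    trusted more when the audio is noisy. The logistic shape and the
    default parameters are illustrative, not taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(slope * (snr_db - midpoint_db)))


def fuse(p_acoustic, p_gaze, snr_db):
    """Linearly interpolate two speech-activity posteriors.

    p_acoustic and p_gaze are probabilities that a participant is
    speaking in the current frame, each estimated from one modality.
    """
    w = gaze_weight(snr_db)
    return (1.0 - w) * p_acoustic + w * p_gaze


if __name__ == "__main__":
    # Same pair of posteriors fused under clean and noisy conditions.
    for snr in (20.0, 10.0, 0.0):
        print(snr, round(fuse(0.2, 0.8, snr), 3))
```

With these assumed defaults, the fused posterior moves from the acoustic estimate toward the eye-gaze estimate as the SNR decreases, which is the qualitative behavior the abstract describes.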


doi: 10.21437/Interspeech.2015-107

Cite as: Inoue, K., Wakabayashi, Y., Yoshimoto, H., Takanashi, K., Kawahara, T. (2015) Enhanced speaker diarization with detection of backchannels using eye-gaze information in poster conversations. Proc. Interspeech 2015, 3086-3090, doi: 10.21437/Interspeech.2015-107

@inproceedings{inoue15_interspeech,
  author={Koji Inoue and Yukoh Wakabayashi and Hiromasa Yoshimoto and Katsuya Takanashi and Tatsuya Kawahara},
  title={{Enhanced speaker diarization with detection of backchannels using eye-gaze information in poster conversations}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={3086--3090},
  doi={10.21437/Interspeech.2015-107}
}