16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Enhanced Speaker Diarization with Detection of Backchannels Using Eye-Gaze Information in Poster Conversations

Koji Inoue (1), Yukoh Wakabayashi (2), Hiromasa Yoshimoto (1), Katsuya Takanashi (1), Tatsuya Kawahara (1)

(1) Kyoto University, Japan
(2) Ritsumeikan University, Japan

We propose multi-modal speaker diarization using acoustic and eye-gaze information in poster conversations. Eye-gaze information plays an important role in turn-taking and is therefore useful for predicting speech activity. In this paper, a variety of eye-gaze features are designed and combined with acoustic information in a multi-modal integration model. Moreover, we introduce another model to detect backchannels, which involve different eye-gaze behaviors. This enhances the diarization result by distinguishing meaningful utterances, such as questions and comments, from backchannels. Experimental evaluations in real poster sessions demonstrate that eye-gaze information improves diarization accuracy in noisy environments, and that its weight is automatically determined according to the Signal-to-Noise Ratio (SNR).

Bibliographic reference: Inoue, Koji / Wakabayashi, Yukoh / Yoshimoto, Hiromasa / Takanashi, Katsuya / Kawahara, Tatsuya (2015): "Enhanced speaker diarization with detection of backchannels using eye-gaze information in poster conversations", In INTERSPEECH-2015, 3086-3090.