We present a novel speaker diarization method that uses eye-gaze information in multi-party conversations. In real environments, speaker diarization, i.e., speech activity detection for each participant in the conversation, is challenging because of distant talking and ambient noise. In contrast, eye-gaze information is robust against acoustic degradation, and eye-gaze behavior is presumed to play an important role in turn-taking and thus in predicting utterances. The proposed method stochastically integrates eye-gaze information with acoustic information for speaker diarization. Specifically, three models for multi-modal integration are investigated in this paper. Experimental evaluations in real poster sessions demonstrate that the proposed method improves the accuracy of speaker diarization over the acoustic-only baseline.
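The abstract does not specify the three integration models, but the idea of stochastically combining an acoustic score with a gaze-based score can be illustrated with a generic log-linear fusion. The sketch below is not the paper's method: the function name, the weight `alpha`, and the toy probabilities are all assumptions introduced purely for illustration.

```python
import numpy as np

def fuse_diarization_scores(acoustic_logp, gaze_logp, alpha=0.7):
    """Illustrative log-linear fusion of per-frame, per-speaker scores.

    acoustic_logp, gaze_logp: arrays of shape (frames, speakers) holding
    log-probabilities of each speaker being active in each frame.
    alpha weights the acoustic stream; (1 - alpha) weights the gaze stream.
    Returns the index of the highest-scoring speaker per frame.
    """
    fused = alpha * acoustic_logp + (1.0 - alpha) * gaze_logp
    return fused.argmax(axis=1)

# Toy example: 3 frames, 2 speakers (made-up probabilities).
acoustic = np.log(np.array([[0.8, 0.2],
                            [0.4, 0.6],
                            [0.5, 0.5]]))   # frame 2: acoustic tie
gaze = np.log(np.array([[0.6, 0.4],
                        [0.3, 0.7],
                        [0.2, 0.8]]))       # gaze breaks the tie
labels = fuse_diarization_scores(acoustic, gaze, alpha=0.7)
# In the tied frame, the gaze stream decides the active speaker.
```

In the last frame the acoustic scores are equal, so the gaze evidence determines the decision, mirroring the paper's motivation that gaze remains informative when the acoustic channel is unreliable.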
Bibliographic reference. Inoue, Koji / Wakabayashi, Yukoh / Yoshimoto, Hiromasa / Kawahara, Tatsuya (2014): "Speaker diarization using eye-gaze information in multi-party conversations", in INTERSPEECH-2014, pp. 562-566.