15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Speaker Diarization Using Eye-gaze Information in Multi-Party Conversations

Koji Inoue, Yukoh Wakabayashi, Hiromasa Yoshimoto, Tatsuya Kawahara

Kyoto University, Japan

We present a novel speaker diarization method that uses eye-gaze information in multi-party conversations. In real environments, speaker diarization, or speech activity detection for each participant in the conversation, is challenging because of distant talking and ambient noise. In contrast, eye-gaze information is robust against acoustic degradation, and eye-gaze behavior is presumed to play an important role in turn-taking and thus in predicting utterances. The proposed method stochastically integrates eye-gaze information with acoustic information for speaker diarization. Specifically, this paper investigates three models for multi-modal integration. Experimental evaluations in real poster sessions demonstrate that the proposed method improves speaker diarization accuracy over the baseline acoustic method.

Bibliographic reference.  Inoue, Koji / Wakabayashi, Yukoh / Yoshimoto, Hiromasa / Kawahara, Tatsuya (2014): "Speaker diarization using eye-gaze information in multi-party conversations", In INTERSPEECH-2014, 562-566.