15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Speaker Diarization Using Gesture and Speech

Binyam Gebrekidan Gebre (1), Peter Wittenburg (1), Sebastian Drude (1), Marijn Huijbregts (2), Tom Heskes (2)

(1) Max Planck Institute for Psycholinguistics, The Netherlands
(2) Radboud Universiteit Nijmegen, The Netherlands

We demonstrate how the problem of speaker diarization can be solved using parametric models of both gesture and speech. The novelty of our solution is that we treat speaker diarization as a speaker recognition problem, after learning speaker models from speech samples that co-occur with gestures: the occurrence of a gesture indicates the presence of speech, and the location of the gesture indicates the identity of the speaker. This approach offers several advantages: performance comparable to the state of the art, faster computation, and more flexibility. In our implementation, Gaussian mixture models are used to model the voice characteristics of each individual person and of all persons together, and gamma distributions are used to model gestural activity based on features extracted from Motion History Images. Tests on 4.24 hours of the AMI meeting data show that our solution improves the diarization error rate (DER) by 19% on speech-only segments and by 4% on all segments including silence, compared with the AMI system.
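The core idea, training a voice model per speaker from the frames that co-occur with that speaker's gestures and then labelling every frame by the best-scoring model, can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several simplifying assumptions: a single diagonal Gaussian stands in for each Gaussian mixture model, the per-frame gesture labels are taken as given (the gamma-distribution gesture model and the Motion History Image features are omitted), silence is not handled, and the function names and toy two-dimensional features are hypothetical.

```python
import math
from collections import defaultdict


def fit_diag_gaussian(frames):
    """Fit a per-dimension mean and variance to a list of feature vectors.

    Assumption: one diagonal Gaussian replaces the paper's GMM for brevity.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        # Floor the variance so the log-likelihood stays finite.
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dim)
    ]
    return means, variances


def log_likelihood(frame, model):
    """Log-density of one frame under a diagonal Gaussian model."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, means, variances)
    )


def diarize(features, gesture_labels):
    """Label every frame with the most likely speaker.

    features: list of per-frame acoustic feature vectors.
    gesture_labels: per-frame speaker id where a gesture was seen, else None.
    """
    # 1. Collect the frames that co-occur with each speaker's gestures.
    training = defaultdict(list)
    for frame, who in zip(features, gesture_labels):
        if who is not None:
            training[who].append(frame)

    # 2. Fit one voice model per speaker from those frames.
    models = {who: fit_diag_gaussian(fr) for who, fr in training.items()}

    # 3. Speaker recognition: assign each frame to the best-scoring model.
    return [
        max(models, key=lambda who: log_likelihood(frame, models[who]))
        for frame in features
    ]


# Toy data (hypothetical): speaker "A" clusters near (0, 0), "B" near (5, 5).
features = [
    [0.1, -0.2], [0.0, 0.3], [-0.3, 0.1],
    [5.2, 4.9], [4.8, 5.1], [5.0, 5.3],
    [0.2, 0.0], [4.9, 4.7],
]
gestures = ["A", "A", None, "B", "B", None, None, None]

labels = diarize(features, gestures)
print(labels)  # ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'B']
```

In this sketch the gesture labels act as weak supervision: only four of the eight frames carry a label, yet every frame receives a speaker assignment once the models are trained.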


Bibliographic reference. Gebre, Binyam Gebrekidan / Wittenburg, Peter / Drude, Sebastian / Huijbregts, Marijn / Heskes, Tom (2014): "Speaker diarization using gesture and speech", in INTERSPEECH-2014, 582-586.