ISCA Archive Interspeech 2006

Speaker diarization for multiple distant microphone meetings: mixing acoustic features and inter-channel time differences

Jose M. Pardo, Xavier Anguera, Chuck Wooters

Speaker diarization for recordings made in meetings consists of identifying the number of participants in each meeting and creating a list of speech time intervals for each participant. In recently published work [7] we presented experiments that applied only TDOA (Time Delay Of Arrival) values between the different channels to this task, and demonstrated that the information in those values can be used to segment the speakers. In this paper we develop a method to mix the TDOA values with the acoustic features by calculating a combined log-likelihood over both sets of vectors. Using this method we reduced the diarization error rate (DER) by 16.34% (relative) on the NIST RT05s set (scored without overlap, with manually transcribed references), by 21% (relative) on our devel06s set (scored with overlap, with force-aligned references), and by 15% (relative) on the NIST RT06s set (scored with overlap, with manually transcribed references).
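The abstract only states that a combined log-likelihood is computed over the acoustic and TDOA vectors. The sketch below illustrates one common way to do this, under stated assumptions: each stream is modeled by its own diagonal-covariance GMM, and the two per-frame log-likelihoods are merged as a weighted sum. The GMM form, the function names, and the weight `w_acoustic=0.9` are hypothetical choices for illustration, not details taken from the paper. A helper for the relative DER reduction quoted above is also included.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of x under a diagonal GMM, computed with log-sum-exp for stability."""
    comps = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(comps)
    return top + math.log(sum(math.exp(c - top) for c in comps))

def combined_loglik(acoustic_frame, tdoa_frame, acoustic_gmm, tdoa_gmm, w_acoustic=0.9):
    """Hypothetical stream combination: weighted sum of acoustic and TDOA log-likelihoods.

    Each *_gmm argument is a (weights, means, variances) tuple; the weight
    w_acoustic is an illustrative assumption, not a value from the paper.
    """
    la = gmm_loglik(acoustic_frame, *acoustic_gmm)
    lt = gmm_loglik(tdoa_frame, *tdoa_gmm)
    return w_acoustic * la + (1.0 - w_acoustic) * lt

def relative_reduction(baseline_der, new_der):
    """Relative DER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_der - new_der) / baseline_der

# Usage: one 2-D acoustic frame and one 1-D TDOA frame scored against toy single-component models.
acoustic_model = ([1.0], [[0.0, 0.0]], [[1.0, 1.0]])
tdoa_model = ([1.0], [[0.0]], [[1.0]])
score = combined_loglik([0.1, -0.2], [0.05], acoustic_model, tdoa_model)
```

In a clustering or decoding system, a score of this kind would replace the single-stream likelihood wherever a frame is evaluated against a speaker model; the stream weight then controls how much the TDOA evidence influences the decision.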


doi: 10.21437/Interspeech.2006-570

Cite as: Pardo, J.M., Anguera, X., Wooters, C. (2006) Speaker diarization for multiple distant microphone meetings: mixing acoustic features and inter-channel time differences. Proc. Interspeech 2006, paper 1337-Thu1A1O.5, doi: 10.21437/Interspeech.2006-570

@inproceedings{pardo06_interspeech,
  author={Jose M. Pardo and Xavier Anguera and Chuck Wooters},
  title={{Speaker diarization for multiple distant microphone meetings: mixing acoustic features and inter-channel time differences}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1337-Thu1A1O.5},
  doi={10.21437/Interspeech.2006-570}
}