This paper reports on work carried out at the 2008 JHU Summer Workshop examining new approaches to speaker diarization. Four different systems were developed, and experiments were conducted using summed-channel telephone data from the 2008 NIST SRE. The systems are a baseline agglomerative clustering system, a new Variational Bayes system using eigenvoice speaker models, a streaming system using a mix of low-dimensional speaker factors and classic segmentation and clustering, and a new hybrid system combining the baseline system with a new cosine-distance speaker factor clustering. Results are presented using the Diarization Error Rate (DER) as well as the Equal Error Rate (EER) obtained when diarization outputs are used for a speaker detection task. The best configurations of the diarization system produced DERs of 3.5-4.6%, and we demonstrate a weak correlation between EER and DER.
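For readers unfamiliar with the Diarization Error Rate, the following is a minimal sketch of how a frame-level version of the metric can be computed: missed speech, false-alarm speech, and speaker confusion, summed and divided by the total reference speech. The function name `frame_der`, the label convention (`None` for non-speech), and the toy labels are illustrative assumptions, not the scoring tool used in the paper. The sketch also ignores overlapped speech and forgiveness collars, which standard NIST-style scoring may treat differently.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level DER sketch (hypothetical helper, not the paper's scorer).

    Each sequence holds one label per frame: None for non-speech,
    otherwise a speaker identifier (strings assumed here).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")

    total_speech = sum(r is not None for r in reference)
    if total_speech == 0:
        raise ValueError("reference contains no speech frames")

    pairs = list(zip(reference, hypothesis))
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)

    # Frames where both reference and hypothesis contain speech.
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_speakers = sorted({r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})

    # Pad with None so extra hypothesis speakers can map to "no reference speaker".
    n = max(len(ref_speakers), len(hyp_speakers))
    padded_ref = ref_speakers + [None] * (n - len(ref_speakers))

    # Brute-force search for the one-to-one speaker mapping that maximises
    # agreement; fine for the two-speaker telephone case, slow for many speakers.
    best_correct = 0
    for perm in permutations(padded_ref, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total_speech


# Toy example: 6 reference speech frames, with 1 missed frame,
# 1 false-alarm frame, and 1 speaker-confusion frame.
reference = ["A", "A", "A", "B", "B", None, None, "B"]
hypothesis = ["x", "x", "y", "y", "y", "y", None, None]
print(frame_der(reference, hypothesis))  # 0.5
```

Hypothesis speaker labels are arbitrary, so the score is taken under the best one-to-one mapping between hypothesis and reference speakers; a hypothesis that matches the reference up to a renaming of speakers scores 0.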
Cite as: Reynolds, D., Kenny, P., Castaldo, F. (2009) A study of new approaches to speaker diarization. Proc. Interspeech 2009, 1047-1050, doi: 10.21437/Interspeech.2009-322
@inproceedings{reynolds09_interspeech, author={Douglas Reynolds and Patrick Kenny and Fabio Castaldo}, title={{A study of new approaches to speaker diarization}}, year=2009, booktitle={Proc. Interspeech 2009}, pages={1047--1050}, doi={10.21437/Interspeech.2009-322} }