ISCA Archive Interspeech 2005

Combining speaker identification and BIC for speaker diarization

Xuan Zhu, Claude Barras, Sylvain Meignier, Jean-Luc Gauvain

This paper describes recent advances in speaker diarization obtained by incorporating a speaker identification step. The system builds upon the LIMSI baseline data partitioner used in the broadcast news transcription system. This partitioner provides high cluster purity but tends to split a speaker's data into several clusters when a large quantity of data is available for that speaker. Several improvements to the baseline system have been made. First, a standard Bayesian information criterion (BIC) agglomerative clustering has been integrated, replacing the iterative Gaussian mixture model (GMM) clustering. Then, a second clustering stage has been added, using a speaker identification method with MAP-adapted GMMs. A final post-processing stage refines the segment boundaries using the output of the transcription system. On the RT-04f and ESTER evaluation data, the improved multi-stage system provides a 40% to 50% reduction in speaker error relative to a standard BIC clustering system.
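
As a rough illustration of the BIC merge test that underlies agglomerative clustering of this kind, the sketch below computes the textbook delta-BIC between two clusters, each modeled by a single full-covariance Gaussian. It is not taken from the paper: the function name `delta_bic`, the penalty weight `lam`, and the single-Gaussian cluster model are assumptions made for illustration, and the authors' system may use a different penalty or model.

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """Textbook delta-BIC for merging two clusters of feature frames.

    x, y: arrays of shape (n_frames, n_dims).
    lam:  penalty weight (hypothetical tuning parameter, not from the paper).
    A negative value favors merging the two clusters; a positive value
    favors keeping them separate.
    """
    n_x, d = x.shape
    n_y = y.shape[0]
    n = n_x + n_y

    def logdet(frames):
        # Log-determinant of the maximum-likelihood (biased) covariance.
        cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
        _, value = np.linalg.slogdet(cov)
        return value

    # Gain in log-likelihood from modeling the clusters separately.
    gain = 0.5 * (n * logdet(np.vstack([x, y]))
                  - n_x * logdet(x)
                  - n_y * logdet(y))
    # Complexity penalty for the extra mean and full covariance parameters.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty


# Usage with synthetic frames (stand-ins for real acoustic features).
rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(500, 4))
same_b = rng.normal(0.0, 1.0, size=(500, 4))
other = rng.normal(5.0, 1.0, size=(500, 4))

merge_same = delta_bic(same_a, same_b) < 0   # same source: merge
merge_other = delta_bic(same_a, other) < 0   # different source: keep apart
```

In a greedy agglomerative loop, the pair of clusters with the lowest delta-BIC would be merged repeatedly until no pair has a negative value.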

doi: 10.21437/Interspeech.2005-651

Cite as: Zhu, X., Barras, C., Meignier, S., Gauvain, J.-L. (2005) Combining speaker identification and BIC for speaker diarization. Proc. Interspeech 2005, 2441-2444, doi: 10.21437/Interspeech.2005-651

  author={Xuan Zhu and Claude Barras and Sylvain Meignier and Jean-Luc Gauvain},
  title={{Combining speaker identification and BIC for speaker diarization}},
  booktitle={Proc. Interspeech 2005},