Interspeech'2005 - Eurospeech
This paper describes recent advances in speaker diarization obtained by incorporating a speaker identification step. The system builds upon the LIMSI baseline data partitioner used in the broadcast news transcription system. This partitioner provides high cluster purity but tends to split the data from a speaker into several clusters when there is a large quantity of data for that speaker. Several improvements to the baseline system have been made. First, a standard Bayesian information criterion (BIC) agglomerative clustering has been integrated, replacing the iterative Gaussian mixture model (GMM) clustering. Then a second clustering stage has been added, using a speaker identification method with MAP-adapted GMMs. A final post-processing stage refines the segment boundaries using the output of the transcription system. On the RT-04f and ESTER evaluation data, the improved multi-stage system reduces the speaker error by 40% to 50% relative to a standard BIC clustering system.
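The abstract does not state the exact BIC formulation used. As a minimal sketch, assuming each cluster is modelled by a single full-covariance Gaussian over acoustic feature vectors (the common ΔBIC merge criterion), BIC agglomerative clustering can be illustrated as below. The penalty weight `lam` and the helper names `delta_bic` and `bic_agglomerative` are hypothetical and not taken from the paper; the second, GMM-based speaker identification stage and the boundary post-processing are not shown.

```python
import numpy as np


def log_det_cov(x: np.ndarray) -> float:
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("singular covariance: cluster too small or degenerate")
    return float(logdet)


def delta_bic(x1: np.ndarray, x2: np.ndarray, lam: float = 1.0) -> float:
    """Delta-BIC for merging two clusters (assumed single full-covariance Gaussians).

    Negative values favour merging: the likelihood lost by modelling both
    clusters with one Gaussian is smaller than the penalty saved on parameters.
    """
    n1, d = x1.shape
    n2 = x2.shape[0]
    n = n1 + n2
    merged = np.vstack([x1, x2])
    # Log-likelihood ratio of "two models" versus "one merged model".
    gain = 0.5 * (n * log_det_cov(merged)
                  - n1 * log_det_cov(x1)
                  - n2 * log_det_cov(x2))
    # Parameter count of one Gaussian: d means plus d(d+1)/2 covariance terms.
    penalty = 0.5 * (d + d * (d + 1) / 2) * np.log(n)
    return float(gain - lam * penalty)


def bic_agglomerative(clusters, lam: float = 1.0):
    """Greedily merge the pair with the lowest Delta-BIC until none is negative."""
    clusters = [np.asarray(c, dtype=float) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # stopping criterion: no merge improves BIC
            break
        merged = np.vstack([clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

Usage: pass the initial segments (each an array of feature frames, one row per frame) to `bic_agglomerative`; merging stops once every remaining pair has a non-negative ΔBIC. In practice the weight `lam` would be tuned on development data, which is one reason a pure BIC stage can still leave a speaker split across clusters and motivates a further speaker identification stage.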
Bibliographic reference. Zhu, Xuan / Barras, Claude / Meignier, Sylvain / Gauvain, Jean-Luc (2005): "Combining speaker identification and BIC for speaker diarization", In INTERSPEECH-2005, 2441-2444.