Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Combining Speaker Identification and BIC for Speaker Diarization

Xuan Zhu (1), Claude Barras (1), Sylvain Meignier (2), Jean-Luc Gauvain (1)

(1) LIMSI-CNRS, Orsay, France; (2) LIUM-CNRS, France

This paper describes recent advances in speaker diarization obtained by incorporating a speaker identification step. The system builds upon the LIMSI baseline data partitioner used in the broadcast news transcription system. This partitioner provides high cluster purity but tends to split the data from a single speaker into several clusters when there is a large quantity of data for that speaker. Several improvements to the baseline system have been made. First, a standard Bayesian information criterion (BIC) agglomerative clustering has been integrated, replacing the iterative Gaussian mixture model (GMM) clustering. Then a second clustering stage has been added, using a speaker identification method with MAP-adapted GMMs. A final post-processing stage refines the segment boundaries using the output of the transcription system. On the RT-04f and ESTER evaluation data, the improved multi-stage system provides a 40% to 50% reduction in speaker error relative to a standard BIC clustering system.
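The BIC agglomerative clustering stage can be illustrated with a minimal sketch. It assumes the common Chen and Gopalakrishnan formulation, in which each cluster is modeled by a single full-covariance Gaussian and two clusters are merged when the penalized likelihood ratio ΔBIC is negative. The function names and the penalty weight `lam` are hypothetical; the abstract does not specify the exact cluster model, features, or threshold used in the LIMSI system, and the second (MAP-adapted GMM speaker identification) stage is not shown.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def covariance(frames: List[Vector]) -> Matrix:
    """Maximum-likelihood (biased) full covariance of a list of feature frames."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        diff = [f[k] - mean[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / n
    return cov


def log_det(matrix: Matrix) -> float:
    """log|A| via a Cholesky factorization; A must be symmetric positive definite."""
    d = len(matrix)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                v = matrix[i][i] - s
                if v <= 0.0:
                    raise ValueError("covariance is not positive definite")
                low[i][j] = math.sqrt(v)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    # |A| = |L|^2 and |L| is the product of its diagonal entries.
    return 2.0 * sum(math.log(low[i][i]) for i in range(d))


def delta_bic(x: List[Vector], y: List[Vector], lam: float = 1.0) -> float:
    """Delta-BIC between clusters x and y; a negative value favors merging them."""
    n1, n2 = len(x), len(y)
    n, d = n1 + n2, len(x[0])
    # Model-complexity penalty: d mean parameters plus d(d+1)/2 covariance parameters.
    penalty = 0.5 * (d + d * (d + 1) / 2) * math.log(n)
    ratio = (0.5 * n * log_det(covariance(x + y))
             - 0.5 * n1 * log_det(covariance(x))
             - 0.5 * n2 * log_det(covariance(y)))
    return ratio - lam * penalty


def bic_cluster(segments: List[List[Vector]], lam: float = 1.0) -> List[List[Vector]]:
    """Greedy agglomerative clustering: merge the lowest-Delta-BIC pair until none is negative."""
    clusters = [list(s) for s in segments]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0.0:
            break  # no remaining pair is better explained by a single Gaussian
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

With `lam` set to 1.0 this is the textbook criterion; in practice the penalty weight is usually tuned on development data, which is why the sketch exposes it as a parameter.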


Bibliographic reference: Zhu, Xuan / Barras, Claude / Meignier, Sylvain / Gauvain, Jean-Luc (2005): "Combining speaker identification and BIC for speaker diarization", In INTERSPEECH-2005, 2441-2444.