10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Speaker Diarization Using Divide-and-Conquer

Shih-Sian Cheng (1), Chun-Han Tseng (2), Chia-Ping Chen (2), Hsin-Min Wang (1)

(1) Academia Sinica, Taiwan
(2) National Sun Yat-Sen University, Taiwan

Speaker diarization systems usually consist of two core components: speaker segmentation and speaker clustering. Current state-of-the-art speaker diarization systems typically apply hierarchical agglomerative clustering (HAC) for speaker clustering after segmentation. However, HAC's quadratic computational complexity with respect to the number of data samples inevitably limits its application to large-scale data sets. In this paper, we propose a divide-and-conquer (DAC) framework for speaker diarization. It recursively partitions the input speech stream into two sub-streams, performs diarization on them separately, and then combines the diarization results obtained from them using HAC. The results of experiments conducted on RT-02 and RT-03 broadcast news data show that the proposed framework is faster than the conventional segmentation and clustering-based approach while achieving comparable diarization accuracy. Moreover, the proposed framework obtains a higher speedup over the conventional approach on a larger test data set.
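The recursion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that segmentation has already produced one fixed-length feature vector per segment, and it uses a Euclidean distance between cluster mean vectors with a fixed threshold as the HAC merge and stop criterion, in place of a model-based criterion such as BIC. The names `Cluster`, `hac`, `dac_diarize`, `threshold` and `leaf_size` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass
class Cluster:
    members: List[int]  # indices of the segments assigned to this speaker cluster
    centroid: Vector    # mean feature vector of the member segments


def merge(a: Cluster, b: Cluster) -> Cluster:
    # Count-weighted mean, so the centroid stays the mean of all member segments.
    na, nb = len(a.members), len(b.members)
    centroid = tuple((na * x + nb * y) / (na + nb) for x, y in zip(a.centroid, b.centroid))
    return Cluster(a.members + b.members, centroid)


def hac(clusters: List[Cluster], threshold: float) -> List[Cluster]:
    """Repeatedly merge the closest pair of clusters until the smallest
    centroid distance exceeds `threshold` (assumed stopping rule)."""
    clusters = list(clusters)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(clusters[i].centroid, clusters[j].centroid)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def dac_diarize(segments: Sequence[Vector], threshold: float,
                leaf_size: int = 4, offset: int = 0) -> List[Cluster]:
    """Split the segment sequence into two halves, diarize each half
    recursively, then combine the two partial results with HAC."""
    if len(segments) <= leaf_size:
        # Base case: ordinary HAC starting from one singleton cluster per segment.
        singletons = [Cluster([offset + k], tuple(v)) for k, v in enumerate(segments)]
        return hac(singletons, threshold)
    mid = len(segments) // 2
    left = dac_diarize(segments[:mid], threshold, leaf_size, offset)
    right = dac_diarize(segments[mid:], threshold, leaf_size, offset + mid)
    # Combine step: HAC over the clusters (not the raw segments) from both halves.
    return hac(left + right, threshold)


# Toy input: eight 1-D "segments" from two well-separated hypothetical speakers.
segs = [(0.0,), (0.1,), (5.0,), (5.2,), (0.05,), (5.1,), (0.2,), (4.9,)]
result = dac_diarize(segs, threshold=1.0, leaf_size=2)
print(sorted(sorted(c.members) for c in result))  # [[0, 1, 4, 6], [2, 3, 5, 7]]
```

In this sketch, each HAC call at the top of the recursion operates on the clusters returned by the two halves rather than on every segment. This is one way to see why the framework can reduce the cost of the quadratic pairwise search, although the actual speedup depends on the clustering criterion and the data.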


Bibliographic reference.  Cheng, Shih-Sian / Tseng, Chun-Han / Chen, Chia-Ping / Wang, Hsin-Min (2009): "Speaker diarization using divide-and-conquer", In INTERSPEECH-2009, 1055-1058.