Speaker diarization systems usually consist of two core components: speaker segmentation and speaker clustering. Current state-of-the-art systems typically apply hierarchical agglomerative clustering (HAC) for speaker clustering after segmentation. However, HAC's quadratic computational complexity with respect to the number of data samples inevitably limits its applicability to large-scale data sets. In this paper, we propose a divide-and-conquer (DAC) framework for speaker diarization. It recursively partitions the input speech stream into two sub-streams, performs diarization on each separately, and then combines the two sets of diarization results using HAC. Experiments conducted on RT-02 and RT-03 broadcast news data show that the proposed framework is faster than the conventional segmentation and clustering-based approach while achieving comparable diarization accuracy. Moreover, the speedup over the conventional approach grows as the test data set becomes larger.
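The recursive scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each segment is summarized by a single scalar feature, uses centroid distance with a fixed stopping threshold as the HAC merge criterion, and falls back to plain HAC once a sub-stream holds at most `leaf_size` segments. The names `hac`, `dac_diarize`, `threshold`, and `leaf_size` are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence

Cluster = List[int]  # indices of the segments assigned to one speaker


def centroid(cluster: Cluster, feats: Sequence[float]) -> float:
    """Mean of the (toy, scalar) features of the segments in a cluster."""
    return sum(feats[i] for i in cluster) / len(cluster)


def hac(clusters: List[Cluster], feats: Sequence[float],
        threshold: float) -> List[Cluster]:
    """Naive HAC: repeatedly merge the closest pair of clusters.

    Assumption: merging stops when the closest pair is farther apart
    than `threshold`; a real system would use its own stopping rule.
    """
    clusters = [list(c) for c in clusters]  # do not mutate the input
    while len(clusters) > 1:
        best = None
        # Exhaustive pairwise search: quadratic in the number of clusters.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = abs(centroid(clusters[a], feats)
                        - centroid(clusters[b], feats))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


def dac_diarize(lo: int, hi: int, feats: Sequence[float],
                threshold: float, leaf_size: int) -> List[Cluster]:
    """Diarize segments lo..hi-1 by divide-and-conquer."""
    if hi - lo <= leaf_size:
        # Small sub-stream: cluster its segments directly.
        return hac([[i] for i in range(lo, hi)], feats, threshold)
    mid = (lo + hi) // 2
    left = dac_diarize(lo, mid, feats, threshold, leaf_size)
    right = dac_diarize(mid, hi, feats, threshold, leaf_size)
    # Combine step: HAC over the clusters found in the two halves.
    return hac(left + right, feats, threshold)


# Toy stream of eight segments from two well-separated "speakers".
feats = [0.0, 0.1, 5.0, 5.1, 0.05, 5.05, 0.02, 4.98]
speakers = dac_diarize(0, len(feats), feats, threshold=1.0, leaf_size=2)
print(sorted(sorted(c) for c in speakers))
```

In this sketch the combine step starts from the clusters already formed in each half rather than from individual segments, so each HAC call searches over far fewer items than a single HAC pass over the whole stream would.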
Bibliographic reference. Cheng, Shih-Sian / Tseng, Chun-Han / Chen, Chia-Ping / Wang, Hsin-Min (2009): "Speaker diarization using divide-and-conquer", In INTERSPEECH-2009, 1055-1058.