ISCA Archive Interspeech 2009

Speaker diarization using divide-and-conquer

Shih-Sian Cheng, Chun-Han Tseng, Chia-Ping Chen, Hsin-Min Wang

Speaker diarization systems usually consist of two core components: speaker segmentation and speaker clustering. Current state-of-the-art systems usually apply hierarchical agglomerative clustering (HAC) for speaker clustering after segmentation. However, HAC’s quadratic computational complexity with respect to the number of data samples inevitably limits its applicability to large-scale data sets. In this paper, we propose a divide-and-conquer (DAC) framework for speaker diarization. It recursively partitions the input speech stream into two sub-streams, performs diarization on each separately, and then combines the two diarization results using HAC. Experiments on RT-02 and RT-03 broadcast news data show that the proposed framework is faster than the conventional segmentation-and-clustering approach while achieving comparable diarization accuracy. Moreover, the proposed framework obtains a higher speedup over the conventional approach on a larger test data set.
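
The recursive structure described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not specify the segment representation, the distance measure, the stopping criterion, or the base case, so the fixed-length feature vectors, the Euclidean centroid distance, the `threshold` parameter, and the `leaf_size` parameter below are all assumptions, and the names `hac` and `dac_diarize` are hypothetical.

```python
from itertools import combinations


def centroid(cluster, feats):
    """Mean feature vector of the segments whose indices are in `cluster`."""
    dim = len(feats[0])
    return [sum(feats[i][d] for i in cluster) / len(cluster) for d in range(dim)]


def dist(a, b):
    """Euclidean distance (an assumed stand-in for a real speaker distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def hac(clusters, feats, threshold):
    """Merge the closest pair of clusters until no pair is within `threshold`."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        # Find the closest pair (i < j) by centroid distance.
        d, i, j = min(
            (dist(centroid(clusters[a], feats), centroid(clusters[b], feats)), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d > threshold:  # assumed stopping rule
            break
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


def dac_diarize(indices, feats, threshold, leaf_size=4):
    """Divide-and-conquer: split the stream in two, diarize each half, combine with HAC."""
    if len(indices) <= leaf_size:
        # Base case (assumed): plain HAC starting from one cluster per segment.
        return hac([[i] for i in indices], feats, threshold)
    mid = len(indices) // 2
    left = dac_diarize(indices[:mid], feats, threshold, leaf_size)
    right = dac_diarize(indices[mid:], feats, threshold, leaf_size)
    # Combine step: HAC starts from the clusters of both halves, not from singletons.
    return hac(left + right, feats, threshold)


# Toy input: one 1-D feature per segment, in temporal order (made-up values).
feats = [[0.0], [0.1], [5.0], [5.1], [0.05], [4.9], [0.2], [5.2]]
print(dac_diarize(list(range(len(feats))), feats, threshold=1.0, leaf_size=2))
```

In this sketch, the combine step starts HAC from the clusters already formed in each half, so it has fewer candidate pairs to compare than flat HAC over all segments. That is one plausible reading of where the reported speedup comes from.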


doi: 10.21437/Interspeech.2009-324

Cite as: Cheng, S.-S., Tseng, C.-H., Chen, C.-P., Wang, H.-M. (2009) Speaker diarization using divide-and-conquer. Proc. Interspeech 2009, 1055-1058, doi: 10.21437/Interspeech.2009-324

@inproceedings{cheng09c_interspeech,
  author={Shih-Sian Cheng and Chun-Han Tseng and Chia-Ping Chen and Hsin-Min Wang},
  title={{Speaker diarization using divide-and-conquer}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1055--1058},
  doi={10.21437/Interspeech.2009-324}
}