Recently, clustering based on Bayesian probabilistic models has achieved superior performance in speaker diarization. However, it is much more complicated than the widely used efficient clustering algorithms, which makes it inconvenient for some real-life scenarios. In this paper, we propose a covariance-asymptotic variant of Dirichlet process mixture models (DPMM), named Dirichlet process means (DP-means) clustering, for speaker diarization. Like Bayesian nonparametric models (e.g. DPMM), DP-means can create new clusters as clustering proceeds, which suits the speaker diarization problem, where the number of speakers is determined on the fly. Unlike Bayesian nonparametric models, DP-means is a hard clustering method that does not need to optimize the variances of the mixture components, which makes it efficient for real-world problems. We further exploit an initialization method to obtain the prior cluster centroids for DP-means. Experimental results on the CALLHOME, AMI and DIHARD III corpora show that the proposed method is more efficient than state-of-the-art speaker clustering methods, with only slight performance degradation.
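To make the clustering behaviour described above concrete, the following is a minimal sketch of generic DP-means clustering, not the exact implementation evaluated in the paper. It assumes squared Euclidean distance between embeddings and a single penalty threshold `lam`; the function name `dp_means` and the parameters `init_centroids` and `max_iter` are hypothetical names introduced only for illustration. The distance measure and the initialization method actually used in the paper are not specified in this abstract, so the sketch falls back to the global mean when no prior centroids are supplied.

```python
import numpy as np


def dp_means(X, lam, init_centroids=None, max_iter=100):
    """Generic DP-means sketch (assumption: squared Euclidean distance).

    X              : (n, d) array of embeddings.
    lam            : penalty threshold; a point farther than `lam` (in squared
                     distance) from every centroid opens a new cluster.
    init_centroids : optional (k, d) array of prior centroids (hypothetical
                     parameter); defaults to the global mean of X.
    """
    X = np.asarray(X, dtype=float)
    if init_centroids is None:
        centroids = X.mean(axis=0, keepdims=True)
    else:
        centroids = np.array(init_centroids, dtype=float)

    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        new_labels = np.empty(len(X), dtype=int)
        for i, x in enumerate(X):
            # Hard assignment: squared distance to every current centroid.
            d2 = ((centroids - x) ** 2).sum(axis=1)
            k = int(d2.argmin())
            if d2[k] > lam:
                # Too far from all clusters: open a new one at this point.
                centroids = np.vstack([centroids, x])
                k = len(centroids) - 1
            new_labels[i] = k

        # Drop empty clusters, relabel to 0..K-1, and update centroids as means.
        used = np.unique(new_labels)
        centroids = np.stack([X[new_labels == k].mean(axis=0) for k in used])
        new_labels = np.searchsorted(used, new_labels)

        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels

    return labels, centroids
```

Only centroids are updated in each pass; no covariance or mixture weight is estimated, which is where the efficiency gain over a full DPMM comes from. The number of clusters is controlled indirectly through `lam`: a smaller value opens clusters more readily.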
Cite as: Gong, Y., Zhang, X.-L. (2022) DP-Means: An Efficient Bayesian Nonparametric Model for Speaker Diarization. Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), 156-161, doi: 10.21437/Odyssey.2022-22
@inproceedings{gong22_odyssey,
  author={Yijun Gong and Xiao-Lei Zhang},
  title={{DP-Means: An Efficient Bayesian Nonparametric Model for Speaker Diarization}},
  year=2022,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2022)},
  pages={156--161},
  doi={10.21437/Odyssey.2022-22}
}