We address the problem of speaker clustering for robust unsupervised speaker diarization. We model each speaker-homogeneous segment as a single full-covariance multivariate Gaussian probability density function (pdf) and take into account the Riemannian geometry of the space of Gaussian pdfs. By assuming that segments from different speakers lie on different (possibly intersecting) sub-manifolds of the manifold of Gaussian pdfs, we formulate the original problem as a Riemannian manifold clustering problem. To apply the computationally simple Riemannian locally linear embedding (LLE) algorithm, we impose a constraint on the length of each segment, both to ensure that single-Gaussian modeling fits the data adequately and to increase the chance that all k-nearest neighbors of a pdf come from the same sub-manifold (speaker). Experiments on the microphone-recorded conversational interviews from the NIST 2010 Speaker Recognition Evaluation set demonstrate promising results, with a diarization error rate (DER) of less than 1%.
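The first two steps described above (one Gaussian per segment, then k-nearest neighbors under a Riemannian distance) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which metric is used, so the sketch swaps in a common stand-in, embedding each Gaussian N(mu, Sigma) as a (d+1)x(d+1) symmetric positive-definite matrix and measuring the affine-invariant distance between those matrices. All function names, the toy data, and the covariance ridge are hypothetical, and the Riemannian LLE and clustering steps are not shown.

```python
import numpy as np


def fit_gaussian(frames, ridge=1e-6):
    """Fit one full-covariance Gaussian to a (n_frames, dim) feature array."""
    mu = frames.mean(axis=0)
    sigma = np.cov(frames, rowvar=False)
    # Assumption: a small ridge keeps the covariance positive definite.
    sigma = sigma + ridge * np.eye(frames.shape[1])
    return mu, sigma


def embed_spd(mu, sigma):
    """Map N(mu, sigma) to the SPD matrix [[sigma + mu mu^T, mu], [mu^T, 1]]."""
    d = mu.shape[0]
    p = np.empty((d + 1, d + 1))
    p[:d, :d] = sigma + np.outer(mu, mu)
    p[:d, d] = mu
    p[d, :d] = mu
    p[d, d] = 1.0
    return p


def spd_distance(a, b):
    """Affine-invariant distance: sqrt of the summed squared log-eigenvalues of a^-1 b."""
    l_inv = np.linalg.inv(np.linalg.cholesky(a))
    # Whitening b by a gives a symmetric matrix with the same eigenvalues as a^-1 b.
    w = np.linalg.eigvalsh(l_inv @ b @ l_inv.T)
    return float(np.sqrt(np.sum(np.log(w) ** 2)))


def pairwise_distances(gaussians):
    """Symmetric matrix of distances between (mu, sigma) pairs."""
    mats = [embed_spd(mu, sigma) for mu, sigma in gaussians]
    n = len(mats)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = spd_distance(mats[i], mats[j])
    return dist


def k_nearest(dist, k):
    """Indices of the k nearest neighbors of each segment, excluding itself."""
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)
    return np.argsort(masked, axis=1)[:, :k]


def toy_segments(rng, n_per_speaker=6, n_frames=200, dim=4):
    """Hypothetical data: two synthetic 'speakers' with well-separated means."""
    segs, labels = [], []
    for spk, offset in enumerate((0.0, 5.0)):
        for _ in range(n_per_speaker):
            segs.append(rng.normal(loc=offset, scale=1.0, size=(n_frames, dim)))
            labels.append(spk)
    return segs, np.array(labels)
```

On such toy data, one would call `toy_segments`, apply `fit_gaussian` to each segment, and pass the result through `pairwise_distances` and `k_nearest`. Segments with too few frames give noisy covariance estimates, which is one way to read the segment-length constraint mentioned in the abstract.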
Bibliographic reference. Huang, Che-Wei / Xiao, Bo / Georgiou, Panayiotis G. / Narayanan, Shrikanth S. (2014): "Unsupervised speaker diarization using Riemannian manifold clustering", In INTERSPEECH-2014, 567-571.