15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Unsupervised Speaker Diarization Using Riemannian Manifold Clustering

Che-Wei Huang, Bo Xiao, Panayiotis G. Georgiou, Shrikanth S. Narayanan

University of Southern California, USA

We address the problem of speaker clustering for robust unsupervised speaker diarization. We model each speaker-homogeneous segment as a single full-covariance multivariate Gaussian probability density function (pdf) and take into account the Riemannian geometry of the space of Gaussian pdfs. Assuming that segments from different speakers lie on different (possibly intersecting) sub-manifolds of the manifold of Gaussian pdfs, we reformulate speaker clustering as a Riemannian manifold clustering problem. To apply the computationally simple Riemannian locally linear embedding (LLE) algorithm, we constrain the length of each segment, both so that single-Gaussian modeling remains adequate and to increase the chance that all k-nearest neighbors of a pdf come from the same sub-manifold (speaker). Experiments on the microphone-recorded conversational interviews from the NIST 2010 Speaker Recognition Evaluation set show promising results, with a diarization error rate (DER) below 1%.
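
The first two steps described above (single-Gaussian modeling of each segment and a k-nearest-neighbor search on the manifold) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state which Riemannian metric is used, so the sketch assumes the affine-invariant metric on the covariance matrices alone and ignores the Gaussian means. The function names fit_gaussian, airm_distance and k_nearest_neighbors are hypothetical.

```python
import numpy as np


def fit_gaussian(frames):
    """Fit one full-covariance Gaussian to a (n_frames, dim) feature array."""
    mean = frames.mean(axis=0)
    cov = np.cov(frames, rowvar=False)  # unbiased sample covariance
    return mean, cov


def airm_distance(cov_a, cov_b):
    """Affine-invariant Riemannian distance between two SPD matrices.

    Assumption: this metric stands in for whatever metric the paper uses.
    """
    # Whiten cov_b by cov_a: M = L^{-1} B L^{-T}, with A = L L^T.
    chol = np.linalg.cholesky(cov_a)
    tmp = np.linalg.solve(chol, cov_b)   # L^{-1} B
    whitened = np.linalg.solve(chol, tmp.T)  # L^{-1} B L^{-T} (symmetric)
    eigvals = np.linalg.eigvalsh(whitened)
    return float(np.sqrt(np.sum(np.log(eigvals) ** 2)))


def k_nearest_neighbors(covs, k):
    """Return, for each covariance, the indices of its k nearest neighbors."""
    n = len(covs)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = airm_distance(covs[i], covs[j])
    np.fill_diagonal(dist, np.inf)  # a segment is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]
```

In a full pipeline, the neighbor indices would feed the reconstruction-weight step of a Riemannian LLE, followed by clustering in the embedded space. The segment-length constraint mentioned in the abstract matters here because the neighbor search is only useful if the neighbors of a segment mostly belong to the same speaker.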

Bibliographic reference.  Huang, Che-Wei / Xiao, Bo / Georgiou, Panayiotis G. / Narayanan, Shrikanth S. (2014): "Unsupervised speaker diarization using Riemannian manifold clustering", in INTERSPEECH-2014, 567-571.