Robust Speaker Clustering using Mixtures of von Mises-Fisher Distributions for Naturalistic Audio Streams

Harishchandra Dubey, Abhijeet Sangwan, John H.L. Hansen


Speaker diarization (i.e., determining who spoke and when) for multi-speaker naturalistic interactions such as Peer-Led Team Learning (PLTL) sessions is a challenging task. In this study, we propose robust speaker clustering based on a mixture of multivariate von Mises-Fisher distributions. Our diarization pipeline has two stages: (i) ground-truth segmentation; (ii) the proposed speaker clustering. The ground-truth speech activity information is used for extracting i-Vectors from each speech segment. We post-process the i-Vectors with principal component analysis for dimension reduction, followed by length normalization. The normalized i-Vectors are high-dimensional unit vectors possessing discriminative directional characteristics. We model the normalized i-Vectors with a mixture model consisting of multivariate von Mises-Fisher distributions. K-means clustering with cosine distance is chosen as the baseline approach. The evaluation data is derived from: (i) the CRSS-PLTL corpus; and (ii) a three-meeting subset of the AMI corpus. The CRSS-PLTL data contain audio recordings of PLTL sessions, a student-led STEM education paradigm. The proposed approach is consistently better than the baseline, leading to up to 44.48% and 53.68% relative improvements for the PLTL and AMI corpora, respectively.
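
The clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single concentration parameter `kappa` that is fixed and shared by all components (so the von Mises-Fisher normalizing constant cancels in the E-step), a simple farthest-point initialization, and small toy vectors in place of real i-Vectors. The PCA step is omitted, and the names `length_normalize` and `fit_movmf` are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def length_normalize(v):
    """Scale a vector to unit Euclidean norm (length normalization)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def fit_movmf(points, k, kappa=20.0, n_iter=50):
    """EM for a von Mises-Fisher mixture on unit vectors.

    Assumption: kappa is fixed and shared across components, so the vMF
    normalizer is identical for every component and drops out of the
    responsibilities. Only mean directions and mixture weights are learned.
    """
    dim = len(points[0])
    # Farthest-point initialization (an assumption, chosen for determinism):
    # start from the first point, then repeatedly add the point whose best
    # cosine similarity to the current means is lowest.
    means = [list(points[0])]
    while len(means) < k:
        far = min(points, key=lambda x: max(dot(m, x) for m in means))
        means.append(list(far))
    weights = [1.0 / k] * k

    resp = []
    for _ in range(n_iter):
        # E-step: responsibilities proportional to weight * exp(kappa * mu . x).
        resp = []
        for x in points:
            logits = [math.log(weights[j]) + kappa * dot(means[j], x)
                      for j in range(k)]
            top = max(logits)
            exps = [math.exp(l - top) for l in logits]
            total = sum(exps)
            resp.append([e / total for e in exps])

        # M-step: update weights and re-normalized weighted mean directions.
        for j in range(k):
            mass = sum(r[j] for r in resp)
            weights[j] = max(mass / len(points), 1e-12)
            summed = [sum(r[j] * x[d] for r, x in zip(resp, points))
                      for d in range(dim)]
            if dot(summed, summed) > 0.0:
                means[j] = length_normalize(summed)

    # Hard speaker labels: the most responsible component for each vector.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, weights, labels


# Toy stand-ins for post-processed i-Vectors: two directional clusters in 3-D.
raw = [[1, 0.1, 0], [1, -0.1, 0.05], [0.9, 0.05, -0.1],
       [0.1, 1, 0], [-0.1, 1, 0.05], [0.05, 0.9, -0.1]]
vectors = [length_normalize(v) for v in raw]
means, weights, labels = fit_movmf(vectors, k=2)
print(labels)
```

Replacing the soft responsibilities with hard assignments to the nearest mean direction turns this sketch into k-means with cosine distance, which is the baseline named in the abstract. A fuller treatment would also estimate a concentration per component, which requires the Bessel-function normalizer of the von Mises-Fisher density.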


DOI: 10.21437/Interspeech.2018-50

Cite as: Dubey, H., Sangwan, A., Hansen, J.H. (2018) Robust Speaker Clustering using Mixtures of von Mises-Fisher Distributions for Naturalistic Audio Streams. Proc. Interspeech 2018, 3603-3607, DOI: 10.21437/Interspeech.2018-50.


@inproceedings{Dubey2018,
  author={Harishchandra Dubey and Abhijeet Sangwan and John H.L. Hansen},
  title={Robust Speaker Clustering using Mixtures of von Mises-Fisher Distributions for Naturalistic Audio Streams},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3603--3607},
  doi={10.21437/Interspeech.2018-50},
  url={http://dx.doi.org/10.21437/Interspeech.2018-50}
}