Speaker Clustering by Iteratively Finding Discriminative Feature Space and Cluster Labels

Sungrack Yun, Hye Jin Jang, Taesu Kim


This paper presents a speaker clustering framework that iteratively performs two stages: a discriminative feature space is obtained given a cluster label set, and the cluster label set is then updated by a clustering algorithm given that feature space. During the iterations, the cluster labels may differ from the true labels, so the feature space derived from them may not be accurately discriminative. However, by iterating the two stages, more accurate cluster labels and a more discriminative feature space can be obtained, and the two eventually converge. In this work, linear discriminant analysis (LDA) is used to discriminate the i-vector feature space, and variational Bayesian expectation-maximization (VB-EM) on a Gaussian mixture model (GMM) is used to cluster the i-vectors. Our iterative clustering framework was evaluated on a database of keyword utterances and compared with recently published approaches. In all experiments, our framework outperformed the other approaches and converged within a few iterations.
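The two-stage loop described above can be sketched with scikit-learn. Its `LinearDiscriminantAnalysis` and `BayesianGaussianMixture` (a variational Bayesian GMM) stand in for the LDA and VB-EM steps. This is a minimal sketch under stated assumptions, not the authors' implementation. The helper names (`canonical`, `vbgmm_labels`, `iterative_clustering`) are hypothetical, and so are several design choices: the initial labels come from VB-GMM on the raw i-vectors, the number of clusters is capped at 10, and convergence means identical labels up to a permutation. i-vector extraction is out of scope, so any `(n_utterances, dim)` array can be passed in.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.mixture import BayesianGaussianMixture


def canonical(labels):
    """Relabel clusters by order of first appearance, so that two label
    sets describing the same partition compare equal ([2, 2, 0] -> [0, 0, 1])."""
    mapping = {}
    return np.array([mapping.setdefault(lab, len(mapping)) for lab in labels])


def vbgmm_labels(features, max_clusters, covariance_type, seed):
    """Cluster with VB-EM on a GMM. The Dirichlet-process weight prior can
    drive unneeded components toward zero weight, so the effective number
    of clusters is inferred (at most max_clusters)."""
    model = BayesianGaussianMixture(
        n_components=max_clusters,
        covariance_type=covariance_type,
        max_iter=500,
        random_state=seed,
    )
    return model.fit_predict(features)


def iterative_clustering(ivectors, max_clusters=10, max_iter=10, seed=0):
    """Alternate LDA projection and VB-GMM clustering until labels stop changing.
    Returns (labels, number_of_iterations_run)."""
    # Assumption: the first label set comes from VB-GMM on the raw i-vectors.
    labels = canonical(vbgmm_labels(ivectors, max_clusters, "diag", seed))
    for iteration in range(1, max_iter + 1):
        n_classes = len(np.unique(labels))
        if n_classes < 2:
            return labels, iteration - 1  # LDA needs at least two classes
        # Stage 1: discriminative feature space given the current labels.
        n_dims = min(n_classes - 1, ivectors.shape[1])
        lda = LinearDiscriminantAnalysis(n_components=n_dims)
        projected = lda.fit_transform(ivectors, labels)
        # Stage 2: update the labels by clustering in that feature space.
        new_labels = canonical(vbgmm_labels(projected, max_clusters, "full", seed))
        if np.array_equal(new_labels, labels):
            return new_labels, iteration  # converged: partition unchanged
        labels = new_labels
    return labels, max_iter


if __name__ == "__main__":
    # Synthetic stand-in for i-vectors: 3 "speakers", 30 utterances each, 10-dim.
    rng = np.random.default_rng(0)
    centers = 20.0 * np.eye(3, 10)
    X = np.vstack([c + rng.normal(size=(30, 10)) for c in centers])
    labels, n_iter = iterative_clustering(X)
    print("clusters:", len(np.unique(labels)), "iterations:", n_iter)
```

The sketch compares canonicalized labels rather than raw component indices. VB-EM can return the same partition under different component numbers from one iteration to the next, and that should still count as convergence.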


DOI: 10.21437/Interspeech.2017-923

Cite as: Yun, S., Jang, H.J., Kim, T. (2017) Speaker Clustering by Iteratively Finding Discriminative Feature Space and Cluster Labels. Proc. Interspeech 2017, 2824-2828, DOI: 10.21437/Interspeech.2017-923.


@inproceedings{Yun2017,
  author={Sungrack Yun and Hye Jin Jang and Taesu Kim},
  title={Speaker Clustering by Iteratively Finding Discriminative Feature Space and Cluster Labels},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2824--2828},
  doi={10.21437/Interspeech.2017-923},
  url={http://dx.doi.org/10.21437/Interspeech.2017-923}
}