Restricted Boltzmann Machine Vectors for Speaker Clustering

Umair Khan, Pooyan Safari, Javier Hernando


Restricted Boltzmann Machines (RBMs) have been used in both the front-end and the back-end of speaker verification systems. In this work, we apply RBMs as a front-end in the context of speaker clustering. Speakers' utterances are transformed into a vector representation by means of RBMs. These vectors, referred to as RBM vectors, have been shown to preserve speaker-specific information and are used here for the task of speaker clustering. We perform traditional bottom-up Agglomerative Hierarchical Clustering (AHC) on these vectors. Using the RBM vector representation of speakers improves speaker clustering performance. The evaluation is performed on audio recordings of Catalan TV broadcast shows. The experimental results show that the proposed system outperforms the baseline i-vector system in terms of Equal Impurity (EI). Using cosine scoring, relative improvements of 11% and 12% are achieved for the average and single linkage clustering algorithms, respectively. Using PLDA scoring, the RBM vectors achieve a relative improvement of 11% over i-vectors for the single linkage algorithm.
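
The clustering stage described above can be illustrated with a minimal sketch of bottom-up AHC over fixed-length speaker vectors using cosine distance, with both average and single linkage. This is not the authors' implementation. The function name `cluster_speaker_vectors`, the synthetic vectors standing in for RBM vectors, the vector dimension, and the stopping rule (a fixed target number of clusters) are all assumptions made for illustration. PLDA scoring and RBM training are not covered.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_speaker_vectors(vectors, n_clusters, method="average"):
    """Bottom-up AHC with cosine distance (hypothetical helper).

    vectors    : (n_utterances, dim) array, one vector per utterance
    n_clusters : assumed stopping criterion (target number of clusters)
    method     : "average" or "single" linkage
    """
    # Build the merge tree from pairwise cosine distances.
    tree = linkage(vectors, method=method, metric="cosine")
    # Cut the tree so that at most n_clusters flat clusters remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic stand-in for RBM vectors: 3 "speakers", 4 utterances each.
rng = np.random.default_rng(0)
centers = np.eye(3, 16) * 5.0  # three mutually orthogonal directions
vectors = np.vstack(
    [c + 0.1 * rng.standard_normal((4, 16)) for c in centers]
)

labels_avg = cluster_speaker_vectors(vectors, 3, method="average")
labels_single = cluster_speaker_vectors(vectors, 3, method="single")
```

Each entry of `labels_avg` and `labels_single` is the cluster index assigned to the corresponding utterance. In practice the tree would more likely be cut at a distance threshold than at a known cluster count, since the number of speakers is usually unknown.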


DOI: 10.21437/IberSPEECH.2018-3

Cite as: Khan, U., Safari, P., Hernando, J. (2018) Restricted Boltzmann Machine Vectors for Speaker Clustering. Proc. IberSPEECH 2018, 10-14, DOI: 10.21437/IberSPEECH.2018-3.


@inproceedings{Khan2018,
  author={Umair Khan and Pooyan Safari and Javier Hernando},
  title={{Restricted Boltzmann Machine Vectors for Speaker Clustering}},
  year=2018,
  booktitle={Proc. IberSPEECH 2018},
  pages={10--14},
  doi={10.21437/IberSPEECH.2018-3},
  url={http://dx.doi.org/10.21437/IberSPEECH.2018-3}
}