ISCA Archive Odyssey 2012

First attempt of Boltzmann machines for speaker verification

Mohammed Senoussaoui, Najim Dehak, Patrick Kenny, Réda Dehak, Pierre Dumouchel

The Speaker Recognition Evaluations (SRE) regularly organized by NIST show high accuracy rates, which suggests that this field of research is mature. Recent progress has come from the low-dimensional i-vector representation and from new classifiers such as Probabilistic Linear Discriminant Analysis (PLDA) and the cosine distance classifier. In this paper, we study some variants of Boltzmann Machines (BM). BMs are used in image processing but remain unexplored in Speaker Verification (SR). Given two utterances, the SR task consists of deciding whether or not they come from the same speaker. Based on this definition, SR can be framed as a two-class classification problem (same speaker vs. different speakers). Our first attempt at using BMs models each class with one generative Restricted Boltzmann Machine (RBM) and uses a symmetric log-likelihood ratio over the two models as the decision score. This approach achieved an Equal Error Rate (EER) of 7% and a minimum Detection Cost Function (DCF) of 0.035 on the female portion of NIST SRE 2008. The main objective of this research is to explore a new paradigm, i.e. BMs, without necessarily outperforming state-of-the-art systems.
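
For readers unfamiliar with the EER metric reported above, the following is a minimal sketch of how an equal error rate can be estimated from verification scores. It is not the authors' evaluation code: the function name `equal_error_rate` and the toy scores are assumptions made for illustration, and it uses a plain threshold sweep over the observed scores with no interpolation.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    target_scores: scores of same-speaker trials (higher means "same").
    nontarget_scores: scores of different-speaker trials.
    Hypothetical helper, not taken from the paper.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every observed score, in ascending order.
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False rejection: a same-speaker trial scores below the threshold.
    false_reject = np.array([(target < t).mean() for t in thresholds])
    # False acceptance: a different-speaker trial scores at or above it.
    false_accept = np.array([(nontarget >= t).mean() for t in thresholds])

    # The EER is the operating point where the two error rates meet;
    # take the threshold where they are closest and average them.
    i = np.argmin(np.abs(false_reject - false_accept))
    return (false_reject[i] + false_accept[i]) / 2.0


# Toy scores (invented for illustration, not data from the paper).
toy_target = [2.0, 3.0, 4.0, 5.0]
toy_nontarget = [0.0, 1.0, 2.5, 1.5]
print(equal_error_rate(toy_target, toy_nontarget))
```

With real trial lists the two error curves rarely cross exactly at an observed score, so evaluation toolkits usually interpolate between neighbouring thresholds; the averaging step here is a simple stand-in for that.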

Cite as: Senoussaoui, M., Dehak, N., Kenny, P., Dehak, R., Dumouchel, P. (2012) First attempt of boltzmann machines for speaker verification. Proc. The Speaker and Language Recognition Workshop (Odyssey 2012), 117-121

  author={Mohammed Senoussaoui and Najim Dehak and Patrick Kenny and Réda Dehak and Pierre Dumouchel},
  title={{First attempt of boltzmann machines for speaker verification}},
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2012)},