Deep Learning for i-Vector Speaker and Language Recognition: A Ph.D. Thesis Overview

Omid Ghahabi


Recent advances in Deep Learning (DL) have improved the quality of i-vectors, but the DL techniques in use are computationally expensive and require speaker and/or phonetic labels for the background data, which are not easily accessible in practice. Moreover, in speaker recognition, the lack of speaker-labeled background data opens a large performance gap between the two well-known i-vector scoring techniques, cosine and PLDA. This thesis addresses these problems by applying DL in different ways, without requiring speaker or phonetic labels. We propose an effective DL-based backend for i-vectors that fills 46% of this performance gap in terms of minDCF, and 79% in combination with a PLDA system with automatically estimated labels. We also develop an efficient alternative vector representation of speech that keeps the computational cost as low as possible and avoids phonetic labels. The proposed vectors are referred to as GMM-RBM vectors. Experiments on the core test condition 5 of NIST SRE 2010 show that they achieve results comparable to conventional i-vectors with a clearly lower computational load in the vector extraction process. Finally, for the language identification (LID) application, we propose a DNN architecture that effectively models the i-vector space of languages in the car environment. The proposed DNN architecture outperforms GMM-UBM and i-vector/LDA systems by 37% and 28%, respectively, for short signals of 2-3 s.
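
For readers unfamiliar with the cosine baseline mentioned in the abstract, the following is a minimal sketch of cosine scoring between two i-vectors. It needs no speaker labels, which is why it serves as the label-free reference point. The function names, the toy vectors, and the decision threshold are illustrative assumptions, not taken from the thesis; real i-vectors typically have a few hundred dimensions and are usually preprocessed before scoring.

```python
import math


def cosine_score(w_target, w_test):
    """Cosine similarity between two i-vectors given as sequences of floats."""
    if len(w_target) != len(w_test):
        raise ValueError("i-vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm_target = math.sqrt(sum(a * a for a in w_target))
    norm_test = math.sqrt(sum(b * b for b in w_test))
    if norm_target == 0.0 or norm_test == 0.0:
        raise ValueError("cosine score is undefined for a zero vector")
    return dot / (norm_target * norm_test)


def accept(w_target, w_test, threshold=0.5):
    """Accept the trial if the score reaches the threshold.

    The default threshold is a hypothetical value; in practice it would be
    tuned on development data.
    """
    return cosine_score(w_target, w_test) >= threshold


# Toy 3-dimensional vectors, for illustration only.
enrolled = [0.2, -0.1, 0.7]
test_utterance = [0.25, -0.05, 0.6]
print(cosine_score(enrolled, test_utterance))
```

PLDA scoring, by contrast, needs speaker-labeled background data to estimate its within- and between-speaker statistics, which is the source of the gap the abstract describes.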


DOI: 10.21437/IberSPEECH.2018-37

Cite as: Ghahabi, O. (2018) Deep Learning for i-Vector Speaker and Language Recognition: A Ph.D. Thesis Overview. Proc. IberSPEECH 2018, 184-188, DOI: 10.21437/IberSPEECH.2018-37.


@inproceedings{Ghahabi2018,
  author={Omid Ghahabi},
  title={{Deep Learning for i-Vector Speaker and Language Recognition: A Ph.D. Thesis Overview}},
  year=2018,
  booktitle={Proc. IberSPEECH 2018},
  pages={184--188},
  doi={10.21437/IberSPEECH.2018-37},
  url={http://dx.doi.org/10.21437/IberSPEECH.2018-37}
}