Cosine Metric Learning for Speaker Verification in the I-vector Space

Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen


It is known that the equal-error-rate (EER) performance of a speaker verification system is determined by the overlap region of the decision scores of true and imposter trials. Also, the cosine similarity scores of the true and imposter trials produced by the state-of-the-art i-vector front-end each approximately follow a Gaussian distribution, and the overlap region of the two classes of trials depends mainly on their between-class distance. Motivated by these facts, this paper presents a cosine metric learning (CML) framework for speaker verification, which combines classical compensation techniques with cosine similarity scoring to improve the EER performance. CML minimizes the overlap region by enlarging the between-class distance while introducing a regularization term to control the within-class variance. CML is initialized by a traditional channel compensation technique such as linear discriminant analysis. Experiments are carried out to compare the proposed CML framework with several traditional channel compensation baselines on the NIST speaker recognition evaluation data sets. The results show that CML outperforms all the studied initialization compensation techniques.
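
The relationship between between-class distance and EER described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cosine_score` and `equal_error_rate`, the projection matrix `A`, and the synthetic Gaussian scores are all assumptions made for illustration, and no CML training step is shown.

```python
import numpy as np

def cosine_score(A, w1, w2):
    """Cosine similarity of two i-vectors after projecting both with matrix A.

    A stands in for a channel compensation projection (e.g. one obtained
    from linear discriminant analysis); it is a hypothetical input here.
    """
    p1, p2 = A @ w1, A @ w2
    return float(p1 @ p2 / (np.linalg.norm(p1) * np.linalg.norm(p2)))

def equal_error_rate(true_scores, imposter_scores):
    """Approximate the EER by sweeping the threshold over all observed scores."""
    true_scores = np.asarray(true_scores, dtype=float)
    imposter_scores = np.asarray(imposter_scores, dtype=float)
    thresholds = np.sort(np.concatenate([true_scores, imposter_scores]))
    # False rejection rate: true trials scored below the threshold.
    frr = np.array([(true_scores < t).mean() for t in thresholds])
    # False acceptance rate: imposter trials scored at or above the threshold.
    far = np.array([(imposter_scores >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    i = np.argmin(np.abs(far - frr))
    return float((far[i] + frr[i]) / 2)

rng = np.random.default_rng(0)
# Synthetic Gaussian scores (illustrative only, not data from the paper).
# Both settings share the same within-class spread; only the distance
# between the class means changes.
imposter = rng.normal(0.0, 0.1, 2000)
eer_near = equal_error_rate(rng.normal(0.2, 0.1, 2000), imposter)
eer_far = equal_error_rate(rng.normal(0.4, 0.1, 2000), imposter)
print(f"EER with small between-class distance: {eer_near:.3f}")
print(f"EER with large between-class distance: {eer_far:.3f}")
```

With the within-class variance held fixed, moving the class means apart shrinks the overlap of the two score distributions and lowers the EER, which is the effect CML is described as targeting.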


DOI: 10.21437/Interspeech.2018-1593

Cite as: Bai, Z., Zhang, X., Chen, J. (2018) Cosine Metric Learning for Speaker Verification in the I-vector Space. Proc. Interspeech 2018, 1126-1130, DOI: 10.21437/Interspeech.2018-1593.


@inproceedings{Bai2018,
  author={Zhongxin Bai and Xiao-Lei Zhang and Jingdong Chen},
  title={Cosine Metric Learning for Speaker Verification in the I-vector Space},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1126--1130},
  doi={10.21437/Interspeech.2018-1593},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1593}
}