The Opensesame NIST 2016 Speaker Recognition Evaluation System

Gang Liu, Qi Qian, Zhibin Wang, Qingen Zhao, Tianzhou Wang, Hao Li, Jian Xue, Shenghuo Zhu, Rong Jin, Tuo Zhao

The last two decades have witnessed significant progress in speaker recognition, as evidenced by the improving performance in the speaker recognition evaluations (SRE) hosted by NIST. Despite this progress, little research has focused on speaker recognition under short-duration and language-mismatch conditions, which often lead to poor recognition performance. In NIST SRE2016, these concerns were systematically investigated by the speaker recognition community for the first time. In this study, we address these challenges from the viewpoint of feature extraction and modeling. In particular, we improve the robustness of features by combining GMM- and DNN-based iVector extraction approaches, and improve the reliability of the back-end model by exploiting a symmetric SVM that can effectively leverage unlabeled data. We also introduce distance metric learning to improve generalization from the development data, which is usually limited in size. Finally, a fusion strategy is adopted to collectively boost performance. The effectiveness of the proposed scheme is demonstrated on the SRE2016 evaluation data: compared with a DNN-iVector PLDA baseline system, our method yields a 25.6% relative improvement in terms of min_Cprimary.
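
For reference, a relative improvement on a cost metric such as min_Cprimary (where lower is better) is conventionally computed as sketched below. This assumes the standard definition; the symbols C_base and C_prop are labels introduced here for the baseline and proposed systems, and the abstract does not report the underlying cost values.

```latex
% Assumed standard definition of relative improvement for a lower-is-better cost.
% C_base: min_Cprimary of the DNN-iVector PLDA baseline (symbol introduced here)
% C_prop: min_Cprimary of the proposed fused system (symbol introduced here)
\[
  \text{relative improvement}
  = \frac{C_{\text{base}} - C_{\text{prop}}}{C_{\text{base}}} \times 100\%
\]
```

Under this definition, the reported 25.6% would mean the proposed system's cost is about 0.744 times that of the baseline.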

DOI: 10.21437/Interspeech.2017-997

Cite as: Liu, G., Qian, Q., Wang, Z., Zhao, Q., Wang, T., Li, H., Xue, J., Zhu, S., Jin, R., Zhao, T. (2017) The Opensesame NIST 2016 Speaker Recognition Evaluation System. Proc. Interspeech 2017, 2854-2858, DOI: 10.21437/Interspeech.2017-997.

  author={Gang Liu and Qi Qian and Zhibin Wang and Qingen Zhao and Tianzhou Wang and Hao Li and Jian Xue and Shenghuo Zhu and Rong Jin and Tuo Zhao},
  title={The Opensesame NIST 2016 Speaker Recognition Evaluation System},
  booktitle={Proc. Interspeech 2017},