Improving Deep Neural Networks Based Speaker Verification Using Unlabeled Data

Yao Tian, Meng Cai, Liang He, Wei-Qiang Zhang, Jia Liu


Recently, deep neural networks (DNNs) trained to predict senones have been incorporated into conventional i-vector based speaker verification systems to provide soft frame alignments, showing promising results. However, a data-mismatch problem may degrade performance, since the DNN requires transcribed (out-of-domain) data, while the data sets (in-domain data) used for i-vector training and extraction are mostly untranscribed. In this paper, we address this problem by exploiting the unlabeled in-domain data during DNN training, so that the DNN provides a more robust basis for the in-domain data. We first explore the impact of using in-domain data during the unsupervised DNN pre-training process. In addition, we decode the in-domain data with a hybrid DNN-HMM system to obtain transcriptions, and then retrain the DNN on this "labeled" in-domain data. Experimental results on the NIST SRE 2008 and NIST SRE 2010 databases demonstrate the effectiveness of the proposed methods.
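The DNN/i-vector coupling the abstract refers to replaces GMM-UBM component posteriors with DNN senone posteriors when accumulating the Baum-Welch statistics used for i-vector training and extraction. A minimal sketch of that statistics accumulation, using random stand-in data (all shapes, variable names, and the softmax-over-logits step are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Illustrative sizes: T frames, D-dimensional features, C senones.
rng = np.random.default_rng(0)
T, D, C = 200, 20, 8

feats = rng.standard_normal((T, D))    # acoustic features (e.g. MFCCs)
logits = rng.standard_normal((T, C))   # stand-in DNN senone outputs per frame

# Softmax over senones: each row is a soft alignment of one frame to the
# C classes, playing the role the GMM-UBM posterior plays classically.
post = np.exp(logits - logits.max(axis=1, keepdims=True))
post /= post.sum(axis=1, keepdims=True)

# Zeroth- and first-order Baum-Welch statistics, accumulated per senone.
N = post.sum(axis=0)   # shape (C,): soft frame counts
F = post.T @ feats     # shape (C, D): posterior-weighted feature sums
```

These per-senone statistics (N, F) are then fed to a standard total-variability model exactly as GMM-based statistics would be; only the source of the frame alignment changes.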


DOI: 10.21437/Interspeech.2016-614

Cite as

Tian, Y., Cai, M., He, L., Zhang, W.-Q., Liu, J. (2016) Improving Deep Neural Networks Based Speaker Verification Using Unlabeled Data. Proc. Interspeech 2016, 1863-1867.

Bibtex
@inproceedings{Tian+2016,
  author={Yao Tian and Meng Cai and Liang He and Wei-Qiang Zhang and Jia Liu},
  title={Improving Deep Neural Networks Based Speaker Verification Using Unlabeled Data},
  year={2016},
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-614},
  url={http://dx.doi.org/10.21437/Interspeech.2016-614},
  pages={1863--1867}
}