Recently, the integration of deep neural networks (DNNs) with i-vector systems has proved effective for speaker verification. This method uses a DNN with senone outputs to produce frame alignments for sufficient statistics extraction. However, two types of data mismatch may degrade the performance of DNN-based speaker verification systems. First, the DNN requires transcribed training data, while the data sets used for i-vector training and extraction are mostly untranscribed. Second, the language of the DNN training data is limited by the pronunciation lexicon, making the model unsuitable for multilingual tasks. In this paper, we propose to use bottleneck features and multilingual DNNs to narrow the gap caused by this data mismatch. In our method, a DNN is first trained with senone labels to extract bottleneck features. A Gaussian mixture model (GMM) is then trained on the bottleneck features to produce frame alignments. Additionally, bottleneck features based on multilingual DNNs are explored for multilingual speaker verification. Experiments on the NIST SRE 2008 female short2-short3 telephone task (multilingual) and the NIST SRE 2010 female core-extended telephone task (English) demonstrate the effectiveness of the proposed method.
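The alignment step described above can be illustrated with a minimal sketch. It assumes a diagonal-covariance GMM whose parameters are already trained, and it computes per-frame component posteriors (the frame alignments) and the zeroth- and first-order sufficient statistics used in i-vector extraction. The function names, the toy parameters, and the choice to pass alignment features and statistics features as two parallel inputs are assumptions for illustration, not details taken from the paper.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def frame_posteriors(x, weights, means, variances):
    """Posterior of each GMM component given one frame (the frame alignment)."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def sufficient_stats(align_frames, stat_frames, weights, means, variances):
    """Zeroth-order (N) and first-order (F) statistics.

    align_frames: features used to compute alignments (assumed here to be
                  bottleneck features).
    stat_frames:  features over which statistics are accumulated; may be the
                  same list as align_frames (hypothetical design choice).
    """
    n_comp = len(weights)
    dim = len(stat_frames[0])
    N = [0.0] * n_comp
    F = [[0.0] * dim for _ in range(n_comp)]
    for a, x in zip(align_frames, stat_frames):
        post = frame_posteriors(a, weights, means, variances)
        for c, p in enumerate(post):
            N[c] += p
            for d in range(dim):
                F[c][d] += p * x[d]
    return N, F

# Toy example: two well-separated one-dimensional components.
weights = [0.5, 0.5]
means = [[0.0], [5.0]]
variances = [[1.0], [1.0]]
align = [[0.0], [5.0]]            # stand-in for bottleneck features
stats = [[1.0, 2.0], [3.0, 4.0]]  # stand-in for acoustic features
N, F = sufficient_stats(align, stats, weights, means, variances)
```

In this toy case each frame is assigned almost entirely to its nearest component, so the zeroth-order statistics sum to the number of frames and each component's first-order statistic is close to the single frame aligned to it.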
Bibliographic reference. Tian, Yao / Cai, Meng / He, Liang / Liu, Jia (2015): "Investigation of bottleneck features and multilingual deep neural networks for speaker verification", In INTERSPEECH-2015, 1151-1155.