An Analysis of Transfer Learning for Domain Mismatched Text-independent Speaker Verification

Chunlei Zhang, Shivesh Ranjan, John Hansen


In this paper, we present transfer learning for deep neural network based text-independent speaker verification in the presence of a severe mismatch between the enrollment and the test data. Given a pre-trained speaker embedding network developed with out-of-domain data, we explore and analyze how this pre-trained model can benefit the in-domain speaker verification task. Two alternative transfer learning strategies are investigated: vanilla transfer learning (V-TL) and curriculum learning based transfer learning (CL-TL). The proposed methods are validated on the UT-SCOPE-physical speech corpus, where we create a setup that introduces mismatched evaluation conditions using neutral speech and physical task stressed speech. Experimental results confirm the effectiveness of both the V-TL and CL-TL techniques. Employing transfer learning based on the pre-trained model, we achieve a +47.7% relative improvement over a conventional i-vector/PLDA system and a +30.6% relative improvement over a recently proposed end-to-end system.


DOI: 10.21437/Odyssey.2018-26

Cite as: Zhang, C., Ranjan, S., Hansen, J. (2018) An Analysis of Transfer Learning for Domain Mismatched Text-independent Speaker Verification. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 181-186, DOI: 10.21437/Odyssey.2018-26.


@inproceedings{Zhang2018,
  author={Chunlei Zhang and Shivesh Ranjan and John Hansen},
  title={An Analysis of Transfer Learning for Domain Mismatched Text-independent Speaker Verification},
  year=2018,
  booktitle={Proc. Odyssey 2018 The Speaker and Language Recognition Workshop},
  pages={181--186},
  doi={10.21437/Odyssey.2018-26},
  url={http://dx.doi.org/10.21437/Odyssey.2018-26}
}