Speaker Verification in Mismatched Conditions with Frustratingly Easy Domain Adaptation

Md Jahangir Alam, Gautam Bhattacharya, Patrick Kenny


The 2016 edition of the NIST speaker recognition evaluation tests the ability of speaker verification systems to deal with language mismatch between development and test data. To support adaptation to the new languages, a small amount of unlabeled in-domain data was provided, which calls for an unsupervised approach to learning from this data. In this work we adapt a simple domain adaptation strategy to the speaker verification problem. We test our approach using two types of speaker embeddings: i-vectors and neural network based x-vectors. Despite the simplicity of our method, we show that it outperforms a competitive PLDA domain-adaptation approach in the i-vector domain (12.11% vs 12.68% EER) and performs on par with it in the x-vector domain (8.93% vs 8.91% EER). Finally, because our approach adapts the speaker embeddings themselves, we combine the adapted embeddings with the PLDA adaptation approach. We achieve our best result (8.75% EER) using this combined strategy with x-vectors.
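
The abstract does not spell out the adaptation strategy. As an illustration only, the sketch below assumes an unsupervised, embedding-level method in the style of correlation alignment (CORAL): out-of-domain embeddings are whitened with their own covariance and re-colored with the covariance of the unlabeled in-domain embeddings. The function names, the regularization constant `eps`, and the mean shift toward the in-domain data are hypothetical choices made for this example, not details taken from the paper.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power."""
    vals, vecs = np.linalg.eigh(mat)
    # Reassemble V * diag(vals**power) * V^T from the eigendecomposition.
    return (vecs * vals**power) @ vecs.T


def coral_adapt(source, target, eps=1e-3):
    """Align second-order statistics of `source` embeddings to `target`.

    source: (n_source, dim) labeled out-of-domain embeddings.
    target: (n_target, dim) unlabeled in-domain embeddings.
    eps:    ridge term keeping both covariances invertible (assumed value).
    """
    dim = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)

    whiten = _sym_matrix_power(cov_s, -0.5)   # removes source correlations
    recolor = _sym_matrix_power(cov_t, 0.5)   # imposes target correlations

    # Assumption for this sketch: also move the source mean onto the target mean.
    centered = source - source.mean(axis=0)
    return centered @ whiten @ recolor + target.mean(axis=0)
```

Under these assumptions, the adapted out-of-domain embeddings would then be used to train the back-end (for example PLDA) in place of the original ones, which is why such a method can be combined with a separate PLDA adaptation step.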


DOI: 10.21437/Odyssey.2018-25

Cite as: Alam, M.J., Bhattacharya, G., Kenny, P. (2018) Speaker Verification in Mismatched Conditions with Frustratingly Easy Domain Adaptation. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 176-180, DOI: 10.21437/Odyssey.2018-25.


@inproceedings{Alam2018,
  author={Md Jahangir Alam and Gautam Bhattacharya and Patrick Kenny},
  title={Speaker Verification in Mismatched Conditions with Frustratingly Easy Domain Adaptation},
  year=2018,
  booktitle={Proc. Odyssey 2018 The Speaker and Language Recognition Workshop},
  pages={176--180},
  doi={10.21437/Odyssey.2018-25},
  url={http://dx.doi.org/10.21437/Odyssey.2018-25}
}