Speaker Verification Under Adverse Conditions Using i-Vector Adaptation and Neural Networks

Jahangir Alam, Patrick Kenny, Gautam Bhattacharya, Marcel Kockmann


The main challenges introduced in the 2016 NIST speaker recognition evaluation (SRE16) are domain mismatch between training and evaluation data, duration variability in the test recordings, and unlabeled in-domain training data. This paper outlines the systems developed at CRIM for SRE16. To tackle the domain mismatch problem, we apply minimum divergence training to adapt a conventional i-vector extractor to the task domain. Specifically, we take an out-of-domain trained i-vector extractor as an initialization and perform a few iterations of minimum divergence training on the unlabeled in-domain data provided. Next, we non-linearly transform the adapted i-vectors by learning a speaker classifier neural network. Speaker features extracted from this network have been shown to be more robust than i-vectors under domain mismatch conditions, reducing equal error rates by 2–3% absolute. Finally, we propose a new Beta-Bernoulli backend that models the features supplied by the speaker classifier network. Our best single system combines the speaker classifier network with the Beta-Bernoulli backend. Overall system performance was very satisfactory for the fixed-condition task: our submitted fused system achieves an equal error rate of 9.89%.
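The minimum divergence adaptation step mentioned above can be sketched as follows: the aggregate second moment of the i-vector posteriors on the in-domain data is absorbed into the extractor's total-variability matrix T via a Cholesky factor, so that the adapted posteriors match the standard normal prior while the model's likelihood terms are left unchanged. This is a minimal illustrative sketch, not the paper's implementation; all dimensions, the shared posterior covariance, and the simulated statistics are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not from the paper):
# R = i-vector dimension, F = supervector/feature dimension.
R, F = 4, 10
T = rng.standard_normal((F, R))          # out-of-domain i-vector extractor (T matrix)

# Simulated per-utterance posteriors of the latent variable w on unlabeled
# in-domain data: means mu_i plus a shared posterior covariance Sigma.
# Under domain mismatch the aggregate posterior drifts away from N(0, I).
N = 500
mu = 1.7 * rng.standard_normal((N, R)) + 0.3
Sigma = 0.2 * np.eye(R)

# Minimum divergence update: factor the aggregate second moment of w
# and absorb it into T, whitening the i-vectors back toward the prior.
S = Sigma + mu.T @ mu / N                # aggregate second moment of w
L = np.linalg.cholesky(S)                # S = L L^T
T_adapted = T @ L                        # adapted extractor
mu_adapted = mu @ np.linalg.inv(L).T     # whitened i-vectors, w' = L^{-1} w

# After the update the aggregate second moment of the adapted posteriors
# is the identity, i.e. it matches the standard normal prior.
Linv = np.linalg.inv(L)
S_after = Linv @ Sigma @ Linv.T + mu_adapted.T @ mu_adapted / N
print(np.allclose(S_after, np.eye(R)))   # True

# The model is unchanged: T w = (T L)(L^{-1} w) for every utterance.
print(np.allclose(T_adapted @ mu_adapted[0], T @ mu[0]))  # True
```

Because T L L⁻¹ w = T w, the update leaves the data likelihood intact and only re-parameterizes the latent space, which is what makes it safe to run for a few iterations on unlabeled in-domain data.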


 DOI: 10.21437/Interspeech.2017-1240

Cite as: Alam, J., Kenny, P., Bhattacharya, G., Kockmann, M. (2017) Speaker Verification Under Adverse Conditions Using i-Vector Adaptation and Neural Networks. Proc. Interspeech 2017, 3732-3736, DOI: 10.21437/Interspeech.2017-1240.


@inproceedings{Alam2017,
  author={Jahangir Alam and Patrick Kenny and Gautam Bhattacharya and Marcel Kockmann},
  title={Speaker Verification Under Adverse Conditions Using i-Vector Adaptation and Neural Networks},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3732--3736},
  doi={10.21437/Interspeech.2017-1240},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1240}
}