The main challenges introduced in the 2016 NIST speaker recognition evaluation (SRE16) are domain mismatch between training and evaluation data, duration variability in test recordings, and unlabeled in-domain training data. This paper outlines the systems developed at CRIM for SRE16. To tackle the domain mismatch problem, we apply minimum divergence training to adapt a conventional i-vector extractor to the task domain. Specifically, we take an i-vector extractor trained on out-of-domain data as an initialization and perform a few iterations of minimum divergence training on the unlabeled in-domain data provided. Next, we non-linearly transform the adapted i-vectors by learning a speaker classifier neural network. Speaker features extracted from this network have been shown to be more robust than i-vectors under domain mismatch, reducing equal error rates by 2–3% absolute. Finally, we propose a new Beta-Bernoulli backend that models the features supplied by the speaker classifier network. Our best single system combines the speaker classifier network with the Beta-Bernoulli backend. Overall system performance was very satisfactory on the fixed-condition task: our submitted fused system achieves an equal error rate of 9.89%.
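The adaptation step referred to above is the standard minimum-divergence re-estimation used in i-vector training: the second moment of the latent posteriors computed on the adaptation data is absorbed into the total-variability matrix so that the latent prior is pulled back to a standard normal. A minimal NumPy sketch of one such update, assuming posterior means and covariances of the latent vectors have already been computed (function and variable names here are illustrative, not the authors' code):

```python
import numpy as np

def minimum_divergence_step(T, post_means, post_covs):
    """One minimum-divergence update of the total-variability matrix T.

    T          : (F, d) total-variability matrix (F supervector dim, d i-vector dim)
    post_means : (N, d) posterior means of the latent vectors on the
                 unlabeled in-domain adaptation data
    post_covs  : (N, d, d) posterior covariances of those latent vectors
    """
    n = len(post_means)
    # Empirical second moment of the latent posteriors over the adaptation data.
    S = (post_means.T @ post_means) / n + post_covs.mean(axis=0)
    # Factor S = L L^T; absorbing L into T maps the latent prior back to N(0, I).
    L = np.linalg.cholesky(S)
    return T @ L
```

After this update the implied latent vectors are whitened (their second moment becomes the identity), which is what makes a few such iterations on unlabeled in-domain data an effective lightweight domain-adaptation step.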
Cite as: Alam, J., Kenny, P., Bhattacharya, G., Kockmann, M. (2017) Speaker Verification Under Adverse Conditions Using i-Vector Adaptation and Neural Networks. Proc. Interspeech 2017, 3732-3736, doi: 10.21437/Interspeech.2017-1240
@inproceedings{alam17b_interspeech,
  author={Jahangir Alam and Patrick Kenny and Gautam Bhattacharya and Marcel Kockmann},
  title={{Speaker Verification Under Adverse Conditions Using i-Vector Adaptation and Neural Networks}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3732--3736},
  doi={10.21437/Interspeech.2017-1240}
}