ISCA Archive Interspeech 2017

Improving Robustness of Speaker Recognition to New Conditions Using Unlabeled Data

Diego Castan, Mitchell McLaren, Luciana Ferrer, Aaron Lawson, Alicia Lozano-Diez

Unsupervised adaptation techniques are important for speaker recognition because condition mismatch is prevalent when the technology is applied to new conditions and labeled ‘in-domain’ data is generally scarce. In the recent NIST 2016 Speaker Recognition Evaluation (SRE16), symmetric score normalization (S-norm) and calibration using unlabeled in-domain data were shown to be beneficial. Because calibration requires speaker labels for training, speaker-clustering techniques were used to generate pseudo-speakers for learning calibration parameters when only unlabeled in-domain data was available. These methods performed well in SRE16. It is unclear, however, whether they generalize well to other data sources. In this work, we first describe our SRI-CON-UAM team submission to the NIST 2016 SRE and then benchmark these approaches on several distinctly different databases. Our analysis shows that, while the benefit of S-norm is also observed across other datasets, applying speaker-clustered calibration provides considerably greater benefit to the system in the context of new acoustic conditions.
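
For readers unfamiliar with S-norm, the following is a minimal sketch of the commonly used textbook formulation, not the authors' implementation. It assumes the cohort is a set of unlabeled in-domain recordings that have already been scored against the enrollment side and the test side of a trial. The function name `s_norm` and its arguments are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization of one trial score (illustrative sketch).

    enroll_cohort_scores: scores of the enrollment side against the cohort.
    test_cohort_scores:   scores of the test side against the cohort.
    Both argument names are assumptions made for this example.
    """
    mu_e, sd_e = mean(enroll_cohort_scores), pstdev(enroll_cohort_scores)
    mu_t, sd_t = mean(test_cohort_scores), pstdev(test_cohort_scores)
    if sd_e == 0 or sd_t == 0:
        raise ValueError("cohort scores must not be constant")
    # Average of the two z-normalized versions of the raw score.
    z_enroll = (raw_score - mu_e) / sd_e
    z_test = (raw_score - mu_t) / sd_t
    return 0.5 * (z_enroll + z_test)


if __name__ == "__main__":
    # Toy numbers, made up purely to exercise the function.
    print(s_norm(2.0, [0.0, 2.0], [1.0, 3.0]))
```

Because the two cohort statistics are averaged, swapping the enrollment and test sides leaves the normalized score unchanged, which is what makes the normalization symmetric.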


doi: 10.21437/Interspeech.2017-605

Cite as: Castan, D., McLaren, M., Ferrer, L., Lawson, A., Lozano-Diez, A. (2017) Improving Robustness of Speaker Recognition to New Conditions Using Unlabeled Data. Proc. Interspeech 2017, 3737-3741, doi: 10.21437/Interspeech.2017-605

@inproceedings{castan17_interspeech,
  author={Diego Castan and Mitchell McLaren and Luciana Ferrer and Aaron Lawson and Alicia Lozano-Diez},
  title={{Improving Robustness of Speaker Recognition to New Conditions Using Unlabeled Data}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3737--3741},
  doi={10.21437/Interspeech.2017-605}
}