Adaptive Mean Normalization for Unsupervised Adaptation of Speaker Embeddings

Mitchell Mclaren, Md Hafizur Rahman, Diego Castan, Mahesh Kumar Nandwana, Aaron Lawson


We propose an active learning approach for the unsupervised normalization of vector representations of speech, such as speaker embeddings, currently in widespread use in speaker recognition systems. We demonstrate that the mean traditionally used to normalize speaker embeddings prior to probabilistic linear discriminant analysis (PLDA) is suboptimal when the evaluation conditions do not match the training conditions. Using an unlabeled sample of target-domain data, we show that the proposed adaptive mean normalization (AMN) technique is extremely effective, improving discrimination and calibration performance by up to 26% and 65% relative, respectively, over out-of-the-box system performance. These benchmarks were performed on four distinctly different datasets for a thorough analysis of AMN robustness. Most notably, for a range of data conditions, AMN enabled the use of a calibration model trained on data mismatched to the conditions being evaluated. The approach was found to be effective when using as few as thirty-two unlabeled samples of target-domain data.
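The core idea described in the abstract, re-centering embeddings with a mean estimated from a small set of unlabeled target-domain samples before PLDA scoring, can be sketched as follows. This is a minimal illustration only; the function and variable names are hypothetical and do not come from the paper's implementation, which should be consulted for the full AMN procedure.

```python
import numpy as np

def adaptive_mean_normalize(embeddings, target_samples):
    """Re-center embeddings using the mean of unlabeled target-domain
    samples (rather than the PLDA training-set mean), shifting them
    toward the conditions the back-end was trained on.
    Both inputs have shape (num_vectors, embedding_dim)."""
    target_mean = target_samples.mean(axis=0)  # estimated domain mean
    return embeddings - target_mean

# Toy example: the abstract reports that as few as 32 unlabeled
# target-domain samples can suffice to estimate the shift.
rng = np.random.default_rng(0)
domain_offset = 2.0                                    # simulated domain shift
target = rng.normal(loc=domain_offset, size=(32, 8))   # unlabeled target data
trial = rng.normal(loc=domain_offset, size=(5, 8))     # trial embeddings
centered = adaptive_mean_normalize(trial, target)
```

After this shift, the centered embeddings would be passed to PLDA scoring as usual.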


DOI: 10.21437/Odyssey.2020-13

Cite as: Mclaren, M., Rahman, M.H., Castan, D., Nandwana, M.K., Lawson, A. (2020) Adaptive Mean Normalization for Unsupervised Adaptation of Speaker Embeddings. Proc. Odyssey 2020 The Speaker and Language Recognition Workshop, 88-94, DOI: 10.21437/Odyssey.2020-13.


@inproceedings{Mclaren2020,
  author={Mitchell Mclaren and Md Hafizur Rahman and Diego Castan and Mahesh Kumar Nandwana and Aaron Lawson},
  title={{Adaptive Mean Normalization for Unsupervised Adaptation of Speaker Embeddings}},
  year=2020,
  booktitle={Proc. Odyssey 2020 The Speaker and Language Recognition Workshop},
  pages={88--94},
  doi={10.21437/Odyssey.2020-13},
  url={http://dx.doi.org/10.21437/Odyssey.2020-13}
}