Analysis of Critical Metadata Factors for the Calibration of Speaker Recognition Systems

Mahesh Kumar Nandwana, Luciana Ferrer, Mitchell McLaren, Diego Castan, Aaron Lawson


In this paper, we analyze and assess the impact of critical metadata factors on the calibration performance of speaker recognition systems. In particular, we study the effect of duration, distance, language, and gender by using a variety of datasets and systematically varying the conditions in the evaluation and calibration sets. For all experiments, the system uses i-vectors with a probabilistic linear discriminant analysis (PLDA) back-end, followed by linear calibration. We measure system performance in terms of calibration loss. Our experiments reveal (i) a large degradation when the duration used for calibration is significantly different from that in the evaluation set; (ii) no significant degradation when a different gender is used for calibration than for evaluation; (iii) a large degradation when microphone distance is significantly different between the sets; and (iv) a small loss for closely related languages and languages with shared vocabulary. This analysis will be beneficial in the development of speaker recognition systems for use in unseen environments and for forensic speaker recognition analysts when selecting relevant population data.
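As an illustration of the linear calibration step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that linear calibration maps a raw score s to a log-likelihood ratio a*s + b, that a and b are fitted by minimizing a logistic (cross-entropy) cost with equal target and non-target weight, and that quality is summarized by the log-likelihood-ratio cost Cllr, one common choice in the field. The scores, function names, and training settings are hypothetical. Calibration loss is often reported as the gap between this actual Cllr and the minimum Cllr achievable by an optimal monotonic mapping of the scores; that minimum is not computed here.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def cllr(tar_llrs, non_llrs):
    """Log-likelihood-ratio cost in bits (0 is perfect, 1 is uninformative)."""
    tar = sum(softplus(-x) for x in tar_llrs) / len(tar_llrs)
    non = sum(softplus(x) for x in non_llrs) / len(non_llrs)
    return 0.5 * (tar + non) / math.log(2)


def train_linear_calibration(tar, non, steps=5000, lr=0.1):
    """Fit llr = a * score + b by gradient descent on the Cllr objective."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s in tar:  # derivative of softplus(-(a*s + b)) w.r.t. the llr
            g = -sigmoid(-(a * s + b)) / len(tar)
            grad_a += g * s
            grad_b += g
        for s in non:  # derivative of softplus(a*s + b) w.r.t. the llr
            g = sigmoid(a * s + b) / len(non)
            grad_a += g * s
            grad_b += g
        a -= lr * 0.5 * grad_a
        b -= lr * 0.5 * grad_b
    return a, b


# Hypothetical raw scores from a calibration set (not data from the paper).
tar_scores = [2.0, 3.5, 4.0, 5.0, 6.5, 3.0]
non_scores = [0.5, 1.0, 2.5, 1.5, 3.2, 0.0]

a, b = train_linear_calibration(tar_scores, non_scores)
raw = cllr(tar_scores, non_scores)  # raw scores treated as if they were LLRs
cal = cllr([a * s + b for s in tar_scores], [a * s + b for s in non_scores])
print(f"a={a:.3f} b={b:.3f} Cllr raw={raw:.3f} calibrated={cal:.3f}")
```

In this sketch the parameters are evaluated on the same scores used to fit them. The mismatch conditions studied in the paper would correspond to fitting a and b on one set (for example, one duration or language) and computing the cost on a set with different metadata.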


DOI: 10.21437/Interspeech.2019-1808

Cite as: Nandwana, M.K., Ferrer, L., McLaren, M., Castan, D., Lawson, A. (2019) Analysis of Critical Metadata Factors for the Calibration of Speaker Recognition Systems. Proc. Interspeech 2019, 4325-4329, DOI: 10.21437/Interspeech.2019-1808.


@inproceedings{Nandwana2019,
  author={Mahesh Kumar Nandwana and Luciana Ferrer and Mitchell McLaren and Diego Castan and Aaron Lawson},
  title={{Analysis of Critical Metadata Factors for the Calibration of Speaker Recognition Systems}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4325--4329},
  doi={10.21437/Interspeech.2019-1808},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1808}
}