Optimization of False Acceptance/Rejection Rates and Decision Threshold for End-to-End Text-Dependent Speaker Verification Systems

Victoria Mingote, Antonio Miguel, Dayana Ribas, Alfonso Ortega, Eduardo Lleida


Currently, most Speaker Verification (SV) systems based on neural networks use Cross-Entropy and/or Triplet loss functions. Although these functions provide competitive results, they may not exploit the full potential of the system, because they are not designed to optimize the verification task with respect to its performance measures, e.g. the Detection Cost Function (DCF) or the Equal Error Rate (EER). This paper proposes a first approach to this issue through the optimization of a loss function based on the DCF. This mechanism allows the end-to-end system to directly manage the decision threshold that sets the balance between the False Rejection Rate (FRR) and the False Acceptance Rate (FAR), thereby connecting the system training directly to the operating point. In a text-dependent speaker verification framework based on neural network supervectors and evaluated on the RSR2015 dataset, the proposed approach outperforms reference systems trained with Cross-Entropy and Triplet loss, as well as our previous proposal based on an approximation of the Area Under the Curve (aAUC).
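To make the idea of a DCF-based loss with a trainable threshold concrete, here is a minimal sketch, assuming the hard miss and false-acceptance decisions are smoothed with sigmoids so the cost becomes differentiable in the threshold. The slope `alpha`, the prior `p_target`, the unit costs, the helper name `approx_dcf`, and the synthetic Gaussian scores are all illustrative assumptions, not the paper's exact formulation or settings. Only the threshold is optimized here; in an end-to-end system the same gradient would also flow back into the network that produces the scores.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def approx_dcf(target_scores, nontarget_scores, theta, alpha=10.0,
               p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Smoothed DCF and its derivative with respect to the threshold theta.

    Hypothetical helper. Hard decisions (miss: target score < theta,
    false accept: non-target score > theta) are replaced by sigmoids of
    slope alpha. alpha, p_target, c_miss and c_fa are assumed values.
    """
    miss = sigmoid(alpha * (theta - target_scores))    # soft miss indicator
    fa = sigmoid(alpha * (nontarget_scores - theta))   # soft false-accept indicator
    p_miss, p_fa = miss.mean(), fa.mean()              # soft FRR and FAR
    w_miss = c_miss * p_target
    w_fa = c_fa * (1.0 - p_target)
    cost = w_miss * p_miss + w_fa * p_fa
    # Chain rule: d/dtheta sigmoid(alpha*(theta - s)) = alpha * m * (1 - m),
    #             d/dtheta sigmoid(alpha*(s - theta)) = -alpha * f * (1 - f).
    d_miss = np.mean(alpha * miss * (1.0 - miss))
    d_fa = np.mean(-alpha * fa * (1.0 - fa))
    grad_theta = w_miss * d_miss + w_fa * d_fa
    return cost, grad_theta


# Hypothetical fixed scores standing in for the network's trial scores.
rng = np.random.default_rng(0)
tar = rng.normal(2.0, 1.0, 1000)    # target trials
non = rng.normal(-2.0, 1.0, 1000)   # non-target trials

theta = 0.0                          # trainable threshold, initialised at 0
cost0, _ = approx_dcf(tar, non, theta)
for _ in range(3000):                # plain gradient descent on theta only
    cost, g = approx_dcf(tar, non, theta)
    theta -= 1.0 * g
cost_final, _ = approx_dcf(tar, non, theta)
print(f"theta={theta:.3f}  cost: {cost0:.5f} -> {cost_final:.5f}")
```

Because `p_target = 0.01` weights false acceptances far more heavily than misses, the learned threshold moves above the midpoint 0 toward the target scores. For the equal-variance Gaussians assumed here, the Bayes-optimal threshold is about 1.15, which the smoothed empirical optimum should roughly approximate.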


 DOI: 10.21437/Interspeech.2019-2550

Cite as: Mingote, V., Miguel, A., Ribas, D., Ortega, A., Lleida, E. (2019) Optimization of False Acceptance/Rejection Rates and Decision Threshold for End-to-End Text-Dependent Speaker Verification Systems. Proc. Interspeech 2019, 2903-2907, DOI: 10.21437/Interspeech.2019-2550.


@inproceedings{Mingote2019,
  author={Victoria Mingote and Antonio Miguel and Dayana Ribas and Alfonso Ortega and Eduardo Lleida},
  title={{Optimization of False Acceptance/Rejection Rates and Decision Threshold for End-to-End Text-Dependent Speaker Verification Systems}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2903--2907},
  doi={10.21437/Interspeech.2019-2550},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2550}
}