MagNetO: X-vector Magnitude Estimation Network plus Offset for Improved Speaker Recognition

Daniel Garcia-Romero, Greg Sell, Alan Mccree


We present a magnitude estimation network that is combined with a modified ResNet x-vector system to generate embeddings whose inner product produces calibrated scores with increased discrimination. A three-step training procedure is used. First, the network is trained using short segments and a multi-class cross-entropy loss with angular margin softmax. During the second step, only a reduced subset of the DNN parameters is refined using full-length recordings. Finally, the magnitude estimation network is trained using a binary cross-entropy loss over pairs of target and non-target trials. The resulting system is evaluated on four widely used benchmarks and provides significant discrimination and calibration gains at multiple operating points.
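
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the trial score is the inner product of unit-length embeddings scaled by per-recording magnitudes, plus a global offset, and that this score is treated as a logit in the binary cross-entropy loss. The function names, the magnitudes, and the offset value are hypothetical; in the paper the magnitudes come from a trained network, which is replaced here by fixed numbers.

```python
import numpy as np


def pair_score(x1, x2, m1, m2, offset):
    """Score a trial as m1 * m2 * cos(x1, x2) + offset (assumed form)."""
    u1 = x1 / np.linalg.norm(x1)  # unit-length embedding for side 1
    u2 = x2 / np.linalg.norm(x2)  # unit-length embedding for side 2
    return m1 * m2 * float(u1 @ u2) + offset


def bce_from_logit(score, is_target):
    """Binary cross-entropy for one trial, treating the score as a logit."""
    # Target trials pay log(1 + exp(-s)); non-target trials pay log(1 + exp(s)).
    sign = -1.0 if is_target else 1.0
    return float(np.logaddexp(0.0, sign * score))


# Hypothetical toy embeddings and magnitudes, chosen only for illustration.
same = pair_score(np.array([2.0, 0.0]), np.array([5.0, 0.0]), 2.0, 2.0, -1.0)
diff = pair_score(np.array([3.0, 0.0]), np.array([0.0, 5.0]), 2.0, 2.0, -1.0)
loss = bce_from_logit(same, True) + bce_from_logit(diff, False)
```

In this sketch, larger magnitudes push the score further from the offset in either direction, so a low-magnitude recording yields a less confident score regardless of the angle between embeddings.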


DOI: 10.21437/Odyssey.2020-1

Cite as: Garcia-Romero, D., Sell, G., Mccree, A. (2020) MagNetO: X-vector Magnitude Estimation Network plus Offset for Improved Speaker Recognition. Proc. Odyssey 2020 The Speaker and Language Recognition Workshop, 1-8, DOI: 10.21437/Odyssey.2020-1.


@inproceedings{Garcia-Romero2020,
  author={Daniel Garcia-Romero and Greg Sell and Alan Mccree},
  title={{MagNetO: X-vector Magnitude Estimation Network plus Offset for Improved Speaker Recognition}},
  year=2020,
  booktitle={Proc. Odyssey 2020 The Speaker and Language Recognition Workshop},
  pages={1--8},
  doi={10.21437/Odyssey.2020-1},
  url={http://dx.doi.org/10.21437/Odyssey.2020-1}
}