Intel Far-Field Speaker Recognition System for VOiCES Challenge 2019

Jonathan Huang, Tobias Bocklet


This paper describes Intel’s speaker recognition systems for the VOiCES from a Distance Challenge 2019. Our submission consists of a ResNet50 system and four x-vector systems trained with different data augmentation and input features. Our novel contributions include the use of the additive margin softmax loss function and the use of invariant representation learning for some of our systems. To our knowledge, this has not been proposed for speaker recognition. We found that late fusion of these complementary subsystems at the score level, based on linear logistic regression, greatly improved performance on the development set. After fusion, our system achieved an EER, minDCF and actDCF of 2.2%, 0.27 and 0.27 on the development set, and 6.08%, 0.451 and 0.458 on the evaluation set, respectively. We discuss our results and offer some insight into how accuracy varies with recording distance.
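The score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy scores, the function names (`train_fusion`, `fuse`, `eer`), the plain gradient-descent training, and the learning rate and epoch count are all assumptions made for illustration. It fits one weight per subsystem plus a bias with logistic regression, then compares the equal error rate (EER) of each subsystem against the fused score.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def fuse(w, b, x):
    # Fused score: weighted sum of subsystem scores plus a bias.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_fusion(trials, labels, lr=0.1, epochs=500):
    # Unregularized logistic regression fitted by batch gradient descent
    # (hypothetical settings; real systems typically use a calibration toolkit).
    n, n_sys = len(trials), len(trials[0])
    w, b = [0.0] * n_sys, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_sys, 0.0
        for x, y in zip(trials, labels):
            err = sigmoid(fuse(w, b, x)) - y
            for k in range(n_sys):
                gw[k] += err * x[k]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def eer(scores, labels):
    # Approximate EER: sweep thresholds over the observed scores and take the
    # operating point where false-accept and false-reject rates are closest.
    n_tar = sum(labels)
    n_non = len(labels) - n_tar
    best = None
    for t in sorted(set(scores)):
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_tar
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_non
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

# Toy trials (made up): each row holds the scores of two subsystems for one trial.
trials = [(2.0, 1.0), (1.5, -0.5), (-0.5, 1.5), (1.0, 2.0),      # target trials
          (-2.0, -1.0), (0.5, -1.5), (-1.5, 0.5), (-1.0, -2.0)]  # non-target trials
labels = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_fusion(trials, labels)
eer_a = eer([x[0] for x in trials], labels)
eer_b = eer([x[1] for x in trials], labels)
eer_fused = eer([fuse(w, b, x) for x in trials], labels)
print(eer_a, eer_b, eer_fused)
```

On this toy data each subsystem misranks one target against one non-target trial, while the fused score separates the two classes. This is only meant to show why complementary subsystems can benefit from fusion, not to reproduce the reported results.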


DOI: 10.21437/Interspeech.2019-2894

Cite as: Huang, J., Bocklet, T. (2019) Intel Far-Field Speaker Recognition System for VOiCES Challenge 2019. Proc. Interspeech 2019, 2473-2477, DOI: 10.21437/Interspeech.2019-2894.


@inproceedings{Huang2019,
  author={Jonathan Huang and Tobias Bocklet},
  title={{Intel Far-Field Speaker Recognition System for VOiCES Challenge 2019}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2473--2477},
  doi={10.21437/Interspeech.2019-2894},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2894}
}