ISCA Archive ASVSPOOF 2021

The Biometric Vox System for the ASVspoof 2021 Challenge

Joaquín Cáceres, Roberto Font, Teresa Grau, Javier Molina

This paper describes the systems developed by Biometric Vox for the ASVspoof 2021 challenge Logical Access (LA) and Physical Access (PA) tracks. The Logical Access track aims at detecting the use of speech synthesis or voice conversion techniques. In the case of the Physical Access track, the task is the detection of replayed speech. We experiment with different input features and neural network architectures. In particular, we propose a lightweight Time Delay Neural Network architecture and the use of Focal Loss as a way to handle class imbalance and emphasize hard-to-classify samples. Additionally, we explore the use of neural networks as embedding extractors and propose a one-class Gaussian classifier on top of these embeddings. Our final system for the PA track obtains min-tDCF=0.6658 and EER=24.44% on the progress set and min-tDCF=0.7462 and EER=29.00% on the evaluation set. On the LA track, our best system obtains min-tDCF=0.2371 and EER=4.54% on the progress set and min-tDCF=0.2747 and EER=5.58% on the evaluation set.
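
As a rough illustration of why Focal Loss emphasizes hard-to-classify samples, the sketch below computes the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), for a single prediction. The abstract does not state the hyperparameters used, so gamma=2.0 and alpha=0.25 here are assumptions taken from common defaults, and the function names are hypothetical rather than the authors' implementation.

```python
import math


def cross_entropy(p_pos, label):
    """Plain binary cross-entropy for one sample.

    p_pos is the predicted probability of the positive class (label 1).
    """
    p_t = p_pos if label == 1 else 1.0 - p_pos
    p_t = min(max(p_t, 1e-12), 1.0)  # clamp to avoid log(0)
    return -math.log(p_t)


def focal_loss(p_pos, label, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    gamma and alpha are assumed values, not taken from the paper.
    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    samples; alpha_t re-weights the two classes to counter imbalance.
    """
    p_t = p_pos if label == 1 else 1.0 - p_pos
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Compare an easy sample (p_t = 0.9) with a hard one (p_t = 0.1).
easy_ratio = focal_loss(0.9, 1) / cross_entropy(0.9, 1)
hard_ratio = focal_loss(0.1, 1) / cross_entropy(0.1, 1)
print(f"easy sample keeps {easy_ratio:.4f} of its cross-entropy loss")
print(f"hard sample keeps {hard_ratio:.4f} of its cross-entropy loss")
```

With gamma=0 the modulating factor disappears and the expression reduces to class-weighted cross-entropy, so gamma controls how strongly easy samples are discounted.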


doi: 10.21437/ASVSPOOF.2021-11

Cite as: Cáceres, J., Font, R., Grau, T., Molina, J. (2021) The Biometric Vox System for the ASVspoof 2021 Challenge. Proc. 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge, 68-74, doi: 10.21437/ASVSPOOF.2021-11

@inproceedings{caceres21_asvspoof,
  author={Joaquín Cáceres and Roberto Font and Teresa Grau and Javier Molina},
  title={{The Biometric Vox System for the ASVspoof 2021 Challenge}},
  year=2021,
  booktitle={Proc. 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge},
  pages={68--74},
  doi={10.21437/ASVSPOOF.2021-11}
}