ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual Networks

Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak


We present JHU’s system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT). Anti-spoofing has attracted growing attention since the inauguration of the ASVspoof Challenges, and ASVspoof 2019 is dedicated to addressing attacks of all three major types: text-to-speech, voice conversion, and replay. Building on previous research on deep neural networks (DNNs), ASSERT is a pipeline for a DNN-based approach to anti-spoofing. It has four components: feature engineering, DNN models, network optimization, and system combination, where the DNN models are variants of squeeze-excitation and residual networks. We conducted an ablation study of the effectiveness of each component on the ASVspoof 2019 corpus. Experimental results showed that ASSERT obtained more than 93% and 17% relative improvements over the baseline systems in the two sub-challenges of ASVspoof 2019, placing it among the top-performing systems. Code and pretrained models are publicly available.
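As a rough illustration of the squeeze-excitation and residual idea named above, the following is a minimal NumPy sketch of a generic squeeze-excitation gate combined with an identity shortcut. It is not the authors' architecture: the function name `se_residual_block`, the weight shapes, the reduction ratio, and the input dimensions are all assumptions chosen for illustration, and the convolutional residual branch of a real network is replaced by the identity for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_residual_block(x, w_reduce, w_expand):
    """Generic squeeze-excitation gate with an identity shortcut (illustrative only).

    x        : feature map of shape (channels, freq, time)
    w_reduce : bottleneck weights, shape (channels // r, channels)
    w_expand : expansion weights, shape (channels, channels // r)
    """
    # Assumption: a real block would first apply convolutions to x;
    # here the residual branch is simply x itself.
    branch = x
    # Squeeze: global average pooling gives one descriptor per channel.
    z = branch.mean(axis=(1, 2))
    # Excitation: bottleneck with ReLU, then a sigmoid gate in (0, 1).
    gate = sigmoid(w_expand @ np.maximum(w_reduce @ z, 0.0))
    # Rescale each channel of the branch, then add the shortcut.
    return x + branch * gate[:, None, None]

# Hypothetical sizes: 8 channels, reduction ratio 4, a 16 x 20 time-frequency map.
rng = np.random.default_rng(0)
channels, reduction = 8, 4
x = rng.standard_normal((channels, 16, 20))
w_reduce = 0.1 * rng.standard_normal((channels // reduction, channels))
w_expand = 0.1 * rng.standard_normal((channels, channels // reduction))
y = se_residual_block(x, w_reduce, w_expand)
```

The output keeps the shape of the input, and each channel is scaled by a factor between 1 and 2 because the gate lies strictly between 0 and 1 and is added on top of the shortcut.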


DOI: 10.21437/Interspeech.2019-1794

Cite as: Lai, C., Chen, N., Villalba, J., Dehak, N. (2019) ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual Networks. Proc. Interspeech 2019, 1013-1017, DOI: 10.21437/Interspeech.2019-1794.


@inproceedings{Lai2019,
  author={Cheng-I Lai and Nanxin Chen and Jesús Villalba and Najim Dehak},
  title={{ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual Networks}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={1013--1017},
  doi={10.21437/Interspeech.2019-1794},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1794}
}