ResNet and Model Fusion for Automatic Spoofing Detection

Zhuxin Chen, Zhifeng Xie, Weibin Zhang, Xiangmin Xu


Speaker verification systems have made great progress in recent years. Unfortunately, they remain highly vulnerable to various kinds of spoofing attacks, such as speech synthesis, voice conversion, and fake audio recordings. Inspired by the success of ResNet in image recognition, we investigated the effectiveness of ResNet for automatic spoofing detection. Experimental results on the ASVspoof2017 data set show that ResNet performs best among all the single-model systems. Model fusion is a good way to further improve system performance. Nevertheless, we found that if the same feature is used for the different fused models, the fused system shows little improvement. By using different features and models, our best fused model achieved a further 18% relative reduction in Equal Error Rate (EER) compared with the best single-model system.
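
For readers unfamiliar with the metric, the following is a minimal sketch of how an EER and a relative EER reduction can be computed from detection scores. It is not the authors' evaluation code: the function names, the threshold sweep, and all scores and EER values below are hypothetical and chosen only for illustration. It assumes that a higher score means "more likely genuine".

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted as genuine when its score is >= the threshold.
    Returns the mean of the false acceptance and false rejection rates at
    the threshold where the two rates are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance: spoofed trials that pass the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: genuine trials that fall below the threshold.
        frr = sum(g < t for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, improved_eer):
    """Relative reduction of improved_eer with respect to baseline_eer."""
    return (baseline_eer - improved_eer) / baseline_eer


# Hypothetical toy scores, not data from the paper.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(genuine, spoof)

# Hypothetical EER values (in percent) that would give an 18% relative reduction.
toy_reduction = relative_reduction(10.0, 8.2)
```

A relative reduction is measured against the baseline EER rather than in absolute percentage points, so in this toy case a drop from 10.0% to 8.2% counts as an 18% relative reduction.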


DOI: 10.21437/Interspeech.2017-1085

Cite as: Chen, Z., Xie, Z., Zhang, W., Xu, X. (2017) ResNet and Model Fusion for Automatic Spoofing Detection. Proc. Interspeech 2017, 102-106, DOI: 10.21437/Interspeech.2017-1085.


@inproceedings{Chen2017,
  author={Zhuxin Chen and Zhifeng Xie and Weibin Zhang and Xiangmin Xu},
  title={ResNet and Model Fusion for Automatic Spoofing Detection},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={102--106},
  doi={10.21437/Interspeech.2017-1085},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1085}
}