Speech Synthesis (SS) and Voice Conversion (VC) present a genuine risk of attacks on Automatic Speaker Verification (ASV) technology. In this paper, we use our recently proposed unsupervised filterbank learning technique, based on a Convolutional Restricted Boltzmann Machine (ConvRBM), as a front-end feature representation. The ConvRBM is trained on the training subset of the ASVspoof 2015 challenge database. Analysis of the filterbank trained on this dataset shows that the ConvRBM learned more low-frequency subband filters than when trained on a natural speech database such as TIMIT. The spoofing detection experiments were performed using Gaussian Mixture Models (GMM) as a back-end classifier. ConvRBM-based cepstral coefficients (ConvRBM-CC) perform better than handcrafted Mel Frequency Cepstral Coefficients (MFCC). On the evaluation set, ConvRBM-CC features give an absolute reduction of 4.76% in Equal Error Rate (EER) compared to MFCC features. Specifically, ConvRBM-CC features perform significantly better than MFCC features on both known attacks (1.93%) and unknown attacks (5.87%).
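The results above are reported as Equal Error Rate, the operating point at which the false acceptance rate (spoofed speech accepted as genuine) equals the false rejection rate (genuine speech rejected). The following is a minimal sketch of how an EER could be estimated from detection scores. It assumes that higher scores indicate genuine speech, for example a log-likelihood ratio between a genuine-speech GMM and a spoof-speech GMM. The function name `equal_error_rate` and the toy scores are illustrative assumptions and are not taken from the paper.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER and its threshold from two lists of scores.

    Assumption: a higher score means "more likely genuine".
    Returns (eer, threshold), where eer is a fraction in [0, 1].
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one value above the
    # maximum so that "reject everything" is also considered.
    candidates = sorted(set(genuine_scores) | set(spoof_scores))
    candidates.append(candidates[-1] + 1.0)

    best = None
    for t in candidates:
        # False acceptance: spoofed trials scoring at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(g < t for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # With finite data FAR and FRR may never match exactly, so the
            # EER is approximated by their mean at the closest point.
            best = (gap, (far + frr) / 2.0, t)

    return best[1], best[2]


# Toy scores, made up purely for illustration.
genuine = [2.0, 3.0, 4.0, 5.0]
spoof = [0.0, 1.0, 2.5, 3.5]
eer, threshold = equal_error_rate(genuine, spoof)
print(f"EER = {100 * eer:.2f}% at threshold {threshold}")
```

An absolute reduction in EER, as quoted in the abstract, is the plain difference between two such percentages rather than a ratio between them.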
Cite as: Sailor, H.B., Kamble, M.R., Patil, H.A. (2017) Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection. Proc. Interspeech 2017, 2601-2605, doi: 10.21437/Interspeech.2017-1393
@inproceedings{sailor17_interspeech,
  author={Hardik B. Sailor and Madhu R. Kamble and Hemant A. Patil},
  title={{Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2601--2605},
  doi={10.21437/Interspeech.2017-1393}
}