Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection

Hardik B. Sailor, Madhu R. Kamble, Hemant A. Patil


Speech Synthesis (SS) and Voice Conversion (VC) present a genuine risk of attacks on Automatic Speaker Verification (ASV) technology. In this paper, we use our recently proposed unsupervised filterbank learning technique based on the Convolutional Restricted Boltzmann Machine (ConvRBM) as a front-end feature representation. The ConvRBM is trained on the training subset of the ASVspoof 2015 challenge database. Analysis of the filterbank trained on this dataset shows that the ConvRBM learned more low-frequency subband filters than when trained on a natural speech database such as TIMIT. The spoofing detection experiments were performed using Gaussian Mixture Models (GMM) as the back-end classifier. ConvRBM-based cepstral coefficients (ConvRBM-CC) perform better than handcrafted Mel Frequency Cepstral Coefficients (MFCC). On the evaluation set, ConvRBM-CC features give an absolute reduction of 4.76% in Equal Error Rate (EER) compared to MFCC features. Specifically, ConvRBM-CC features perform significantly better than MFCC features on both known attacks (1.93%) and unknown attacks (5.87%).
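
The results above are reported as Equal Error Rate (EER), the operating point at which the false acceptance rate (spoofed trials accepted) equals the false rejection rate (genuine trials rejected). The following is a minimal sketch of how an EER can be estimated from detection scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the toy scores, and the convention that a higher score means "more likely genuine" are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER and its threshold from two lists of scores.

    Assumption (not from the paper): a higher score means the trial is
    more likely genuine, and a trial is accepted when score >= threshold.
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))

    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    for t in thresholds:
        # False acceptance rate: spoofed trials scored at or above t.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: genuine trials scored below t.
        frr = sum(g < t for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # With finite data FAR and FRR rarely coincide exactly,
            # so report their mean at the closest crossing point.
            best = (gap, (far + frr) / 2.0, t)

    return best[1], best[2]


# Toy scores, invented purely for illustration.
genuine = [2.0, 1.5, 0.3, 1.1]
spoof = [-1.0, 0.5, -0.2, -0.7]
eer, threshold = equal_error_rate(genuine, spoof)
print(f"EER = {100 * eer:.2f}% at threshold {threshold}")
```

In a GMM back-end such as the one described above, the score for a trial would typically be a log-likelihood ratio between a genuine-speech model and a spoofed-speech model; that scoring step is outside this sketch.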


DOI: 10.21437/Interspeech.2017-1393

Cite as: Sailor, H.B., Kamble, M.R., Patil, H.A. (2017) Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection. Proc. Interspeech 2017, 2601-2605, DOI: 10.21437/Interspeech.2017-1393.


@inproceedings{Sailor2017,
  author={Hardik B. Sailor and Madhu R. Kamble and Hemant A. Patil},
  title={Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2601--2605},
  doi={10.21437/Interspeech.2017-1393},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1393}
}