ISCA Archive Interspeech 2017

Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection

Hardik B. Sailor, Madhu R. Kamble, Hemant A. Patil

Speech Synthesis (SS) and Voice Conversion (VC) present a genuine risk of attacks on Automatic Speaker Verification (ASV) technology. In this paper, we use our recently proposed unsupervised filterbank learning technique, based on the Convolutional Restricted Boltzmann Machine (ConvRBM), as a front-end feature representation. The ConvRBM is trained on the training subset of the ASVspoof 2015 challenge database. Analysis of the filterbank trained on this dataset shows that the ConvRBM learned more low-frequency subband filters than when trained on a natural speech database such as TIMIT. The spoofing detection experiments were performed using Gaussian Mixture Models (GMM) as the back-end classifier. ConvRBM-based cepstral coefficients (ConvRBM-CC) perform better than handcrafted Mel Frequency Cepstral Coefficients (MFCC). On the evaluation set, ConvRBM-CC features give an absolute reduction of 4.76% in Equal Error Rate (EER) compared to MFCC features. Specifically, ConvRBM-CC features perform significantly better on both known attacks (1.93%) and unknown attacks (5.87%) compared to MFCC features.
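
The results above are reported as Equal Error Rate, the operating point at which the false acceptance rate (spoofed trials accepted as genuine) equals the false rejection rate (genuine trials rejected). As a rough illustration of the metric only, and not the authors' or the challenge's scoring code, a minimal sketch could look as follows. The function name `equal_error_rate`, the threshold sweep, and the assumption that higher scores mean "more likely genuine" (for example, a log-likelihood ratio between a genuine-speech GMM and a spoof GMM) are illustrative choices, not details taken from the paper.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a higher score means "more likely genuine", e.g. a
    log-likelihood ratio between a genuine model and a spoof model.
    Returns a fraction in [0, 1]; multiply by 100 for a percentage.
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("need at least one score per class")

    best_gap, eer = None, None
    for threshold in sorted(set(genuine_scores) | set(spoof_scores)):
        # False rejection: a genuine trial scores below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        # False acceptance: a spoofed trial scores at or above the threshold.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical toy scores, not data from the paper.
genuine = [2.0, 1.5, 0.5, 3.0]
spoof = [-1.0, 0.0, 1.0, -2.0]
print(equal_error_rate(genuine, spoof))
```

With a finite set of trials the two error rates rarely cross exactly, so this sketch averages them at the closest threshold; evaluation toolkits may interpolate differently, which can change the reported value slightly.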


doi: 10.21437/Interspeech.2017-1393

Cite as: Sailor, H.B., Kamble, M.R., Patil, H.A. (2017) Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection. Proc. Interspeech 2017, 2601-2605, doi: 10.21437/Interspeech.2017-1393

@inproceedings{sailor17_interspeech,
  author={Hardik B. Sailor and Madhu R. Kamble and Hemant A. Patil},
  title={{Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2601--2605},
  doi={10.21437/Interspeech.2017-1393}
}