Auditory Filterbank Learning for Temporal Modulation Features in Replay Spoof Speech Detection

Hardik Sailor, Madhu Kamble, Hemant Patil


In this paper, we present a standalone replay spoof speech detection (SSD) system to classify natural vs. replay speech. The replay speech spectrum is known to be affected in the higher frequency range. In this context, we propose to exploit auditory filterbank learning using a Convolutional Restricted Boltzmann Machine (ConvRBM) on pre-emphasized speech signals. Amplitude modulations (AM) and frequency modulations (FM) are extracted from the ConvRBM subbands using the Energy Separation Algorithm (ESA). ConvRBM-based short-time AM and FM features are then obtained via cepstral processing, denoted AM-ConvRBM-CC and FM-ConvRBM-CC. The proposed temporal modulation features outperform the baseline Constant-Q Cepstral Coefficient (CQCC) features: on the evaluation set, absolute reductions of 7.48% and 5.28% in Equal Error Rate (EER) are obtained with AM-ConvRBM-CC and FM-ConvRBM-CC, respectively, compared to our CQCC baseline. The best results are achieved by combining scores from the AM and FM cues (0.82% and 8.89% EER on the development and evaluation sets, respectively). The statistics of the AM-FM features are analyzed to understand the performance gap and the complementary information in the two feature sets.
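The ESA mentioned in the abstract is commonly realized with the discrete Teager-Kaiser energy operator and a discrete energy separation algorithm such as DESA-2. The sketch below is an illustration of that standard technique, not the authors' implementation; the function names and the `eps` guard are our own assumptions. It estimates per-subband instantaneous amplitude (AM) and frequency (FM), which would then feed the cepstral processing described above.

```python
import numpy as np

def teager(x):
    # Discrete Teager-Kaiser energy operator:
    # Psi[x](n) = x(n)^2 - x(n-1) * x(n+1); output aligned to x[1:-1].
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x, eps=1e-12):
    """DESA-2 (illustrative, not the paper's code): estimate instantaneous
    amplitude and frequency of a narrowband signal x (one filterbank subband).
    Returns (amp, omega) with omega in radians/sample."""
    psi_x = teager(x)            # Psi[x], centered on samples 1..N-2
    y = x[2:] - x[:-2]           # central difference, same alignment as psi_x
    psi_y = teager(y)            # Psi[y], centered on samples 2..N-3
    psi_x = psi_x[1:-1]          # trim to align with psi_y
    # Instantaneous frequency: Omega(n) = 0.5*arccos(1 - Psi[y]/(2*Psi[x]))
    cos_arg = np.clip(1.0 - psi_y / (2.0 * psi_x + eps), -1.0, 1.0)
    omega = 0.5 * np.arccos(cos_arg)
    # Instantaneous amplitude: |a(n)| = 2*Psi[x] / sqrt(Psi[y])
    amp = 2.0 * psi_x / (np.sqrt(np.maximum(psi_y, 0.0)) + eps)
    return amp, omega
```

For a pure tone x(n) = A cos(Ω n + φ) these estimates recover A and Ω almost exactly, which is a convenient sanity check before applying the operator to learned ConvRBM subbands.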

DOI: 10.21437/Interspeech.2018-1651

Cite as: Sailor, H., Kamble, M., Patil, H. (2018) Auditory Filterbank Learning for Temporal Modulation Features in Replay Spoof Speech Detection. Proc. Interspeech 2018, 666-670, DOI: 10.21437/Interspeech.2018-1651.


@inproceedings{Sailor2018,
  author={Hardik Sailor and Madhu Kamble and Hemant Patil},
  title={Auditory Filterbank Learning for Temporal Modulation Features in Replay Spoof Speech Detection},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={666--670},
  doi={10.21437/Interspeech.2018-1651},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1651}
}