Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features

Sarfaraz Jelil, Rohan Kumar Das, S.R. Mahadeva Prasanna, Rohit Sinha

This work describes the techniques used for spoofed speech detection in the ASVspoof 2017 challenge. The main focus is on exploiting differences in the speech-specific nature of genuine speech signals and spoofed speech signals generated by replay attacks. These differences are captured using three source features: glottal closure instants, epoch strength, and the peak-to-sidelobe ratio of the Hilbert envelope of the linear prediction residual. Apart from these source features, the instantaneous frequency cosine coefficient feature and two cepstral features, namely constant Q cepstral coefficients and mel frequency cepstral coefficients, are used. All of these features are combined to improve spoof detection accuracy. Initially, the efficacy of these features is tested on the development set of the ASVspoof 2017 database with Gaussian mixture model based systems. The systems are then fused at the score level, and the fused system serves as the final combined system for the challenge. The combined system outperforms the individual systems by a significant margin. Finally, the experiments are repeated on the evaluation set of the database, where the combined system achieves an equal error rate of 13.95%.
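
To make the two evaluation concepts in the abstract concrete, the sketch below shows one common way to fuse per-system scores and to compute an equal error rate (EER). It is illustrative only and is not taken from the paper: the function names `fuse_scores` and `equal_error_rate`, the equal-weight default, and the threshold sweep are assumptions, and the abstract does not say how the fusion weights were chosen. Higher scores are assumed to mean "more likely genuine".

```python
import numpy as np


def fuse_scores(system_scores, weights=None):
    """Weighted-sum score-level fusion (hypothetical helper).

    system_scores: array-like of shape (n_systems, n_trials).
    weights: one weight per system; equal weights are assumed if omitted.
    """
    scores = np.asarray(system_scores, dtype=float)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    weights = np.asarray(weights, dtype=float)
    # One fused score per trial.
    return weights @ scores


def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate EER by sweeping thresholds over all observed scores."""
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection: genuine trial scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance: spoofed trial scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # The EER is the operating point where the two error rates meet.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```

In this sketch, each row passed to `fuse_scores` would hold the scores of one feature-specific system over the same trials, and the fused scores would then be split into genuine and spoofed trials before calling `equal_error_rate`.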

DOI: 10.21437/Interspeech.2017-930

Cite as: Jelil, S., Das, R.K., Prasanna, S.M., Sinha, R. (2017) Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features. Proc. Interspeech 2017, 22-26, DOI: 10.21437/Interspeech.2017-930.

  author={Sarfaraz Jelil and Rohan Kumar Das and S.R. Mahadeva Prasanna and Rohit Sinha},
  title={Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features},
  booktitle={Proc. Interspeech 2017},