Decision-level Feature Switching as a Paradigm for Replay Attack Detection

Saranya M S, Hema Murthy

A pre-recorded audio sample of an authentic speaker presented to a voice-based biometric system is termed a replay attack. Such attacks can be detected by identifying the characteristics of the recording device and the recording environment. An analysis of different recording devices indicates that each device affects the spectrum differently. It is also observed that each feature captures specific characteristics of the recording devices. In particular, the Mel Filterbank Slope (MFS) captures low-frequency information, which corresponds to low-quality recording devices, while the Linear Filterbank Slope (LFS) captures high-frequency information, which corresponds to high-quality recording devices. The proposed approach uses MFS and LFS along with Mel Frequency Cepstral Coefficients (MFCC) and Constant-Q Cepstral Coefficients (CQCC) in a Decision-level Feature Switching (DLFS) paradigm to determine whether a given utterance is spoofed. The results surpass the state-of-the-art Light Convolutional Neural Network (LCNN) based replay detection system, with a relative improvement of 7.43% on the ASVspoof 2017 evaluation dataset.
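The abstract does not spell out how the switch between features is made. The sketch below illustrates one plausible reading of decision-level feature switching, under stated assumptions: each feature (MFCC, CQCC, MFS, LFS) is assumed to have its own genuine-versus-spoof classifier that outputs a log-likelihood ratio, and the switching rule is assumed to trust whichever feature gives the most confident decision, i.e. the score farthest from the threshold. The function name `dlfs_decision`, the threshold, and the switching rule are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple


def dlfs_decision(scores: Dict[str, float], threshold: float = 0.0) -> Tuple[str, bool]:
    """Hypothetical decision-level feature switching for one utterance.

    `scores` maps a feature name (e.g. "MFCC", "CQCC", "MFS", "LFS") to the
    log-likelihood ratio log p(x | genuine) - log p(x | spoof) produced by a
    classifier trained on that feature (an assumption, not the paper's method).

    Assumed switching rule: pick the feature whose score lies farthest from
    the threshold, then use that feature's decision alone.
    Returns (chosen_feature, is_genuine).
    """
    if not scores:
        raise ValueError("at least one feature score is required")
    # Switch to the most confident feature for this utterance.
    chosen = max(scores, key=lambda name: abs(scores[name] - threshold))
    # Accept as genuine only if the chosen feature's score exceeds the threshold.
    is_genuine = scores[chosen] > threshold
    return chosen, is_genuine
```

For example, with scores `{"MFCC": 0.4, "CQCC": -0.2, "MFS": -2.1, "LFS": 1.0}` the MFS score is farthest from zero, so the utterance is classified using MFS alone and is flagged as spoofed. Switching at the decision level, rather than concatenating features, lets a feature that suits a particular recording device dominate only on the utterances where it is confident.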

DOI: 10.21437/Interspeech.2018-1494

Cite as: M S, S., Murthy, H. (2018) Decision-level Feature Switching as a Paradigm for Replay Attack Detection. Proc. Interspeech 2018, 686-690, DOI: 10.21437/Interspeech.2018-1494.

@inproceedings{M S2018,
  author={Saranya {M S} and Hema Murthy},
  title={Decision-level Feature Switching as a Paradigm for Replay Attack Detection},
  booktitle={Proc. Interspeech 2018},