Novel Subband Autoencoder Features for Detection of Spoofed Speech

Meet H. Soni, Tanvina B. Patel, Hemant A. Patil

Deep Neural Networks (DNNs) have been used extensively in Automatic Speech Recognition (ASR) applications. Very recently, DNNs have also been applied to distinguishing natural from spoofed speech at the ASV spoof challenge held at INTERSPEECH 2015. Along similar lines, in this work we propose a new DNN-based feature extraction architecture, called the subband autoencoder (SBAE), for the spoof detection task. The SBAE is inspired by the human auditory system and extracts features from the speech spectrum in an unsupervised manner. The features derived from the SBAE are compared with state-of-the-art Mel Frequency Cepstral Coefficient (MFCC) features. The experiments were performed on the ASV spoof challenge database, and performance was evaluated using the Equal Error Rate (EER). On the evaluation set, 36-dimensional MFCC features (static+Δ+ΔΔ) gave an EER of 4.32%, which dropped to 2.9% when 36-dimensional SBAE features were used. Fusing the SBAE features with MFCC at the score level reduced the EER further, to 1.93%. This improvement in EER is attributed to the dynamics of the SBAE features, which capture significant spoof-specific characteristics and make it possible to detect even vocoder-independent speech, which is not the case for MFCC.
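The two evaluation ideas named in the abstract, EER and score-level fusion, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names, the convention that a higher score means "more likely natural", and the weighted-sum fusion rule with weight `alpha` are all assumptions made for illustration; the abstract does not state which fusion rule or weights were used.

```python
import numpy as np


def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER as a fraction in [0, 1].

    Assumes higher scores mean "more likely natural speech".
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection rate: natural trials scoring below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scoring at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # The EER lies where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def fuse_scores(scores_a, scores_b, alpha=0.5):
    """Weighted-sum score-level fusion of two systems.

    alpha is a hypothetical fusion weight, not a value from the paper.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return alpha * a + (1.0 - alpha) * b


if __name__ == "__main__":
    # Toy scores for two hypothetical systems on the same trials.
    sys1_genuine, sys1_spoof = [2.0, 3.0, 0.5], [-1.0, 0.0, 1.0]
    sys2_genuine, sys2_spoof = [1.5, 2.5, 2.0], [-0.5, 1.8, 0.2]
    fused_genuine = fuse_scores(sys1_genuine, sys2_genuine)
    fused_spoof = fuse_scores(sys1_spoof, sys2_spoof)
    print("fused EER:", equal_error_rate(fused_genuine, fused_spoof))
```

In practice the fusion weight would be tuned on a development set, and the scores from the two systems would usually be normalized to a comparable range before they are combined.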

DOI: 10.21437/Interspeech.2016-668

Cite as

Soni, M.H., Patel, T.B., Patil, H.A. (2016) Novel Subband Autoencoder Features for Detection of Spoofed Speech. Proc. Interspeech 2016, 1820-1824.

author={Meet H. Soni and Tanvina B. Patel and Hemant A. Patil},
title={Novel Subband Autoencoder Features for Detection of Spoofed Speech},
booktitle={Interspeech 2016},