The detection of human and spoofed (synthetic/converted) speech has started to receive more attention. In this study, relative phase information extracted from a Fourier spectrum is used to detect human and spoofed speech. Because original/ natural phase information is almost entirely lost in spoofed speech using current synthesis/conversion techniques, a modified group delay based feature, the frequency derivative of the phase spectrum, has been shown effective for detecting human speech and spoofed speech. The modified group delay based phase contains both the magnitude spectrum and phase information. Therefore, the relative phase information, which contains only phase information, is expected to achieve a better spoofing detection performance. In this study, the relative phase information is also combined with the Mel-Frequency Cepstral Coefficient (MFCC) and modified group delay. The proposed method was evaluated using the “ASVspoof 2015: Automatic Speaker Verification Spoofing and Countermeasures Challenge” dataset. The results show that the proposed relative phase information significantly outperforms the MFCC and modified group delay. The equal error rate (EER) was reduced from 1.74% of MFCC, 0.83% of modified group delay to 0.013% of relative phase. By combining the relative phase with MFCC and modified group delay, the EER was reduced to 0.002%.
Bibliographic reference. Wang, Longbiao / Yoshida, Yohei / Kawakami, Yuta / Nakagawa, Seiichi (2015): "Relative phase information for detecting human speech and spoofed speech", In INTERSPEECH-2015, 2092-2096.