11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Assessment of Single-Channel Speech Enhancement Techniques for Speaker Identification Under Mismatched Conditions

Seyed Omid Sadjadi, John H. L. Hansen

University of Texas at Dallas, USA

It is well known that MFCC-based speaker identification (SID) systems easily break down under mismatched training and test conditions. In this paper, we report on a study that considers four different single-channel speech enhancement front-ends for robust SID under such conditions. Speech files from the YOHO database are corrupted with four types of noise (babble, car, factory, and white Gaussian) at five SNR levels (0-20 dB), and processed using four speech enhancement techniques representing distinct classes of algorithms: spectral subtraction, statistical model-based, subspace, and Wiener filtering. Both processed and unprocessed files are submitted to a SID system trained on clean data. In addition, a new set of acoustic feature parameters based on the Hilbert envelope of gammatone filterbank outputs is proposed and evaluated for the SID task. Experimental results indicate that: (i) depending on the noise type and SNR level, the enhancement front-ends may help or hurt SID performance; and (ii) the proposed features achieve significantly higher SID accuracy than MFCCs under mismatched conditions.
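The abstract's proposed features, derived from the Hilbert envelope of gammatone filterbank outputs, can be sketched as follows. This is a hypothetical illustration, not the authors' exact implementation: the ERB spacing, frame sizes, and the final DCT decorrelation step (chosen here by analogy with MFCC processing) are all assumptions.

```python
# Hypothetical sketch: cepstral-style features from subband Hilbert
# envelopes of a gammatone filterbank (NOT the paper's exact recipe).
import numpy as np
from scipy.signal import gammatone, lfilter, hilbert
from scipy.fftpack import dct

def gammatone_hilbert_features(x, fs=16000, n_bands=20, n_ceps=13,
                               frame_len=400, hop=160):
    """Frame-level features from gammatone-subband Hilbert envelopes."""
    # Center frequencies spaced on the ERB-rate scale (an assumption;
    # Glasberg & Moore ERB formula).
    fmin, fmax = 100.0, 0.9 * fs / 2
    erb = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    inv_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    cfs = inv_erb(np.linspace(erb(fmin), erb(fmax), n_bands))

    band_logE = []
    for cf in cfs:
        b, a = gammatone(cf, 'iir', fs=fs)     # 4th-order IIR gammatone
        sub = lfilter(b, a, x)                 # subband signal
        env = np.abs(hilbert(sub))             # Hilbert (analytic) envelope
        # Frame the envelope and take log mean-square energy per frame.
        n_frames = 1 + (len(env) - frame_len) // hop
        frames = np.lib.stride_tricks.sliding_window_view(
            env, frame_len)[::hop][:n_frames]
        band_logE.append(np.log(np.mean(frames ** 2, axis=1) + 1e-10))

    # Decorrelate band log-energies with a DCT, as MFCCs do with the
    # mel log-spectrum, and keep the first n_ceps coefficients.
    E = np.array(band_logE).T                  # shape: (frames, bands)
    return dct(E, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

Because the envelope discards fine spectral detail while keeping slow amplitude modulations, such features are often argued to degrade more gracefully in additive noise than spectrum-based MFCCs, which is consistent with the mismatch robustness the abstract reports.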

Full Paper

Bibliographic reference.  Sadjadi, Seyed Omid / Hansen, John H. L. (2010): "Assessment of single-channel speech enhancement techniques for speaker identification under mismatched conditions", In INTERSPEECH-2010, 2138-2141.