A robust feature for speaker identification in noisy environments is proposed. The method is inspired by the human binaural auditory system: a pair of microphones is used to replicate the two ears. The microphone outputs are passed through a Gammatone bandpass filterbank, rectified, and compressed, and the two channels are then cross-correlated in each frequency band. Independent component analysis (ICA) is applied to the real cepstrum of the cross-correlated waveform to extract the dominant components in each frequency band. The resulting feature emphasizes the differences in statistical structure among speakers. Compared with the commonly used MFCC features, the proposed feature is more robust to background noise and yields higher identification rates in real noisy environments for text-independent speaker identification systems. A specially prepared noisy speech corpus was used to evaluate the performance of the proposed feature.
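The processing chain described above can be sketched in plain Python. This is a minimal, hypothetical illustration and not the authors' implementation. Several details are assumptions: the FIR gammatone approximation with Glasberg-Moore ERB bandwidths, half-wave rectification, cube-root compression, the lag range, and the number of retained cepstral coefficients. The ICA stage is left out, and all function names are illustrative.

```python
import cmath
import math


def gammatone_ir(fc, fs, n_taps=256, order=4):
    """Sampled 4th-order gammatone impulse response, peak-normalised.

    Assumption: bandwidth b = 1.019 * ERB(fc), Glasberg-Moore ERB scale.
    """
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    b = 1.019 * erb
    ir = []
    for k in range(n_taps):
        t = k / fs
        ir.append(t ** (order - 1) * math.exp(-2 * math.pi * b * t)
                  * math.cos(2 * math.pi * fc * t))
    peak = max(abs(v) for v in ir) or 1.0
    return [v / peak for v in ir]


def fir_filter(x, h):
    """Causal FIR filtering; the output has the same length as x."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(len(h), n + 1)):
            acc += h[k] * x[n - k]
        y.append(acc)
    return y


def rectify_compress(x, power=1.0 / 3.0):
    """Half-wave rectification followed by power-law (assumed cube-root) compression."""
    return [max(v, 0.0) ** power for v in x]


def cross_correlation(a, b, max_lag):
    """r[lag] = sum_n a[n] * b[n + lag] for lag = -max_lag .. max_lag."""
    n = len(a)
    r = []
    for lag in range(-max_lag, max_lag + 1):
        s = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                s += a[i] * b[j]
        r.append(s)
    return r


def dft(x):
    """Direct O(N^2) discrete Fourier transform (keeps the sketch dependency-free)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def real_cepstrum(x, eps=1e-12):
    """Real cepstrum: inverse DFT of log|DFT(x)|."""
    N = len(x)
    log_mag = [math.log(abs(v) + eps) for v in dft(x)]
    # log_mag is real and even for real x, so the inverse DFT is real.
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]


def band_cepstra(left, right, fs, centre_freqs, max_lag=32, n_ceps=13):
    """Per-band cepstra of the cross-correlated left/right channels."""
    features = []
    for fc in centre_freqs:
        h = gammatone_ir(fc, fs)
        l_band = rectify_compress(fir_filter(left, h))
        r_band = rectify_compress(fir_filter(right, h))
        xc = cross_correlation(l_band, r_band, max_lag)
        features.append(real_cepstrum(xc)[:n_ceps])
    return features


if __name__ == "__main__":
    fs = 8000
    # Hypothetical test input: a 500 Hz tone reaching the right mic 3 samples late.
    left = [math.sin(2 * math.pi * 500 * t / fs) for t in range(400)]
    right = [0.0] * 3 + left[:-3]
    feats = band_cepstra(left, right, fs, [250, 500, 1000, 2000])
    print(len(feats), len(feats[0]))
```

In a full system, the per-band cepstra returned by `band_cepstra` would feed an ICA step (for example scikit-learn's `FastICA`) that extracts the dominant components. The direct DFT is used only to avoid dependencies. A real front end would use an FFT and an IIR gammatone filterbank.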
Bibliographic reference. Zhang, Yushi / Abdulla, Waleed H. (2008): "Robust speaker identification using cross-correlation GTF-ICA feature", In INTERSPEECH-2008, 1913-1916.