INTERSPEECH 2004 - ICSLP
8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Time-Frequency Analysis of Vocal Source Signal for Speaker Recognition

Nengheng Zheng, P. C. Ching, Tan Lee

The Chinese University of Hong Kong, Hong Kong

This paper investigates the importance of the spectro-temporal characteristics of the source excitation signal for speaker recognition. We propose an effective feature extraction technique for obtaining essential time-frequency information from the linear prediction (LP) residual signal, which is closely related to the glottal excitation of the individual speaker. With pitch-synchronous analysis, a wavelet transform is applied to every two pitch cycles of the LP residual signal to generate a new feature vector, called Wavelet Octave Coefficients of Residues (WOCOR), which provides additional speaker-discriminative power to the commonly used linear predictive cepstral coefficients (LPCC). Experimental evaluation on a Cantonese speaker recognition corpus demonstrates the effectiveness of WOCOR for speaker recognition. Recognition tests with WOCOR and LPCC outperform the conventional method of using Mel-frequency cepstral coefficients (MFCC).
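
The pipeline described above (LP residual, pitch-synchronous segmentation over two pitch cycles, wavelet analysis per octave) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the wavelet, the LP order, the number of octave levels, or how coefficients are grouped, so the Haar wavelet, `order=12`, `levels=4`, the per-octave 2-norm, the hypothetical helper names, and the synthetic test signal with known pitch marks are all assumptions made here for illustration.

```python
import numpy as np

def lp_residual(signal, order=12):
    """LP residual via the autocorrelation method (order is an assumption)."""
    x = np.asarray(signal, dtype=float)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with a small ridge for stability
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Residual e[n] = x[n] - sum_k a[k] * x[n - k]
    pred = np.zeros_like(x)
    for k in range(1, order + 1):
        pred[k:] += a[k - 1] * x[:-k]
    return x - pred

def haar_octave_norms(segment, levels=4):
    """2-norm of Haar detail coefficients at each octave (Haar is an assumption)."""
    approx = np.asarray(segment, dtype=float)
    norms = []
    for _ in range(levels):
        if len(approx) % 2:  # zero-pad odd lengths so samples pair up
            approx = np.append(approx, 0.0)
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)
        approx = (even + odd) / np.sqrt(2.0)
        norms.append(np.linalg.norm(detail))
    return np.array(norms)

def wocor_like_features(residual, pitch_marks, levels=4):
    """One feature vector per window spanning two consecutive pitch cycles."""
    feats = []
    for start, end in zip(pitch_marks[:-2], pitch_marks[2:]):
        seg = residual[start:end]
        seg = seg / (np.linalg.norm(seg) + 1e-12)  # remove gain differences
        feats.append(haar_octave_norms(seg, levels))
    return np.array(feats)

# Synthetic voiced signal: noisy impulse train through a stable 2-pole resonator.
rng = np.random.default_rng(0)
n_samples, period = 800, 80            # e.g. 100 Hz pitch at 8 kHz sampling
excitation = 0.01 * rng.standard_normal(n_samples)
excitation[::period] += 1.0
speech = np.zeros(n_samples)
for n in range(n_samples):
    prev1 = speech[n - 1] if n >= 1 else 0.0
    prev2 = speech[n - 2] if n >= 2 else 0.0
    speech[n] = excitation[n] + 1.3 * prev1 - 0.8 * prev2

residual = lp_residual(speech)
pitch_marks = np.arange(0, n_samples, period)   # assumed known here
feats = wocor_like_features(residual, pitch_marks)
```

Here `feats` holds one row per two-cycle window and one column per octave level. On real speech the pitch marks would come from a pitch or glottal-closure detector, which this sketch does not cover.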

Bibliographic reference. Zheng, Nengheng / Ching, P. C. / Lee, Tan (2004): "Time-frequency analysis of vocal source signal for speaker recognition", In INTERSPEECH-2004, 2333-2336.