9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Text-Dependent Speaker Recognition by Efficient Capture of Speaker Dynamics in Compressed Time-Frequency Representations of Speech

Amitava Das, Gokul Chittaranjan

Microsoft Research India, India

Prevalent speaker recognition methods use only spectral-envelope-based features such as MFCC, ignoring the rich speaker identity information contained in the temporal-spectral dynamics of the entire speech signal. We propose a new feature for speaker recognition, called compressed spectral dynamics (CSD), based on a compressed time-frequency representation of spoken passwords that effectively captures speaker identity. The fixed dimension of the CSD keeps classification simple while retaining the discriminatory power of the intermediate 2D time-frequency representation. The proposed MSRI-CSD text-dependent speaker recognition method uses a simple nearest-neighbor classifier and delivers performance competitive with conventional MFCC+DTW-based speaker recognition methods at significantly lower complexity.
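
The abstract does not say how the time-frequency representation is compressed. To illustrate why a fixed-dimension feature lets a plain nearest-neighbor classifier replace sequence alignment, here is a toy sketch. It assumes one possible scheme: a log-magnitude spectrogram, linearly resampled to a fixed number of frames, then truncated to its low-order 2D DCT coefficients. The names (log_spectrogram, resample_frames, compressed_feature, nearest_neighbor, tone), the frame sizes, and the DCT-based compression are all hypothetical choices. This is not the authors' CSD.

```python
import math

# Hypothetical illustration only: the abstract does not specify how CSD
# compresses the time-frequency representation. Here we ASSUME a
# log-magnitude spectrogram, resampled to a fixed number of frames and
# truncated to its low-order 2D DCT coefficients.


def log_spectrogram(signal, frame_len=32, hop=16):
    """Hann-windowed log-magnitude spectrum; one row of frame_len//2 + 1 bins per frame."""
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):  # naive DFT, fine for a toy example
            re = sum(x * math.cos(2 * math.pi * k * n / frame_len) for n, x in enumerate(seg))
            im = sum(x * math.sin(2 * math.pi * k * n / frame_len) for n, x in enumerate(seg))
            row.append(math.log1p(math.hypot(re, im)))
        frames.append(row)
    return frames


def resample_frames(frames, n_out):
    """Linearly interpolate along time so every utterance yields n_out frames."""
    if n_out < 2:
        raise ValueError("n_out must be at least 2")
    n_in = len(frames)
    if n_in == 1:
        return [list(frames[0]) for _ in range(n_out)]
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b for a, b in zip(frames[lo], frames[hi])])
    return out


def dct_ii(vec):
    """Unnormalized DCT-II of a 1D sequence."""
    n = len(vec)
    return [sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(vec))
            for k in range(n)]


def compressed_feature(signal, n_frames=16, keep_t=6, keep_f=6):
    """Map an utterance of any length to a fixed keep_t * keep_f vector."""
    frames = resample_frames(log_spectrogram(signal), n_frames)
    freq_dct = [dct_ii(row) for row in frames]  # DCT along frequency, per frame
    # DCT along time for each of the first keep_f frequency coefficients
    time_dct = [dct_ii([freq_dct[t][kf] for t in range(n_frames)]) for kf in range(keep_f)]
    return [time_dct[kf][kt] for kf in range(keep_f) for kt in range(keep_t)]


def nearest_neighbor(query, templates):
    """Return the label of the enrolled template closest in Euclidean distance."""
    return min(templates, key=lambda label: math.dist(query, templates[label]))


def tone(bin_index, n_samples, frame_len=32):
    """Toy 'utterance' segment: a sinusoid centred on one DFT bin."""
    return [math.sin(2 * math.pi * bin_index * n / frame_len) for n in range(n_samples)]


# Two toy enrollment templates: the same two tones in opposite order.
templates = {
    "A": compressed_feature(tone(4, 160) + tone(10, 160)),
    "B": compressed_feature(tone(10, 160) + tone(4, 160)),
}
# A slower rendition of template A still maps to the same feature size.
query = compressed_feature(tone(4, 240) + tone(10, 240))
print(len(query), nearest_neighbor(query, templates))
```

Because every utterance maps to a vector of the same length regardless of duration, matching costs one distance computation per enrolled template. DTW, by contrast, must align two variable-length frame sequences.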


Bibliographic reference.  Das, Amitava / Chittaranjan, Gokul (2008): "Text-dependent speaker recognition by efficient capture of speaker dynamics in compressed time-frequency representations of speech", In INTERSPEECH-2008, 1921-1924.