Prevalent speaker recognition methods use only spectral-envelope-based features such as MFCC, ignoring the rich speaker identity information contained in the temporal-spectral dynamics of the entire speech signal. We propose a new feature for speaker recognition, called compressed spectral dynamics (CSD), based on a compressed time-frequency representation of spoken passwords that effectively captures speaker identity. Because the CSD has a fixed dimension regardless of utterance length, classification remains simple while the discriminatory power of the intermediate 2D time-frequency representation is retained. The proposed MSRI-CSD text-dependent speaker recognition method uses a simple nearest neighbor classifier and delivers performance competitive with conventional MFCC+DTW-based speaker recognition methods at significantly lower complexity.
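The general idea of a fixed-dimension feature followed by a nearest neighbor classifier can be illustrated with a minimal sketch. The abstract does not say how the time-frequency representation is compressed, so this example assumes a truncated 2D DCT as a stand-in; it is not the authors' CSD. The names `dct_1d`, `compress_tf`, `nearest_neighbor`, and `toy_tf`, the 4x4 coefficient count, and the synthetic time-frequency patterns are all hypothetical choices made for illustration.

```python
import math


def dct_1d(values, n_coeffs):
    """Return the first n_coeffs DCT-II coefficients, scaled by 1/len(values)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values)) / n
        for k in range(n_coeffs)
    ]


def compress_tf(tf_matrix, n_time=4, n_freq=4):
    """Map a (frames x bins) matrix with any number of frames to a fixed-length vector.

    Assumption: a truncated 2D DCT stands in for the unspecified compression step.
    """
    # DCT along frequency within each frame, keeping n_freq coefficients.
    rows = [dct_1d(frame, n_freq) for frame in tf_matrix]
    # DCT along time for each retained frequency coefficient, keeping n_time.
    cols = [dct_1d([row[j] for row in rows], n_time) for j in range(n_freq)]
    return [c for col in cols for c in col]


def nearest_neighbor(query, templates):
    """Return the label of the enrolled template closest to query (Euclidean)."""
    return min(templates, key=lambda item: math.dist(query, item[1]))[0]


def toy_tf(n_frames, rising, n_bins=16):
    """Synthetic time-frequency pattern: an energy peak that sweeps up or down."""
    rows = []
    for t in range(n_frames):
        pos = (t + 0.5) / n_frames
        center = (pos if rising else 1.0 - pos) * (n_bins - 1)
        rows.append([math.exp(-0.5 * ((b - center) / 2.0) ** 2) for b in range(n_bins)])
    return rows


# Enroll two hypothetical speakers whose utterances have different lengths.
templates = [
    ("A", compress_tf(toy_tf(20, rising=True))),
    ("B", compress_tf(toy_tf(30, rising=False))),
]

# A test utterance of yet another length still yields a 16-dimensional vector.
query = compress_tf(toy_tf(25, rising=True))
print(len(query), nearest_neighbor(query, templates))
```

Because every utterance maps to a vector of the same length, templates can be compared directly with a vector distance, with no time alignment step such as DTW; this is the source of the simplicity the abstract describes, under the assumptions stated above.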
Bibliographic reference. Das, Amitava / Chittaranjan, Gokul (2008): "Text-dependent speaker recognition by efficient capture of speaker dynamics in compressed time-frequency representations of speech", In INTERSPEECH-2008, 1921-1924.