ISCA Archive Interspeech 2008

Text-dependent speaker recognition by efficient capture of speaker dynamics in compressed time-frequency representations of speech

Amitava Das, Gokul Chittaranjan

Prevalent speaker recognition methods use only spectral-envelope-based features such as MFCC, ignoring the rich speaker-identity information contained in the temporal-spectral dynamics of the entire speech signal. We propose a new feature for speaker recognition, called compressed spectral dynamics (CSD), based on a compressed time-frequency representation of spoken passwords that effectively captures speaker identity. Because the CSD has a fixed dimension, classification remains simple while the discriminatory power of the 2D intermediate time-frequency representation is retained. The proposed MSRI-CSD text-dependent speaker recognition method uses a simple nearest-neighbor classifier and delivers performance competitive with conventional MFCC+DTW-based speaker recognition methods at significantly lower complexity.
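
The general idea of a fixed-dimension feature with nearest-neighbor matching can be illustrated with a small Python sketch. It is not the authors' CSD: the abstract does not say how the time-frequency representation is compressed, so a truncated DCT along the time axis is used here purely as a stand-in, and the names `compress`, `nearest_neighbor`, and `ramp_map`, as well as the toy data, are hypothetical. The point is only that utterances of different lengths map to vectors of the same size, so no time alignment such as DTW is needed before comparing them.

```python
import math

def compress(tf_map, n_keep):
    """Map a variable-length time-frequency map to a fixed-length vector.

    tf_map is a list of frames, each a list of band energies.
    ASSUMPTION: truncated DCT-II over time stands in for the paper's
    (unspecified) compression step.
    """
    n_frames = len(tf_map)
    n_bands = len(tf_map[0])
    feature = []
    for band in range(n_bands):
        for k in range(n_keep):
            coeff = sum(
                tf_map[t][band] * math.cos(math.pi * k * (t + 0.5) / n_frames)
                for t in range(n_frames)
            )
            # Divide by the frame count so utterance length does not scale the feature.
            feature.append(coeff / n_frames)
    return feature  # length is always n_bands * n_keep

def nearest_neighbor(query, templates):
    """Return the label of the enrolled template closest in Euclidean distance."""
    return min(templates, key=lambda item: math.dist(query, item[1]))[0]

def ramp_map(n_frames, slope):
    """Toy two-band map: band 0 ramps over time, band 1 is constant."""
    return [[slope * t / (n_frames - 1), 1.0] for t in range(n_frames)]

# Hypothetical enrolment: two speakers, utterances of different lengths.
templates = [
    ("A", compress(ramp_map(10, 1.0), 3)),
    ("B", compress(ramp_map(12, -1.0), 3)),
]

# A test utterance of yet another length, with speaker A's dynamics.
query = compress(ramp_map(15, 1.0), 3)
print(len(query), nearest_neighbor(query, templates))
```

Matching here is a single distance computation per enrolled template, which is the kind of simplification a fixed-dimension feature makes possible compared with frame-by-frame alignment.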


doi: 10.21437/Interspeech.2008-508

Cite as: Das, A., Chittaranjan, G. (2008) Text-dependent speaker recognition by efficient capture of speaker dynamics in compressed time-frequency representations of speech. Proc. Interspeech 2008, 1921-1924, doi: 10.21437/Interspeech.2008-508

@inproceedings{das08b_interspeech,
  author={Amitava Das and Gokul Chittaranjan},
  title={{Text-dependent speaker recognition by efficient capture of speaker dynamics in compressed time-frequency representations of speech}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={1921--1924},
  doi={10.21437/Interspeech.2008-508}
}