ISCA Archive Interspeech 2009

Speaker identification using warped MVDR cepstral features

Matthias Wölfel, Qian Yang, Qin Jin, Tanja Schultz

It is common practice to use similar or even identical feature extraction methods for automatic speech recognition and speaker identification. The front-end for the former must preserve phoneme discrimination and compensate for speaker differences to some extent, whereas the front-end for the latter must preserve the unique characteristics of individual speakers. Using the same feature extraction methods for both tasks therefore seems contradictory. Starting from the common practice, we propose to use warped minimum variance distortionless response (warped MVDR, or WMVDR) cepstral coefficients, which have already been shown to give superior performance in automatic speech recognition, in particular under adverse conditions. Replacing the widely used mel-frequency cepstral coefficients with WMVDR cepstral coefficients improves the speaker identification accuracy by up to 24% relative. We found that the optimal choice of the model order within the WMVDR framework differs between speech recognition and speaker recognition, confirming our intuition that the two tasks indeed require different feature extraction strategies.

doi: 10.21437/Interspeech.2009-274

Cite as: Wölfel, M., Yang, Q., Jin, Q., Schultz, T. (2009) Speaker identification using warped MVDR cepstral features. Proc. Interspeech 2009, 912-915, doi: 10.21437/Interspeech.2009-274

  author={Matthias Wölfel and Qian Yang and Qin Jin and Tanja Schultz},
  title={{Speaker identification using warped MVDR cepstral features}},
  booktitle={Proc. Interspeech 2009},