10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Speaker Identification Using Warped MVDR Cepstral Features

Matthias Wölfel (1), Qian Yang (2), Qin Jin (3), Tanja Schultz (2)

(1) ZKM, Germany
(2) Universität Karlsruhe (TH), Germany
(3) Carnegie Mellon University, USA

It is common practice to use similar or even identical feature extraction methods for automatic speech recognition and speaker identification. The front-end for the former must preserve phoneme discrimination and compensate for speaker differences to some extent, whereas the front-end for the latter must preserve the unique characteristics of individual speakers. Using the same feature extraction methods for both tasks therefore seems contradictory. Starting from this common practice, we propose to use warped minimum variance distortionless response (warped MVDR, or WMVDR) cepstral coefficients, which have already been shown to deliver superior performance for automatic speech recognition, in particular under adverse conditions. Replacing the widely used mel-frequency cepstral coefficients with WMVDR cepstral coefficients improves speaker identification accuracy by up to 24% relative. We found that the optimal model order within the WMVDR framework differs between speech recognition and speaker recognition, confirming our intuition that the two tasks indeed require different feature extraction strategies.


Bibliographic reference. Wölfel, Matthias / Yang, Qian / Jin, Qin / Schultz, Tanja (2009): "Speaker identification using warped MVDR cepstral features", in INTERSPEECH-2009, 912-915.