It is common practice to use similar or even identical feature extraction methods for automatic speech recognition and speaker identification. While the front-end for the former must preserve phoneme discrimination and compensate for speaker differences to some extent, the front-end for the latter has to preserve the unique characteristics of individual speakers. It therefore seems contradictory to use the same feature extraction methods for both tasks. Starting from this common practice, we propose to use warped minimum variance distortionless response (WMVDR) cepstral coefficients, which have already been shown to give superior performance for automatic speech recognition, in particular under adverse conditions. Replacing the widely used mel-frequency cepstral coefficients with WMVDR cepstral coefficients improves speaker identification accuracy by up to 24% relative. We found that the optimal choice of the model order within the WMVDR framework differs between speech recognition and speaker recognition, confirming our intuition that the two tasks indeed require different feature extraction strategies.
Bibliographic reference. Wölfel, Matthias / Yang, Qian / Jin, Qin / Schultz, Tanja (2009): "Speaker identification using warped MVDR cepstral features", In INTERSPEECH-2009, 912-915.