Odyssey 2012 - The Speaker and Language Recognition Workshop
This paper considers regularization of linear-prediction-based mel-frequency cepstral coefficient (MFCC) extraction for speaker verification. MFCCs are commonly extracted from the discrete Fourier transform (DFT) spectra of speech frames. Our recent study showed that replacing the DFT spectrum estimation step with conventional or temporally weighted linear prediction (LP), or with their regularized versions, considerably improves recognition performance. In this paper, we provide a thorough analysis of the regularization of the conventional and temporally weighted LP methods. Experiments on the NIST 2002 corpus indicate that regularized all-pole methods yield large improvements in recognition accuracy under additive factory and babble noise (e.g., a 10% relative improvement over the standard DFT method for factory noise at 0 dB SNR) in terms of both equal error rate (EER) and minimum detection cost function (MinDCF).
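As a rough illustration of the spectrum-replacement idea described in the abstract, the following minimal sketch estimates an all-pole power spectrum for one speech frame using the autocorrelation method of LP. The regularizer here is a plain ridge (diagonal loading) penalty, chosen only for illustration; it is an assumption and not necessarily the regularization analysed in the paper, and temporal weighting is not shown. The function name `regularized_lp_spectrum` and the parameters `order`, `lam`, and `nfft` are hypothetical.

```python
import numpy as np

def regularized_lp_spectrum(frame, order=20, lam=1e-3, nfft=512):
    """All-pole power spectrum of one frame.

    Autocorrelation-method LP with a ridge penalty (an illustrative
    assumption, not the paper's regularizer). lam = 0 gives conventional LP.
    """
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))

    # Autocorrelation sequence r[0..order] of the windowed frame.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])

    # Toeplitz autocorrelation matrix R[i, j] = r[|i - j|].
    lags = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    R = r[lags]

    # Ridge-regularized normal equations, penalty scaled by frame energy r[0].
    a = np.linalg.solve(R + lam * r[0] * np.eye(order), r[1:])

    # Prediction-error energy for the predictor actually obtained.
    err = r[0] - 2.0 * np.dot(a, r[1:]) + a @ R @ a

    # Inverse filter A(z) = 1 - sum_k a_k z^{-k}, evaluated on the FFT grid.
    A = np.fft.rfft(np.concatenate(([1.0], -a)), nfft)
    return err / np.abs(A) ** 2
```

In an MFCC front end, the returned spectrum would take the place of the squared DFT magnitude of the frame, followed by the usual mel filterbank, logarithm, and discrete cosine transform steps.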
Bibliographic reference. Hanilçi, Cemal / Kinnunen, Tomi / Saeidi, Rahim / Pohjalainen, Jouni / Alku, Paavo / Ertaş, Figen (2012): "Regularization of all-pole models for speaker verification under additive noise", In Odyssey-2012, 236-242.