Odyssey 2012 - The Speaker and Language Recognition Workshop

June 25-28, 2012

Regularization of All-Pole Models for Speaker Verification Under Additive Noise

Cemal Hanilçi (1,2), Tomi Kinnunen (2), Rahim Saeidi (3), Jouni Pohjalainen (4), Paavo Alku (4), Figen Ertaş (1)

(1) Department of Electronic Engineering, Uludağ University, Bursa, Turkey
(2) School of Computing, University of Eastern Finland, Finland
(3) Centre for Language and Speech Technology, Radboud University Nijmegen, Netherlands
(4) Department of Signal Processing and Acoustics, Aalto University, Finland

This paper considers regularization of linear prediction based mel-frequency cepstral coefficient (MFCC) extraction for speaker verification. MFCCs are commonly extracted from the discrete Fourier transform (DFT) spectra of speech frames. Our recent study showed that replacing the DFT spectrum estimation step with conventional or temporally weighted linear prediction (LP), or with their regularized versions, increases recognition performance considerably. In this paper, we provide a thorough analysis of the regularization of conventional and temporally weighted LP methods. Experiments on the NIST 2002 corpus indicate that regularized all-pole methods yield large improvements in recognition accuracy under additive factory and babble noise (e.g. a 10% relative improvement over the standard DFT method for factory noise at 0 dB SNR), in terms of both equal error rate (EER) and minimum detection cost function (MinDCF).
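
To illustrate what replacing the DFT spectrum with an all-pole spectrum means in practice, the following is a minimal sketch of LP spectrum estimation with the autocorrelation method. It is not the authors' implementation: the function name `allpole_spectrum`, the parameters `order`, `nfft` and `ridge`, and the test signal are all hypothetical, and the diagonal loading controlled by `ridge` is only a generic stand-in for regularization, not the regularizer analyzed in the paper. Temporal weighting is not shown.

```python
import numpy as np

def allpole_spectrum(frame, order=20, nfft=512, ridge=0.0):
    """Power spectrum of an all-pole (LP) model of one speech frame.

    Hypothetical helper: autocorrelation-method LP, with optional
    diagonal loading as a generic (assumed) form of regularization.
    """
    x = frame * np.hamming(len(frame))
    # Autocorrelation sequence for lags 0..order
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz autocorrelation matrix of the normal equations R a = r[1:]
    R0 = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Diagonal loading scaled by the frame energy (assumption, not the paper's method)
    R = R0 + ridge * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Prediction-error energy of the fitted predictor, used as the model gain
    gain = r[0] - 2.0 * np.dot(a, r[1:]) + a @ R0 @ a
    # Inverse filter A(z) = 1 - sum_k a_k z^{-k}, evaluated on the FFT grid
    A = np.fft.rfft(np.concatenate(([1.0], -a)), nfft)
    return gain / np.abs(A) ** 2

# Synthetic frame: one sinusoid at 0.1 cycles/sample plus weak white noise
rng = np.random.default_rng(0)
n = np.arange(240)
frame = np.sin(2 * np.pi * 0.1 * n) + 0.01 * rng.standard_normal(240)
spec = allpole_spectrum(frame, order=20, nfft=512, ridge=1e-3)
```

In an MFCC front end, a spectrum such as `spec` would take the place of the squared DFT magnitude before the mel filterbank, logarithm and discrete cosine transform are applied.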

Bibliographic reference. Hanilçi, Cemal / Kinnunen, Tomi / Saeidi, Rahim / Pohjalainen, Jouni / Alku, Paavo / Ertaş, Figen (2012): "Regularization of all-pole models for speaker verification under additive noise", in Odyssey-2012, 236-242.