We study the problem of vocal effort mismatch in speaker verification. Changes in a speaker's vocal effort induce changes in fundamental frequency (F0) and formant structure, which introduce unwanted intra-speaker variation into the extracted features. We compare seven alternative spectrum estimators in the context of mel-frequency cepstral coefficient (MFCC) extraction for speaker verification. The compared estimators are the traditional FFT spectrum and six parametric all-pole models. Experimental results on the NIST 2010 speaker recognition evaluation (SRE) corpus, using both a GMM-UBM classifier and the more recent GMM supervector classifier, indicate that spectrum estimation has a considerable impact on speaker verification accuracy under mismatched vocal effort conditions. The highest recognition accuracy was achieved using a particular variant of the temporally weighted all-pole model, stabilized weighted linear prediction (SWLP).
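To illustrate what replacing the FFT spectrum with an all-pole model means in an MFCC front end, the following is a minimal sketch of conventional linear prediction (autocorrelation method) next to the FFT magnitude spectrum. It is not the SWLP estimator evaluated in the paper, which additionally applies temporal weighting and a stabilization step. The function names, the model order of 20, the FFT length of 512 and the Hamming window are assumptions chosen for illustration only.

```python
import numpy as np


def fft_spectrum(frame, n_fft=512):
    """Baseline: magnitude spectrum of a Hamming-windowed frame."""
    return np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft))


def lp_spectrum(frame, order=20, n_fft=512):
    """All-pole magnitude spectrum via conventional linear prediction.

    Illustrative only: plain autocorrelation-method LP, not SWLP.
    `order` and `n_fft` are assumed values, not taken from the paper.
    """
    windowed = frame * np.hamming(len(frame))

    # Autocorrelation at lags 0..order.
    full = np.correlate(windowed, windowed, mode="full")
    zero_lag = len(windowed) - 1
    r = full[zero_lag:zero_lag + order + 1]

    # Normal equations R a = r[1:], with R a symmetric Toeplitz matrix.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # small ridge for numerical safety
    a = np.linalg.solve(R, r[1:])

    # Prediction-error filter A(z) = 1 - sum_k a_k z^{-k}.
    error_filter = np.concatenate(([1.0], -a))
    gain = np.sqrt(max(r[0] - np.dot(a, r[1:]), 1e-12))

    # The all-pole spectrum is gain / |A(e^{jw})|.
    return gain / np.abs(np.fft.rfft(error_filter, n_fft))
```

Either function returns `n_fft // 2 + 1` magnitude values per frame, so in this sketch the all-pole estimate can be passed to the same mel filterbank, logarithm and DCT stages that would otherwise receive the FFT spectrum.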
Bibliographic reference. Hanilçi, Cemal / Kinnunen, Tomi / Rajan, Padmanabhan / Pohjalainen, Jouni / Alku, Paavo / Ertaş, Figen (2013): "Comparison of spectrum estimators in speaker verification: mismatch conditions induced by vocal effort", In INTERSPEECH-2013, 2881-2885.