ISCA Archive Interspeech 2013

Comparison of spectrum estimators in speaker verification: mismatch conditions induced by vocal effort

Cemal Hanilçi, Tomi Kinnunen, Padmanabhan Rajan, Jouni Pohjalainen, Paavo Alku, Figen Ertaş

We study the problem of vocal effort mismatch in speaker verification. Changes in a speaker's vocal effort induce changes in fundamental frequency (F0) and formant structure, which introduce unwanted intra-speaker variation into the features. We compare seven alternative spectrum estimators in the context of mel-frequency cepstral coefficient (MFCC) extraction for speaker verification. The compared variants include the traditional FFT spectrum and six parametric all-pole models. Experimental results on the NIST 2010 speaker recognition evaluation (SRE) corpus, using both a GMM-UBM classifier and a more recent GMM supervector classifier, indicate that spectrum estimation has a considerable impact on speaker verification accuracy under mismatched vocal effort conditions. The highest recognition accuracy was achieved using a particular variant of the temporally weighted all-pole model, stabilized weighted linear prediction (SWLP).


doi: 10.21437/Interspeech.2013-255

Cite as: Hanilçi, C., Kinnunen, T., Rajan, P., Pohjalainen, J., Alku, P., Ertaş, F. (2013) Comparison of spectrum estimators in speaker verification: mismatch conditions induced by vocal effort. Proc. Interspeech 2013, 2881-2885, doi: 10.21437/Interspeech.2013-255

@inproceedings{hanilci13_interspeech,
  author={Cemal Hanilçi and Tomi Kinnunen and Padmanabhan Rajan and Jouni Pohjalainen and Paavo Alku and Figen Ertaş},
  title={{Comparison of spectrum estimators in speaker verification: mismatch conditions induced by vocal effort}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={2881--2885},
  doi={10.21437/Interspeech.2013-255}
}