11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

What Else is New Than the Hamming Window? Robust MFCCs for Speaker Recognition via Multitapering

Tomi Kinnunen (1), Rahim Saeidi (1), Johan Sandberg (2), Maria Hansson-Sandsten (2)

(1) University of Eastern Finland, Finland
(2) Lund University, Sweden

Mel-frequency cepstral coefficients (MFCCs) are usually derived from a Hamming-windowed DFT spectrum. In this paper, we advocate using the so-called multitaper method instead. Multitaper methods form a spectrum estimate using multiple window functions and frequency-domain averaging. Multitapers provide a robust spectrum estimate but have not received much attention in speech processing. Our speaker recognition experiment on NIST 2002 yields equal error rates (EERs) of 9.66 % (clean data) and 16.41 % (-10 dB SNR) for the conventional Hamming method, and 8.13 % (clean data) and 14.63 % (-10 dB SNR) using multitapers. Multitapering is a simple and robust alternative to the Hamming window method.
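
The idea of "multiple window functions and frequency-domain averaging" can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of sine tapers, the uniform weights, the taper count of 6, the 240-sample frame, and the function names sine_tapers, multitaper_spectrum, and hamming_spectrum are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def sine_tapers(frame_len, num_tapers):
    """Return a (num_tapers, frame_len) array of orthonormal sine tapers.

    Assumption: sine tapers are used here only because they have a
    closed form; the abstract does not say which tapers the paper uses.
    """
    n = np.arange(1, frame_len + 1)
    k = np.arange(1, num_tapers + 1)[:, None]
    scale = np.sqrt(2.0 / (frame_len + 1))
    return scale * np.sin(np.pi * k * n / (frame_len + 1))

def multitaper_spectrum(frame, tapers, weights=None):
    """Weighted average of the power spectra obtained with each taper."""
    num_tapers = tapers.shape[0]
    if weights is None:
        # Assumption: uniform weights (plain averaging).
        weights = np.full(num_tapers, 1.0 / num_tapers)
    # One windowed DFT per taper; rows are the individual spectra.
    per_taper = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2
    return weights @ per_taper

def hamming_spectrum(frame):
    """Conventional single-window estimate, for comparison."""
    return np.abs(np.fft.rfft(np.hamming(len(frame)) * frame)) ** 2

# Hypothetical input: one 240-sample frame of white noise.
rng = np.random.default_rng(0)
frame = rng.standard_normal(240)
tapers = sine_tapers(len(frame), 6)
mt = multitaper_spectrum(frame, tapers)
hm = hamming_spectrum(frame)
```

In an MFCC front end, an estimate such as mt would take the place of hm before the mel filterbank, logarithm, and DCT; the remaining steps stay the same. Averaging over several tapers generally lowers the variance of the estimate compared with a single window, which is the sense in which the abstract calls multitapers robust.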

Bibliographic reference. Kinnunen, Tomi / Saeidi, Rahim / Sandberg, Johan / Hansson-Sandsten, Maria (2010): "What else is new than the Hamming window? Robust MFCCs for speaker recognition via multitapering", In INTERSPEECH-2010, 2734-2737.