10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Mel, Linear, and Antimel Frequency Cepstral Coefficients in Broad Phonetic Regions for Telephone Speaker Recognition

Howard Lei, Eduardo Lopez


We’ve examined the speaker discriminative power of mel-, antimel-, and linear-frequency cepstral coefficients (MFCCs, a-MFCCs, and LFCCs) in the nasal, vowel, and non-nasal consonant speech regions. Our inspiration came from the work of Lu and Dang in 2007, who showed that filterbank energies at some frequencies, mainly outside the telephone bandwidth, possess more speaker discriminative power due to physiological characteristics of speakers, and who derived a set of cepstral coefficients that outperformed MFCCs on non-telephone speech. Using telephone speech, we’ve found that LFCCs give 21.5% and 15.0% relative EER improvements over MFCCs in the nasal and non-nasal consonant regions, respectively, agreeing with our filterbank energy f-ratio analysis. We’ve also found that using only the vowel region with MFCCs gives a 9.1% relative improvement over using all speech. Finally, we’ve shown that a-MFCCs are valuable in combination, contributing to a system with a 17.3% relative improvement over our baseline.
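The three feature types differ mainly in how the triangular filterbank is spaced along the frequency axis, and the f-ratio analysis scores each filter band by how well it separates speakers. The sketch below illustrates both ideas under stated assumptions; it is not the authors’ implementation. The 8 kHz telephone sampling rate, the 24-filter count, and the definition of the antimel scale as the mel spacing mirrored about the band (so resolution is finest at high frequencies) are all assumptions, since the abstract does not specify them. The f-ratio shown is the standard Fisher ratio (variance of per-speaker mean energies divided by the average within-speaker variance); the paper’s exact normalization may differ. The names `band_edges` and `f_ratio` are hypothetical.

```python
import numpy as np

FS = 8000        # assumed telephone sampling rate (Hz); not stated in the abstract
N_FILTERS = 24   # assumed number of triangular filters; not stated in the abstract


def hz_to_mel(f):
    """Common mel formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def band_edges(scale, n_filters=N_FILTERS, fmax=FS / 2):
    """Return n_filters + 2 ascending edge frequencies (Hz) for triangular filters.

    "linear":  equal spacing (LFCC-style).
    "mel":     dense at low frequencies (MFCC-style).
    "antimel": ASSUMPTION: mel spacing mirrored about the band, so it is
               dense at high frequencies (a-MFCC-style).
    """
    if scale == "linear":
        return np.linspace(0.0, fmax, n_filters + 2)
    mel_edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fmax), n_filters + 2))
    if scale == "mel":
        return mel_edges
    if scale == "antimel":
        # Reverse and reflect so the narrow bands end up near fmax.
        return fmax - mel_edges[::-1]
    raise ValueError(f"unknown scale: {scale}")


def f_ratio(energies, speakers):
    """Per-band Fisher ratio.

    energies: array of shape (n_frames, n_bands), e.g. log filterbank energies.
    speakers: array of shape (n_frames,) with a speaker label per frame.
    Returns between-speaker variance of the speaker means divided by the
    average within-speaker variance, one value per band.
    """
    ids = np.unique(speakers)
    means = np.array([energies[speakers == s].mean(axis=0) for s in ids])
    within = np.array([energies[speakers == s].var(axis=0) for s in ids])
    return means.var(axis=0) / within.mean(axis=0)
```

Comparing `np.diff(band_edges(scale))` across the three scales shows where each filterbank spends its resolution: the mel bands widen with frequency, the antimel bands narrow with frequency, and the linear bands stay constant. Feeding per-band energies from a labeled set of speakers into `f_ratio` gives one score per band, where a higher value suggests that band carries more speaker-discriminative information.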


Bibliographic reference.  Lei, Howard / Lopez, Eduardo (2009): "Mel, linear, and antimel frequency cepstral coefficients in broad phonetic regions for telephone speaker recognition", In INTERSPEECH-2009, 2323-2326.