ISCA Archive Interspeech 2013

Frequency warping and robust speaker verification: a comparison of alternative mel-scale representations

Tomi Kinnunen, Md. Jahangir Alam, Pavel Matějka, Patrick Kenny, Jan Černocký, Douglas O'Shaughnessy

Accuracy of speaker verification is high under controlled conditions but falls off rapidly in the presence of interfering sounds. This is because spectral features, such as Mel-frequency cepstral coefficients (MFCCs), are sensitive to additive noise. MFCCs are one particular realization of a warped-frequency representation with a low-frequency focus. However, several alternative, potentially more robust, warped-frequency representations exist. We provide an experimental comparison of five warped-frequency features. They use exactly the same frequency warping function, the same number of coefficients and the same postprocessing, but differ in their internal computations. The compared variants are (1) conventional MFCCs from the discrete Fourier transform (DFT), followed by a Mel-scaled filterbank, (2) MFCCs via direct warping of the DFT, followed by a linear-scale filterbank, (3) warped linear prediction features, (4) perceptual minimum variance distortionless features and (5) recently proposed sparse Mel-scale histogram features. Experiments carried out on a subset of the SRE 10 corpus using a scaled-down i-vector system indicate that direct DFT warping outperforms conventional MFCCs in most cases.
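
The difference between variants (1) and (2) can be illustrated with a minimal sketch. It is not the authors' implementation: the Mel formula (the common 2595·log10(1 + f/700) form), the linear interpolation used to warp the DFT power spectrum, the 8 kHz sampling rate, the 20 filters and 12 coefficients, and all function names are assumptions made here for illustration only.

```python
import numpy as np

def hz_to_mel(f):
    # Common Mel formula (assumed; the abstract does not specify the warping function).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def triangular_filterbank(edges, axis):
    # Triangular filters defined on `axis`; `edges` holds n_filters + 2 points.
    fb = np.zeros((len(edges) - 2, len(axis)))
    for i in range(1, len(edges) - 1):
        lo, mid, hi = edges[i - 1], edges[i], edges[i + 1]
        rising = (axis - lo) / (mid - lo)
        falling = (hi - axis) / (hi - mid)
        fb[i - 1] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct2(x, n_ceps):
    # Unnormalized DCT-II; keeps the first n_ceps coefficients (including c0).
    n = len(x)
    k = np.arange(n_ceps)[:, None]
    return np.cos(np.pi * k * (np.arange(n) + 0.5) / n) @ x

def mfcc_mel_filterbank(power, sr, n_filters=20, n_ceps=12):
    # Variant (1): linear-frequency DFT, filters spaced uniformly on the Mel scale.
    freqs = np.linspace(0.0, sr / 2.0, len(power))
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    fb = triangular_filterbank(edges_hz, freqs)
    return dct2(np.log(fb @ power + 1e-10), n_ceps)

def mfcc_warped_dft(power, sr, n_filters=20, n_ceps=12):
    # Variant (2): resample the spectrum onto a uniform Mel axis (assumed here to be
    # linear interpolation), then apply filters spaced uniformly on that axis.
    freqs = np.linspace(0.0, sr / 2.0, len(power))
    mel_axis = np.linspace(0.0, hz_to_mel(sr / 2.0), len(power))
    warped = np.interp(mel_to_hz(mel_axis), freqs, power)
    edges = np.linspace(mel_axis[0], mel_axis[-1], n_filters + 2)
    fb = triangular_filterbank(edges, mel_axis)
    return dct2(np.log(fb @ warped + 1e-10), n_ceps)

# Synthetic test frame: a 440 Hz tone plus weak noise (illustrative data only).
rng = np.random.default_rng(0)
sr, n_fft = 8000, 512
t = np.arange(n_fft) / sr
frame = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(n_fft)
power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2

c1 = mfcc_mel_filterbank(power, sr)
c2 = mfcc_warped_dft(power, sr)
```

Both functions share the warping function, the number of filters, the number of coefficients and the cepstral transform, so they differ only in where the warping is applied. That mirrors the controlled setup described above, under the stated assumptions.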


doi: 10.21437/Interspeech.2013-680

Cite as: Kinnunen, T., Alam, M.J., Matějka, P., Kenny, P., Černocký, J., O'Shaughnessy, D. (2013) Frequency warping and robust speaker verification: a comparison of alternative mel-scale representations. Proc. Interspeech 2013, 3122-3126, doi: 10.21437/Interspeech.2013-680

@inproceedings{kinnunen13_interspeech,
  author={Tomi Kinnunen and Md. Jahangir Alam and Pavel Matějka and Patrick Kenny and Jan Černocký and Douglas O'Shaughnessy},
  title={{Frequency warping and robust speaker verification: a comparison of alternative mel-scale representations}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={3122--3126},
  doi={10.21437/Interspeech.2013-680}
}