Odyssey 2010: The Speaker and Language Recognition Workshop
Brno, Czech Republic
Most conventional features used in speaker recognition are based on spectral envelope characterizations such as Mel-scale filterbank cepstrum coefficients (MFCC), Linear Prediction Cepstrum Coefficients (LPCC) and Perceptual Linear Prediction (PLP). The success of MFCCs has made them a de facto standard feature for speaker recognition. Alternative features that convey information other than the average subband energy have also been proposed, such as frequency modulation (FM) and subband spectral centroid features. In this study, we investigate the characterization of subband energy as a two-dimensional feature comprising Spectral Centroid Magnitude (SCM) and Spectral Centroid Frequency (SCF). Empirical experiments carried out on the NIST 2001 and NIST 2006 databases using SCF, SCM and their fusion suggest that the combination of SCM and SCF is somewhat more accurate than conventional MFCCs, and that both fuse effectively with MFCCs. We also show that frame-averaged FM features are essentially centroid features, and provide an SCF implementation that improves on the speaker recognition performance of both subband spectral centroid and FM features.
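As a rough illustration of the two-dimensional subband feature described above, the sketch below computes one (SCF, SCM) pair per subband from a magnitude spectrum. The abstract does not give the definitions, so everything here is an assumption: SCF is taken as the magnitude-weighted average frequency in the subband, SCM as the frequency-weighted average magnitude, and the subband filters are rectangular. The function name `subband_centroids` and its parameters are hypothetical, and the paper's actual filterbank and post-processing may differ.

```python
def subband_centroids(freqs, mags, bands):
    """Return a list of (scf, scm) pairs, one per subband.

    freqs: bin centre frequencies in Hz
    mags:  magnitude spectrum |S[f]| for one frame, same length as freqs
    bands: list of (low_hz, high_hz) edges, inclusive
           (rectangular filters, an assumption of this sketch)
    """
    features = []
    for low, high in bands:
        # Indices of the spectrum bins that fall inside this subband.
        idx = [i for i, f in enumerate(freqs) if low <= f <= high]
        weighted = sum(freqs[i] * mags[i] for i in idx)
        total_mag = sum(mags[i] for i in idx)
        total_freq = sum(freqs[i] for i in idx)
        # Assumed SCF: magnitude-weighted average frequency.
        # Falls back to the band centre when the band carries no energy.
        scf = weighted / total_mag if total_mag > 0 else (low + high) / 2.0
        # Assumed SCM: frequency-weighted average magnitude.
        scm = weighted / total_freq if total_freq > 0 else 0.0
        features.append((scf, scm))
    return features


# Toy spectrum: energy only at 100 Hz and 200 Hz inside a single subband.
freqs = [0.0, 100.0, 200.0, 300.0]
mags = [0.0, 1.0, 1.0, 0.0]
features = subband_centroids(freqs, mags, [(0.0, 300.0)])
print(features)  # [(150.0, 0.5)]
```

In the toy frame the energy sits symmetrically around 150 Hz, so the assumed SCF is 150 Hz, while the assumed SCM summarizes how much magnitude the subband carries. Under these definitions the pair captures both where the energy is concentrated and how strong it is, whereas a single average subband energy captures only the latter.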
Bibliographic reference. Kua, Jia Min Karen / Thiruvaran, Tharmarajah / Nosratighods, Mohaddeseh / Ambikairajah, Eliathamby / Epps, Julien (2010): "Investigation of Spectral Centroid Magnitude and Frequency for Speaker Recognition", In Odyssey-2010, paper 007.