ISCA Archive SPECOM 2004

Fusion of spectral feature sets for accurate speaker identification

Tomi Kinnunen, Ville Hautamäki, Pasi Fränti

Several features have been proposed for automatic speaker recognition. Despite their noise sensitivity, low-level spectral features are the most popular ones because they are easy to compute. Although in principle different spectral representations carry similar information (spectral shape), in practice the features differ in their performance. For instance, LPC-cepstrum picks up more “details” of the short-term spectrum than the FFT-cepstrum with the same number of coefficients. In this work, we consider using multiple spectral representations simultaneously to improve the accuracy of speaker recognition. We use the following feature sets: mel-frequency cepstral coefficients (MFCC), LPC-cepstrum (LPCC), arcus sine reflection coefficients (ARCSIN), formant frequencies (FMT), and the corresponding delta parameters of all feature sets. We study three ways of combining the feature sets: feature-level fusion (feature vector concatenation), score-level fusion (soft combination of classifier outputs), and decision-level fusion (combination of classifier decisions).
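
The three fusion levels named in the abstract can be illustrated with a minimal sketch. The abstract does not state which combination rules the authors used, so the weighted sum for score-level fusion and the majority vote for decision-level fusion below are assumptions chosen for illustration, as are all function names, speaker labels, and numeric values.

```python
from collections import Counter


def feature_level_fusion(*feature_sets):
    """Concatenate the per-frame vectors of several feature sets.

    Each feature set is a list of frames; each frame is a list of floats.
    All feature sets must cover the same number of frames.
    """
    n_frames = len(feature_sets[0])
    if any(len(fs) != n_frames for fs in feature_sets):
        raise ValueError("all feature sets must have the same number of frames")
    fused = []
    for t in range(n_frames):
        frame = []
        for fs in feature_sets:
            frame.extend(fs[t])
        fused.append(frame)
    return fused


def score_level_fusion(score_tables, weights=None):
    """Combine per-speaker match scores from several classifiers.

    ASSUMPTION: a weighted sum with higher score meaning a better match;
    the source does not specify the combination rule.
    Returns the best-scoring speaker and the fused score table.
    """
    if weights is None:
        weights = [1.0 / len(score_tables)] * len(score_tables)
    fused = {
        speaker: sum(w * table[speaker] for w, table in zip(weights, score_tables))
        for speaker in score_tables[0]
    }
    return max(fused, key=fused.get), fused


def decision_level_fusion(decisions):
    """Combine hard classifier decisions.

    ASSUMPTION: plain majority vote; ties go to the label seen first.
    """
    return Counter(decisions).most_common(1)[0][0]


# Hypothetical toy data: two frames of 2-D "MFCC" and 1-D "LPCC" features.
mfcc = [[1.0, 2.0], [3.0, 4.0]]
lpcc = [[0.5], [0.6]]
fused_features = feature_level_fusion(mfcc, lpcc)

# Hypothetical per-classifier scores for two enrolled speakers.
mfcc_scores = {"alice": 0.7, "bob": 0.3}
lpcc_scores = {"alice": 0.4, "bob": 0.6}
best_speaker, fused_scores = score_level_fusion([mfcc_scores, lpcc_scores])

# Hypothetical hard decisions from three classifiers.
voted_speaker = decision_level_fusion(["alice", "bob", "alice"])
```

Feature-level fusion acts before classification and yields one longer vector per frame, whereas the other two operate on the outputs of separate classifiers, one per feature set. In practice, scores from different classifiers would usually need normalization to a common range before a weighted sum is meaningful.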


Cite as: Kinnunen, T., Hautamäki, V., Fränti, P. (2004) Fusion of spectral feature sets for accurate speaker identification. Proc. 9th Conference on Speech and Computer (SPECOM 2004), 361-365

@inproceedings{kinnunen04_specom,
  author={Tomi Kinnunen and Ville Hautamäki and Pasi Fränti},
  title={{Fusion of spectral feature sets for accurate speaker identification}},
  year=2004,
  booktitle={Proc. 9th Conference on Speech and Computer (SPECOM 2004)},
  pages={361--365}
}