ISCA Archive Odyssey 2004

Comparison of MPEG-7 basis projection features and MFCC applied to robust speaker recognition

Hyoung-Gook Kim, Martin Haller, Thomas Sikora

Our purpose is to evaluate the efficiency of MPEG-7 basis projection (BP) features vs. Mel-scale Frequency Cepstrum Coefficients (MFCC) for speaker recognition in noisy environments. The MPEG-7 feature extraction mainly consists of a Normalized Audio Spectrum Envelope (NASE), a basis decomposition algorithm and a spectrum basis projection. Prior to feature extraction, noise reduction is performed using a modified log spectral amplitude speech estimator (LSA) and a minima controlled noise estimation (MC). The noise-reduced features can be used effectively in an HMM-based recognition system. Performance is measured by the segmental signal-to-noise ratio and by the recognition results of the MPEG-7 standardized features vs. MFCC, in comparison with other noise reduction methods. Results show that the MFCC features yield better performance than the MPEG-7 features.
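The three MPEG-7 extraction stages named above (NASE, basis decomposition, spectrum basis projection) can be illustrated with a short NumPy sketch. It is not the authors' implementation. The frame length, hop size, band edges (`lo_edge`, `hi_edge`), octave `resolution` and number of basis vectors `k` are illustrative assumptions rather than the paper's settings. The abstract does not say which basis decomposition is used, so plain SVD stands in for it here. All function names are hypothetical. In a recognition setup the basis would typically be estimated from training data and then applied to test frames. Here one signal is used for both steps only to keep the example short.

```python
import numpy as np


def audio_spectrum_envelope(signal, sr, frame_len=512, hop=256,
                            lo_edge=62.5, hi_edge=4000.0, resolution=0.5):
    """Frame-wise power summed into log-spaced (octave-fraction) bands."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    n_log = int(round(np.log2(hi_edge / lo_edge) / resolution))
    log_edges = lo_edge * 2.0 ** (resolution * np.arange(n_log + 1))
    # Extra bands below lo_edge and above hi_edge (up to Nyquist).
    edges = np.concatenate(([0.0], log_edges, [np.inf]))
    band = np.searchsorted(edges, freqs, side="right") - 1
    ase = np.zeros((n_frames, len(edges) - 1))
    for b in range(len(edges) - 1):
        ase[:, b] = power[:, band == b].sum(axis=1)
    return ase


def normalized_ase(ase, floor=1e-12):
    """Convert to dB, then divide each frame by its L2 norm (NASE)."""
    x = 10.0 * np.log10(np.maximum(ase, floor))
    norm = np.sqrt(np.sum(x ** 2, axis=1, keepdims=True))
    # The per-frame norm is returned too, since it carries the level information.
    return x / np.maximum(norm, floor), norm[:, 0]


def spectrum_basis(x_nase, k):
    """Basis decomposition via SVD: first k right singular vectors."""
    _, _, vt = np.linalg.svd(x_nase, full_matrices=False)
    return vt[:k].T  # shape (n_bands, k)


def basis_projection(x_nase, basis):
    """Spectrum basis projection: one k-dimensional feature per frame."""
    return x_nase @ basis


# Demo on 1 s of white noise standing in for a speech signal.
rng = np.random.default_rng(0)
sr = 16000
signal = rng.standard_normal(sr)
ase = audio_spectrum_envelope(signal, sr)
x, gain = normalized_ase(ase)
basis = spectrum_basis(x, k=5)
features = basis_projection(x, basis)
print(features.shape)  # (61, 5)
```

The projected `features` (optionally combined with the per-frame `gain`) would then be passed to an HMM classifier in place of MFCC vectors. SVD is used because its right singular vectors give an orthonormal basis that captures most of the NASE variance with few dimensions.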


Cite as: Kim, H.-G., Haller, M., Sikora, T. (2004) Comparison of MPEG-7 basis projection features and MFCC applied to robust speaker recognition. Proc. The Speaker and Language Recognition Workshop (Odyssey 2004), 275-278

@inproceedings{kim04_odyssey,
  author={Hyoung-Gook Kim and Martin Haller and Thomas Sikora},
  title={{Comparison of MPEG-7 basis projection features and MFCC applied to robust speaker recognition}},
  year=2004,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2004)},
  pages={275--278}
}