11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Combining Five Acoustic Level Modeling Methods for Automatic Speaker Age and Gender Recognition

Ming Li (1), Chi-Sang Jung (2), Kyu J. Han (1)

(1) University of Southern California, USA
(2) Yonsei University, Korea

This paper presents a novel automatic speaker age and gender identification approach which combines five different methods at the acoustic level to improve the baseline performance. The five subsystems are (1) a Gaussian mixture model (GMM) system based on mel-frequency cepstral coefficient (MFCC) features, (2) a support vector machine (SVM) based on GMM mean supervectors, (3) an SVM based on GMM maximum likelihood linear regression (MLLR) matrix supervectors, (4) an SVM based on GMM ‘Tandem’ supervectors, and (5) the SVM baseline system based on 450-dimensional utterance-level feature vectors, including prosodic features, provided by the challenge organizing committee. To improve the overall classification performance, these five subsystems are fused at the score level. The proposed fusion system achieves 52.7% unweighted accuracy on the joint age-gender classification task of the 2010 Interspeech Paralinguistic Challenge aGender database, outperforming the GMM-MFCC system and the SVM baseline by 9.6% and 8.2% absolute, respectively.
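The abstract does not say how the subsystem scores are combined, so the following is only a minimal sketch of one common form of score-level fusion, a weighted sum of z-normalized per-class scores, together with unweighted accuracy computed as the mean of per-class recalls. The function names, the weights, the three-class toy scores, and the choice of z-normalization are all illustrative assumptions and are not taken from the paper.

```python
from typing import List, Sequence


def zscore(scores: Sequence[float]) -> List[float]:
    """Normalize one subsystem's per-class scores to zero mean, unit variance."""
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    if std == 0:
        return [0.0] * len(scores)
    return [(s - mean) / std for s in scores]


def fuse_scores(subsystem_scores: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted sum of normalized scores (assumed fusion rule, not the paper's)."""
    fused = [0.0] * len(subsystem_scores[0])
    for scores, weight in zip(subsystem_scores, weights):
        for k, s in enumerate(zscore(scores)):
            fused[k] += weight * s
    return fused


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda k: fused[k])


def unweighted_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recalls, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical scores from two subsystems over three classes for one utterance.
fused = fuse_scores([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]], weights=[2.0, 1.0])
label = predict(fused)
```

Unweighted accuracy differs from plain accuracy when classes are imbalanced: a classifier that always predicts the majority class on labels `[0, 0, 0, 1]` is 75% accurate but scores only 0.5 here, because the recall for class 1 is zero.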

Bibliographic reference. Li, Ming / Jung, Chi-Sang / Han, Kyu J. (2010): "Combining five acoustic level modeling methods for automatic speaker age and gender recognition", in INTERSPEECH-2010, 2826-2829.