The most successful systems in previous comparative studies on speaker age recognition used short-term cepstral features modeled with Gaussian Mixture Models (GMMs), or applied multiple phone recognizers, each trained on data from speakers of the respective class. Acoustic analyses, however, indicate that certain features extracted from a longer span of speech, such as pitch, correlate clearly with speaker age, although systems based on those features have been inferior to the aforementioned approaches. In this paper, three novel systems that combine short-term cepstral features and long-term features for speaker age recognition are compared with each other. A system combining GMMs on frame-based MFCCs with Support Vector Machines (SVMs) on long-term pitch features performs best. The results indicate that combining the two feature types is a promising approach, which is consistent with findings in related fields such as speaker recognition.
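The abstract does not say how the GMM and SVM outputs are combined, so the following is only a minimal sketch under stated assumptions. It scores MFCC frames against per-class diagonal-covariance GMMs (short-term branch), computes utterance-level F0 statistics and applies a pre-trained linear one-vs-rest SVM decision function to them (long-term branch), and then performs a simple score-level (late) fusion: each system's per-class scores are z-normalized and combined with a weighted sum. The class names, model parameters, choice of pitch statistics, z-score normalization, and fusion weight are all hypothetical illustrations, not the authors' method. No model training is shown.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

# One GMM component: (mixture weight, mean vector, diagonal variance vector).
Component = Tuple[float, List[float], List[float]]


def gmm_avg_loglik(frames: Sequence[Sequence[float]], gmm: Sequence[Component]) -> float:
    """Short-term branch: mean per-frame log-likelihood of MFCC frames under a diagonal GMM."""
    total = 0.0
    for x in frames:
        comp_ll = []
        for w, mu, var in gmm:
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comp_ll.append(ll)
        peak = max(comp_ll)  # log-sum-exp over components for numerical stability
        total += peak + math.log(sum(math.exp(ll - peak) for ll in comp_ll))
    return total / len(frames)


def pitch_stats(f0: Sequence[float]) -> List[float]:
    """Long-term branch input (hypothetical choice): mean, std and range of voiced F0; 0 = unvoiced."""
    voiced = [f for f in f0 if f > 0]
    return [statistics.mean(voiced), statistics.pstdev(voiced), max(voiced) - min(voiced)]


def linear_svm_score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Decision value w.x + b of an assumed, already-trained linear one-vs-rest SVM."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def zscore(scores: Dict[str, float]) -> Dict[str, float]:
    """Put one system's per-class scores on a common scale before fusing."""
    vals = list(scores.values())
    mu = statistics.mean(vals)
    sd = statistics.pstdev(vals) or 1.0  # avoid division by zero if all scores are equal
    return {c: (s - mu) / sd for c, s in scores.items()}


def fuse(gmm_scores: Dict[str, float], svm_scores: Dict[str, float], weight: float = 0.5):
    """Weighted sum of normalized scores; returns the winning class and all fused scores."""
    g, s = zscore(gmm_scores), zscore(svm_scores)
    fused = {c: weight * g[c] + (1 - weight) * s[c] for c in g}
    return max(fused, key=fused.get), fused


# Toy usage with hypothetical classes and hand-set (untrained) model parameters.
mfcc_frames = [[1.0, 0.5], [0.8, 0.4], [1.2, 0.6]]
gmms = {
    "young": [(0.5, [1.0, 0.5], [0.1, 0.1]), (0.5, [0.0, 0.0], [1.0, 1.0])],
    "senior": [(1.0, [-1.0, -0.5], [0.1, 0.1])],
}
svms = {"young": ([0.01, 0.0, 0.0], -1.5), "senior": ([-0.01, 0.0, 0.0], 1.5)}
f0_track = [0, 210.0, 220.0, 0, 230.0, 0]

gmm_scores = {c: gmm_avg_loglik(mfcc_frames, m) for c, m in gmms.items()}
feats = pitch_stats(f0_track)
svm_scores = {c: linear_svm_score(feats, w, b) for c, (w, b) in svms.items()}
label, fused = fuse(gmm_scores, svm_scores)
print(label)  # young
```

Normalizing before fusion matters because GMM log-likelihoods and SVM decision values live on very different scales; the weight would in practice be tuned on held-out data rather than fixed at 0.5.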
Bibliographic reference. Müller, Christian / Burkhardt, Felix (2007): "Combining short-term cepstral and long-term pitch features for automatic recognition of speaker age", In INTERSPEECH-2007, 2277-2280.