10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Analyzing Features for Automatic Age Estimation on Cross-Sectional Data

Werner Spiegl (1), Georg Stemmer (2), Eva Lasarcyk (3), Varada Kolhatkar (4), Andrew Cassidy (5), Blaise Potard (6), Stephen Shum (7), Young Chol Song (8), Puyang Xu (5), Peter Beyerlein (9), James Harnsberger (10), Elmar Nöth (1)

(1) FAU Erlangen-Nürnberg, Germany
(2) SVOX Deutschland GmbH, Germany
(3) Universität des Saarlandes, Germany
(4) University of Minnesota Duluth, USA
(5) Johns Hopkins University, USA
(6) CRIN, France
(7) University of California at Berkeley, USA
(8) Stony Brook University, USA
(9) TFH Wildau, Germany
(10) University of Florida, USA

We develop an acoustic feature set for estimating a person’s age from a recorded speech signal. The baseline features are Mel-frequency cepstral coefficients (MFCCs), which are extended by various prosodic features as well as pitch and formant frequencies. Experiments on the University of Florida Vocal Aging Database support two conclusions. On the one hand, adding prosodic, pitch and formant features to the MFCC baseline yields relative reductions of the mean absolute error of 4–20%. Improvements are even larger when perceptual age labels are taken as the reference. On the other hand, reasonable results, with a mean absolute error in age estimation of about 12 years, are already achieved using a simple gender-independent setup and MFCCs only. Future experiments will evaluate the robustness of the prosodic features against channel variability on other databases and investigate the differences between perceptual and chronological age labels.
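The evaluation metric referred to above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it shows how a mean absolute error (MAE) in years and a relative MAE reduction between a baseline and an extended-feature system would be computed. All age values below are made-up illustrative numbers, not data from the paper.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """MAE in years between reference ages and system estimates."""
    assert len(true_ages) == len(predicted_ages)
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)

def relative_reduction(baseline_mae, improved_mae):
    """Relative MAE reduction, e.g. 0.04-0.20 corresponds to the 4-20% range."""
    return (baseline_mae - improved_mae) / baseline_mae

# Hypothetical chronological ages and system outputs (illustrative only).
true_ages     = [25, 40, 63, 71, 58]
mfcc_only     = [38, 52, 49, 60, 70]  # hypothetical MFCC-baseline estimates
mfcc_extended = [35, 51, 51, 61, 68]  # hypothetical prosody-extended estimates

baseline_mae = mean_absolute_error(true_ages, mfcc_only)      # 12.4 years
extended_mae = mean_absolute_error(true_ages, mfcc_extended)  # 10.6 years
print(f"relative reduction: {relative_reduction(baseline_mae, extended_mae):.1%}")
```

The relative reduction here works out to about 14.5%, inside the 4–20% range reported in the abstract, but only because the toy numbers were chosen that way.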


Bibliographic reference.  Spiegl, Werner / Stemmer, Georg / Lasarcyk, Eva / Kolhatkar, Varada / Cassidy, Andrew / Potard, Blaise / Shum, Stephen / Song, Young Chol / Xu, Puyang / Beyerlein, Peter / Harnsberger, James / Nöth, Elmar (2009): "Analyzing features for automatic age estimation on cross-sectional data", In INTERSPEECH-2009, 2923-2926.