Odyssey 2008: The Speaker and Language Recognition Workshop
Stellenbosch, South Africa
This paper proposes using modulation cepstrum coefficients instead of cepstral coefficients for extracting speaker metadata such as age and gender. These coefficients are extracted by applying the discrete cosine transform to a time sequence of cepstral coefficients. The lower-order coefficients of this transform represent smooth cepstral trajectories over time. The results presented in this paper show that cepstral trajectories corresponding to lower modulation frequencies (3-14 Hz) provide the best discrimination. The proposed system achieves 50.2% overall accuracy on the 7-class age and gender classification task, compared with 54.7% accuracy for human labelers on a subset of the evaluation material used in this work.
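The extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single fixed-length analysis window, the orthonormal DCT-II scaling, and the 100 frames/s rate in the demo are all assumptions made for illustration. Each cepstral dimension is treated as a trajectory over the frames of a window, and the first few DCT coefficients of that trajectory are kept as features.

```python
import math


def dct_ii(seq):
    """Orthonormal DCT-II of a 1-D sequence of floats."""
    n = len(seq)
    out = []
    for k in range(n):
        acc = sum(x * math.cos(math.pi * (t + 0.5) * k / n)
                  for t, x in enumerate(seq))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * acc)
    return out


def modulation_cepstrum(cepstra, num_coeffs):
    """Apply the DCT along time to each cepstral trajectory.

    cepstra: list of frames for one analysis window, each frame a list
             of cepstral coefficients (hypothetical input layout).
    Returns one list per cepstral dimension holding the first
    num_coeffs DCT coefficients of that dimension's trajectory.
    """
    num_dims = len(cepstra[0])
    features = []
    for d in range(num_dims):
        trajectory = [frame[d] for frame in cepstra]
        features.append(dct_ii(trajectory)[:num_coeffs])
    return features


def modulation_frequency_hz(k, window_len, frame_rate):
    """Modulation frequency of DCT-II index k.

    Basis function k completes k/2 cycles over window_len frames.
    """
    return k * frame_rate / (2.0 * window_len)


if __name__ == "__main__":
    # Synthetic window: 50 frames, 2 cepstral dimensions (assumed values).
    window = [[2.0, math.cos(math.pi * (t + 0.5) * 3 / 50)] for t in range(50)]
    feats = modulation_cepstrum(window, num_coeffs=6)
    for d, coeffs in enumerate(feats):
        print(d, [round(c, 3) for c in coeffs])
    print(modulation_frequency_hz(3, 50, 100.0), "Hz")
```

Under these assumptions, DCT index k over a window of N frames at frame rate f corresponds to a modulation frequency of k·f/(2N) Hz, so selecting a band such as 3-14 Hz amounts to keeping a range of DCT indices that depends on the chosen window length and frame rate.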
Bibliographic reference. Ajmera, Jitendra / Burkhardt, Felix (2008): "Age and gender classification using modulation cepstrum", In Odyssey-2008, paper 025.