This paper proposes using modulation cepstrum coefficients instead of cepstral coefficients for extracting metadata information such as age and gender. These coefficients are extracted by applying a discrete cosine transform to a time sequence of cepstral coefficients. Lower-order coefficients of this transformation represent smooth cepstral trajectories over time. Results presented in this paper show that cepstral trajectories corresponding to lower (3-14 Hz) modulation frequencies provide the best discrimination. The proposed system achieves 50.2% overall accuracy on this 7-class task, while the accuracy of human labelers on a subset of the evaluation material used in this work is 54.7%.
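The extraction step described above can be illustrated with a minimal sketch. It assumes an unnormalized DCT-II taken along the time axis of each cepstral dimension. The function names (`dct_ii`, `modulation_cepstrum`, `modulation_frequency_hz`), the window length, and the frame rate are hypothetical choices made for illustration; the abstract does not specify them, and the authors' actual implementation may differ.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II of a 1-D sequence (assumed transform variant)."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]


def modulation_cepstrum(frames, num_coeffs):
    """Apply a DCT over time to each cepstral trajectory.

    frames: list of T cepstral vectors, each of length D (one per frame).
    Returns D lists; each holds the first num_coeffs DCT coefficients
    of one cepstral dimension's trajectory across the T frames.
    """
    num_dims = len(frames[0])
    features = []
    for d in range(num_dims):
        # Trajectory of cepstral coefficient d across the window.
        trajectory = [frame[d] for frame in frames]
        features.append(dct_ii(trajectory)[:num_coeffs])
    return features


def modulation_frequency_hz(k, num_frames, frame_rate_hz):
    """Modulation frequency of DCT-II basis function k over num_frames frames."""
    return k * frame_rate_hz / (2.0 * num_frames)
```

Under the illustrative assumption of 100 frames per second and a 50-frame window, `modulation_frequency_hz(k, 50, 100.0)` equals `k` Hz, so keeping coefficients `k = 3` to `14` would select roughly the 3-14 Hz band mentioned above.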
Cite as: Ajmera, J., Burkhardt, F. (2008) Age and gender classification using modulation cepstrum. Proc. The Speaker and Language Recognition Workshop (Odyssey 2008), paper 25
@inproceedings{ajmera08_odyssey,
  author={Jitendra Ajmera and Felix Burkhardt},
  title={{Age and gender classification using modulation cepstrum}},
  year=2008,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2008)},
  pages={paper 25}
}