EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

        

Automatic Estimation of Perceptual Age Using Speaker Modeling Techniques

Nobuaki Minematsu, Keita Yamauchi, Keikichi Hirose

University of Tokyo, Japan

This paper proposes a technique for automatically estimating a speaker's perceptual age using only the acoustic information in their utterances. First, we collected experimental data on how old individual speakers in the databases sound to listeners. Speech samples from approximately 500 male speakers covering a very wide range of real ages were presented to listeners, who were asked to estimate each speaker's age by listening alone. From these results, the perceptual age of each speaker was defined in two ways: as a label (the age averaged over the listeners) and as a distribution. Each speaker was then modeled acoustically with GMMs. Finally, the perceptual age of an input speaker was estimated as a weighted sum of the perceptual ages of all the other speakers in the databases, where the weight for speaker i was calculated as a function of the likelihood score of the input speaker's speech under the model of speaker i. Experiments showed a correlation of about 0.9 between the perceptual age obtained from the listening test and that estimated by the proposed method. The paper also introduces some techniques for robust estimation of the perceptual age.
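
The weighted-sum estimate described in the abstract can be illustrated with a minimal sketch. The abstract does not say which function maps likelihood scores to weights, so the normalized exponential (softmax) of the log-likelihoods used here is an assumption, as are the function name estimate_perceptual_age and the scale parameter. The log-likelihoods are assumed to have been computed beforehand by scoring the input utterance against each reference speaker's GMM.

```python
import math


def estimate_perceptual_age(log_likelihoods, perceptual_ages, scale=1.0):
    """Estimate perceptual age as a weighted sum over reference speakers.

    log_likelihoods[i]: log-likelihood of the input utterance under the
        acoustic model (e.g. a GMM) of reference speaker i.
    perceptual_ages[i]: perceptual age label of reference speaker i
        (e.g. the age averaged over listeners).
    scale: hypothetical sharpness parameter, not taken from the paper.
    """
    if not log_likelihoods or len(log_likelihoods) != len(perceptual_ages):
        raise ValueError("need one perceptual age per log-likelihood")

    # Assumed weighting: softmax of the scaled log-likelihoods.
    # Subtracting the maximum keeps exp() numerically stable.
    top = max(log_likelihoods)
    raw = [math.exp(scale * (ll - top)) for ll in log_likelihoods]
    total = sum(raw)
    weights = [r / total for r in raw]

    # Weighted sum of the reference speakers' perceptual ages.
    return sum(w * age for w, age in zip(weights, perceptual_ages))


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    scores = [-120.0, -118.5, -125.0]
    ages = [32.0, 45.0, 60.0]
    print(estimate_perceptual_age(scores, ages))
```

With this choice of weights, reference speakers whose models fit the input well dominate the estimate, and equal scores reduce the estimate to the plain average of the reference ages. Other monotonic functions of the likelihood would fit the abstract's description equally well.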

Bibliographic reference.  Minematsu, Nobuaki / Yamauchi, Keita / Hirose, Keikichi (2003): "Automatic estimation of perceptual age using speaker modeling techniques", In EUROSPEECH-2003, 3005-3008.