EUROSPEECH 2003 - INTERSPEECH 2003
This paper proposes a technique for automatically estimating a speaker's perceptual age using only the acoustic information in their utterances. First, we collected experimental data on how old the individual speakers in the databases sound to listeners. Speech samples from approximately 500 male speakers, covering a very wide range of chronological ages, were presented to listeners, who were asked to estimate each speaker's age by hearing alone. Using the results, the perceptual age of each speaker was defined in two ways: as a label (the age averaged over the listeners) and as a distribution. Then, each speaker was modeled acoustically with GMMs. Finally, the perceptual age of an input speaker was estimated as a weighted sum of the perceptual ages of all the other speakers in the databases, where the weight for speaker i was calculated as a function of the likelihood score of the input speaker's speech under the model of speaker i. Experiments showed a correlation of about 0.9 between the perceptual age obtained from the listening test and that estimated by the proposed method. This paper also introduces some techniques for making the estimation of perceptual age more robust.
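The weighted-sum step described above can be illustrated with a minimal Python sketch. The abstract does not state which function of the likelihood score is used for the weights, so the normalized exponential (softmax) of the log-likelihoods used here is an assumption, as are the function name `estimate_perceptual_age` and the `temperature` parameter. The log-likelihoods are assumed to come from scoring the input utterance against each reference speaker's GMM, which is not shown.

```python
import math


def estimate_perceptual_age(log_likelihoods, perceptual_ages, temperature=1.0):
    """Estimate perceptual age as a likelihood-weighted sum over reference speakers.

    log_likelihoods[i]: log-likelihood of the input speech under speaker i's model.
    perceptual_ages[i]: perceptual age label of speaker i (listener average).
    temperature: hypothetical smoothing parameter, not taken from the paper.
    """
    if not log_likelihoods or len(log_likelihoods) != len(perceptual_ages):
        raise ValueError("need one perceptual age per log-likelihood score")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Assumed weighting: softmax of the scaled log-likelihoods.
    # Subtracting the maximum keeps the exponentials numerically stable.
    scaled = [ll / temperature for ll in log_likelihoods]
    peak = max(scaled)
    unnormalized = [math.exp(s - peak) for s in scaled]
    total = sum(unnormalized)
    weights = [u / total for u in unnormalized]

    # Weighted sum of the reference speakers' perceptual ages.
    return sum(w * age for w, age in zip(weights, perceptual_ages))
```

With equal scores for all reference speakers, this sketch returns the plain average of their perceptual ages; when one speaker's model scores far higher than the rest, the estimate approaches that speaker's perceptual age.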
Bibliographic reference. Minematsu, Nobuaki / Yamauchi, Keita / Hirose, Keikichi (2003): "Automatic estimation of perceptual age using speaker modeling techniques", In EUROSPEECH-2003, 3005-3008.