In this paper, we investigate the acoustic features that can be modified to control the perceptual age of a singing voice. Singers can sing expressively by controlling prosody and vocal timbre, but the variety of voices that a singer can produce is limited by physical constraints. Previous work has attempted to overcome this limitation through the use of statistical voice conversion, a technique that makes it possible to convert the singing voice characteristics of an arbitrary source singer into those of an arbitrary target singer. However, it is still difficult to intuitively control singing voice characteristics by manipulating parameters corresponding to specific physical traits, such as gender and age. We focus on controlling the perceived age of the singer and, as a first step, investigate the factors that influence the listener's perception of the singer's age. The experimental results demonstrate that 1) the perceptual age of singing voices corresponds relatively well to the actual age of the singer, 2) speech analysis/synthesis processing and statistical voice conversion processing do not cause adverse effects on the perceptual age of singing voices, and 3) prosodic features have a larger effect on the perceptual age than spectral features.
Bibliographic reference. Kobayashi, Kazuhiro / Doi, Hironori / Toda, Tomoki / Nakano, Tomoyasu / Goto, Masataka / Neubig, Graham / Sakti, Sakriani / Nakamura, Satoshi (2013): "An investigation of acoustic features for singing voice conversion based on perceptual age", In INTERSPEECH-2013, 1057-1061.