INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

HMM Adaptation and Voice Conversion for the Synthesis of Child Speech: A Comparison

Oliver Watts (1), Junichi Yamagishi (1), Simon King (1), Kay Berkling (2)

(1) University of Edinburgh, UK
(2) Inline Internet Online Dienste GmbH, Germany

This study compares two methodologies for producing data-driven synthesis of child speech from existing systems that have been trained on the speech of adults. On the one hand, an existing statistical parametric synthesiser is transformed to the speaker characteristics of a child speaker using model adaptation techniques informed by linguistic and prosodic knowledge. On the other hand, voice conversion techniques are applied, with no explicit linguistic or prosodic knowledge, to convert the output of an existing waveform concatenation synthesiser. In a subjective evaluation of the similarity of synthetic speech to natural speech from the target speaker, the HMM-based systems evaluated are generally preferred, although this is at least in part due to the higher-dimensional acoustic features supported by these techniques.

Bibliographic reference. Watts, Oliver / Yamagishi, Junichi / King, Simon / Berkling, Kay (2009): "HMM adaptation and voice conversion for the synthesis of child speech: a comparison", in INTERSPEECH-2009, 2627-2630.