12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Prosody Conversion for Emotional Mandarin Speech Synthesis Using the Tone Nucleus Model

Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, Nobuaki Minematsu

University of Tokyo, Japan

In this paper, the tone nucleus model is employed to represent and convert F0 contours for synthesizing emotional Mandarin speech from neutral speech. Compared with previous prosody transformation methods, the proposed method 1) converts only the tone nucleus part of each syllable, rather than the whole F0 contour, to avoid data sparseness problems; and 2) builds mapping functions for well-chosen tone nucleus model parameters to better capture Mandarin tonal information. With only a modest amount of training data, the perceptual accuracy achieved by our method was shown to be comparable to that obtained by a professional speaker.


Bibliographic reference. Wen, Miaomiao / Wang, Miaomiao / Hirose, Keikichi / Minematsu, Nobuaki (2011): "Prosody conversion for emotional Mandarin speech synthesis using the tone nucleus model", In INTERSPEECH-2011, 2797-2800.