Prosody is an inherent supra-segmental feature of speech that human speakers use to express, for example, attitude, emotion, intent and attention. In text-to-speech (TTS) systems, high naturalness can only be achieved if the prosody of the output is appropriate. Prosody is even more crucial for tonal languages, such as Mandarin Chinese, in which the tone of each syllable is described by its pitch contour. In this paper, we propose a novel prosody modeling approach based on the concept of syllable-based eigenpitch. The approach has been implemented in our Mandarin TTS system, resulting in less than 0.1% error variance. The results obtained in practical experiments confirm the good performance of the proposed technique.
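The abstract does not spell out how the eigenpitch representation is computed. As a rough illustration only, the sketch below assumes that each syllable's pitch contour has been resampled to a fixed number of points and that the eigenpitch basis comes from a principal component analysis of those contours. The function names (`fit_eigenpitch`, `encode`, `decode`), the synthetic data, and the choice of two components are all hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_eigenpitch(contours, n_components):
    """Assumed PCA-style fit: contours is (n_syllables, n_points), length-normalized."""
    X = np.asarray(contours, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are orthonormal principal directions of the centered contours.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    total = np.sum(s ** 2)
    # Fraction of contour variance lost by keeping only n_components directions.
    error_ratio = np.sum(s[n_components:] ** 2) / total if total > 0 else 0.0
    return mean, basis, error_ratio

def encode(contour, mean, basis):
    """Project one contour onto the basis, giving a short weight vector."""
    return basis @ (np.asarray(contour, dtype=float) - mean)

def decode(weights, mean, basis):
    """Rebuild an approximate contour from its weights."""
    return mean + weights @ basis

# Synthetic stand-in data (not real speech): mixtures of a rising and an arched shape.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 10)
shapes = np.stack([t, np.sin(np.pi * t)])
coeffs = rng.normal(size=(200, 2))
contours = 200.0 + 30.0 * coeffs @ shapes + rng.normal(scale=0.1, size=(200, 10))

mean, basis, error_ratio = fit_eigenpitch(contours, n_components=2)
weights = encode(contours[0], mean, basis)
rebuilt = decode(weights, mean, basis)
print("residual variance fraction:", error_ratio)
print("max reconstruction error (Hz):", np.max(np.abs(rebuilt - contours[0])))
```

Under these assumptions, each syllable is described by a few weights instead of a full contour, and the residual variance fraction plays the role of an error variance figure. Whether the paper measures its 0.1% figure this way is not stated in the abstract.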
Bibliographic reference. Tian, Jilei / Nurminen, Jani / Kiss, Imre (2007): "Novel eigenpitch-based prosody model for text-to-speech synthesis", In INTERSPEECH-2007, 1278-1281.