8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Novel Eigenpitch-Based Prosody Model for Text-to-Speech Synthesis

Jilei Tian (1), Jani Nurminen (2), Imre Kiss (1)

(1) Nokia Research Center, Finland
(2) Nokia Technology Platforms, Finland

Prosody is an inherent supra-segmental feature of speech that human speakers employ to express, for example, attitude, emotion, intent and attention. In text-to-speech (TTS) systems, high naturalness can only be achieved if the prosody of the output is appropriate. Prosody is even more crucial for tonal languages, such as Mandarin Chinese, in which the tone of each syllable is described by its pitch contour. In this paper, we propose a novel prosody modeling approach that uses the concept of syllable-based eigenpitch. The approach has been implemented in our Mandarin TTS system, resulting in an error variance of less than 0.1%. The results obtained in practical experiments have confirmed the good performance of the proposed technique.

Bibliographic reference. Tian, Jilei / Nurminen, Jani / Kiss, Imre (2007): "Novel eigenpitch-based prosody model for text-to-speech synthesis", in INTERSPEECH-2007, 1278-1281.