ISCA Archive Interspeech 2007

Novel eigenpitch-based prosody model for text-to-speech synthesis

Jilei Tian, Jani Nurminen, Imre Kiss

Prosody is an inherent supra-segmental feature of speech that human speakers employ to express, for example, attitude, emotion, intent and attention. In text-to-speech (TTS) systems, high naturalness can only be achieved if the prosody of the output is appropriate. Prosody is even more crucial for tonal languages, such as Mandarin Chinese, in which the tone of each syllable is described by its pitch contour. In this paper, we propose a novel prosody modeling approach that uses the concept of syllable-based eigenpitch. The approach has been implemented in our Mandarin TTS system, resulting in less than 0.1% error variance. The results obtained in practical experiments have confirmed the good performance of the proposed technique.

doi: 10.21437/Interspeech.2007-229

Cite as: Tian, J., Nurminen, J., Kiss, I. (2007) Novel eigenpitch-based prosody model for text-to-speech synthesis. Proc. Interspeech 2007, 1278-1281, doi: 10.21437/Interspeech.2007-229

  author={Jilei Tian and Jani Nurminen and Imre Kiss},
  title={{Novel eigenpitch-based prosody model for text-to-speech synthesis}},
  booktitle={Proc. Interspeech 2007},