ISCA Archive Interspeech 2007

Increasing prosodic variability of text-to-speech synthesizers

Géza Németh, Márk Fék, Tamás Gábor Csapó

The lack of prosody variation in text-to-speech systems contributes to their perceived unnaturalness when synthesizing extended passages. In this paper, we present a method to improve prosody generation in this direction. A database of natural sample sentences is searched for sentences having similar word and syllable structure to the input. One sentence is selected randomly from the similar sentences found. The prosody of the randomly selected natural sentence is used as a target to generate the prosody of the synthetic one. An experiment was conducted to determine the potential of the proposed method. The rule-based pitch contour generation of a Hungarian concatenative synthesizer was replaced by a semi-automatic implementation of the proposed method. A listening test showed that subjects preferred sentences synthesized by the proposed method over a rule-based solution.
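
The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define how similarity of "word and syllable structure" is measured, so the syllable-count representation, the `structure_distance` measure, the `max_distance` threshold, and all names below are assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class NaturalSample:
    """One natural sentence from the sample database (hypothetical layout)."""
    text: str
    syllables_per_word: Tuple[int, ...]  # e.g. (2, 1, 3) for a three-word sentence
    pitch_targets_hz: List[float]        # assumed: one F0 target per syllable


def structure_distance(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Assumed similarity measure: sentences with different word counts are
    not comparable (None); otherwise sum the per-word syllable differences."""
    if len(a) != len(b):
        return None
    return sum(abs(x - y) for x, y in zip(a, b))


def select_prosody_target(input_structure: Sequence[int],
                          database: Sequence[NaturalSample],
                          max_distance: int = 0,
                          rng=random) -> Optional[NaturalSample]:
    """Collect all sufficiently similar natural sentences, then pick one at
    random. Returns None when nothing in the database is similar enough."""
    candidates = []
    for sample in database:
        d = structure_distance(input_structure, sample.syllables_per_word)
        if d is not None and d <= max_distance:
            candidates.append(sample)
    return rng.choice(candidates) if candidates else None
```

Because the choice among candidates is random, synthesizing the same sentence structure repeatedly can yield different prosody targets, which is the source of variability the abstract describes. When the function returns `None`, a caller could fall back to the rule-based contour; that fallback is likewise an assumption, not something stated in the abstract.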

doi: 10.21437/Interspeech.2007-222

Cite as: Németh, G., Fék, M., Csapó, T.G. (2007) Increasing prosodic variability of text-to-speech synthesizers. Proc. Interspeech 2007, 474-477, doi: 10.21437/Interspeech.2007-222

  author={Géza Németh and Márk Fék and Tamás Gábor Csapó},
  title={{Increasing prosodic variability of text-to-speech synthesizers}},
  booktitle={Proc. Interspeech 2007},