ISCA Archive ICSLP 1998

Common patterns in word level prosody

Frode Holm, Kazue Hata

Generating natural, human-sounding prosody for text-to-speech (TTS) has historically been one of the most challenging problems facing researchers and developers, and TTS systems have become notorious for their "robotic" intonation. This paper describes an approach that aims to capture as much detail as possible from speech data while avoiding the "black boxes" typical of neural networks and some vector clustering algorithms. Unlike those methods, our approach may reveal exactly which parameters determine the successful choice of pattern. Focusing on the notion of prosody templates, we confirmed that a representative F0 and duration pattern can be extracted, based on its stress pattern, for a target proper noun occurring in sentence-initial position.
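The abstract does not say how the templates are computed. As a rough illustration of a prosody template keyed on stress pattern, the sketch below groups recorded instances of a word by stress pattern and takes the per-syllable mean of F0 and duration. The `WordToken` record, the "1-0-0" stress notation (one digit per syllable, 1 = stressed), the per-syllable F0 summary, and plain averaging are all assumptions made for this example. They are not the authors' procedure.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class WordToken:
    # Hypothetical record for one spoken instance of a target word.
    stress: str          # assumed notation, e.g. "1-0-0": one digit per syllable
    f0_hz: list[float]   # one F0 value per syllable (e.g. nucleus mean), in Hz
    dur_ms: list[float]  # one duration per syllable, in milliseconds


def build_templates(tokens: list[WordToken]) -> dict[str, tuple[list[float], list[float]]]:
    """Group tokens by stress pattern and average F0 and duration per syllable.

    This is an illustrative stand-in for "extracting a representative pattern";
    the paper may use a different summary than the arithmetic mean.
    """
    groups: dict[str, list[WordToken]] = defaultdict(list)
    for tok in tokens:
        n_syll = len(tok.stress.split("-"))
        # Every per-syllable list must match the syllable count of the stress key.
        if len(tok.f0_hz) != n_syll or len(tok.dur_ms) != n_syll:
            raise ValueError(f"syllable count mismatch for stress {tok.stress!r}")
        groups[tok.stress].append(tok)

    templates: dict[str, tuple[list[float], list[float]]] = {}
    for stress, toks in groups.items():
        n_syll = len(toks[0].f0_hz)
        # Column-wise mean over all tokens sharing this stress pattern.
        f0 = [mean(t.f0_hz[i] for t in toks) for i in range(n_syll)]
        dur = [mean(t.dur_ms[i] for t in toks) for i in range(n_syll)]
        templates[stress] = (f0, dur)
    return templates
```

For example, two tokens with stress "1-0-0" whose first-syllable F0 values are 220 Hz and 200 Hz yield a template whose first-syllable F0 is 210 Hz. At synthesis time, a system could look up the template for the target word's stress pattern instead of predicting each syllable independently.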

doi: 10.21437/ICSLP.1998-113

Cite as: Holm, F., Hata, K. (1998) Common patterns in word level prosody. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 1038, doi: 10.21437/ICSLP.1998-113

  author={Frode Holm and Kazue Hata},
  title={{Common patterns in word level prosody}},
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 1038},