ISCA Archive SSW 1998

Which is more important in a concatenative text to speech system - pitch, duration, or spectral discontinuity?

M. Plumpe, S. Meredith

This paper focuses on experimental evaluations designed to determine the relative quality of the components of the Whistler TTS engine. Eight different systems were compared pairwise to determine both a rank ordering and a measure of the quality difference between the systems. The most interesting result is that the simple unit duration scheme used in Whistler was found to be very good, both when combined with natural acoustics and pitch and when combined with synthetic pitch. Synthetic pitch was found to be the aspect of the system that causes the greatest quality degradation.
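The abstract does not say how the pairwise judgments were turned into a rank ordering and a quality-difference measure. The sketch below shows one standard option, Thurstone Case V scaling, under stated assumptions: the helper names `thurstone_case_v` and `rank_systems` are hypothetical, the win counts are toy numbers rather than the paper's data, and the paper may well have used a different procedure.

```python
from statistics import NormalDist


def thurstone_case_v(wins, clip=0.01):
    """Return one interval-scale value per system (Thurstone Case V sketch).

    wins[i][j] = number of listener judgments preferring system i over j.
    Hypothetical helper; not necessarily the method used in the paper.
    """
    n = len(wins)
    norm = NormalDist()  # standard normal, mean 0, sigma 1
    scores = []
    for i in range(n):
        zs = []
        for j in range(n):
            if i == j:
                continue
            total = wins[i][j] + wins[j][i]
            if total == 0:
                continue  # pair never compared; leave it out of the average
            p = wins[i][j] / total
            p = min(max(p, clip), 1 - clip)  # avoid infinite z at p = 0 or 1
            zs.append(norm.inv_cdf(p))  # preference proportion -> z-score
        scores.append(sum(zs) / len(zs) if zs else 0.0)
    return scores


def rank_systems(names, wins):
    """Pair each system name with its scale value, best first."""
    scores = thurstone_case_v(wins)
    return sorted(zip(names, scores), key=lambda t: t[1], reverse=True)


# Toy data for three hypothetical systems (NOT results from the paper).
names = ["A", "B", "C"]
wins = [
    [0, 15, 18],
    [5, 0, 12],
    [2, 8, 0],
]

for name, score in rank_systems(names, wins):
    print(f"{name}: {score:+.3f}")
```

The order of the scale values gives the rank ordering, and the gap between two values serves as the quality-difference measure in standard-deviation units. Clipping the proportions is a common practical guard when one system wins every comparison against another.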


Cite as: Plumpe, M., Meredith, S. (1998) Which is more important in a concatenative text to speech system - pitch, duration, or spectral discontinuity? Proc. 3rd ESCA/COCOSDA Workshop on Speech Synthesis (SSW 3), 231-236

@inproceedings{plumpe98_ssw,
  author={M. Plumpe and S. Meredith},
  title={{Which is more important in a concatenative text to speech system - pitch, duration, or spectral discontinuity?}},
  year=1998,
  booktitle={Proc. 3rd ESCA/COCOSDA Workshop on Speech Synthesis (SSW 3)},
  pages={231--236}
}