ISCA Archive ICSLP 1998

Natural-sounding speech synthesis using variable-length units

Jon R. W. Yi, James R. Glass

The goal of this work was to develop a speech synthesis system that concatenates variable-length units to create natural-sounding speech. Our initial work showed that, when system responses were carefully designed to ensure consistent intonation contours, natural-sounding speech synthesis was achievable with word- and phrase-level concatenation. To extend the flexibility of this framework, we focused on generating novel words from a corpus of sub-word units. The design of the corpus was motivated by perceptual experiments that investigated where speech could be spliced with minimal audible distortion and which contextual constraints had to be maintained to produce natural-sounding speech. From this sub-word corpus, a Viterbi search selects a sequence of units based on how well they match the input specification and satisfy the concatenation constraints. This concatenative speech synthesis system, ENVOICE, has been used in a conversational system in two application domains to convert meaning representations into speech waveforms.
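The unit-selection step described in the abstract can be illustrated with a generic Viterbi search over candidate units. The following is a minimal sketch and not the ENVOICE implementation: the function name `viterbi_select`, the unit representation (label, source utterance, position), and both cost functions are assumptions made for illustration only. The abstract does not specify the actual costs or constraints used.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def viterbi_select(
    targets: Sequence[Hashable],
    candidates: Sequence[Sequence[Hashable]],
    unit_cost: Callable[[Hashable, Hashable], float],
    join_cost: Callable[[Hashable, Hashable], float],
) -> Tuple[List[Hashable], float]:
    """Pick one candidate per target so that the summed unit and join costs are minimal.

    Hypothetical interface: unit_cost(target, unit) scores the match to the
    input specification; join_cost(left, right) scores the concatenation.
    """
    if not targets or len(targets) != len(candidates):
        raise ValueError("need one non-empty candidate list per target")
    if any(len(c) == 0 for c in candidates):
        raise ValueError("every target needs at least one candidate unit")

    # best[i][j]: lowest total cost of any path ending in candidates[i][j]
    best = [[unit_cost(targets[0], u) for u in candidates[0]]]
    # back[i][j]: index of the predecessor in candidates[i - 1] on that path
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(targets)):
        row, ptr = [], []
        for u in candidates[i]:
            k_best, c_best = min(
                ((k, best[i - 1][k] + join_cost(p, u))
                 for k, p in enumerate(candidates[i - 1])),
                key=lambda kc: kc[1],
            )
            row.append(c_best + unit_cost(targets[i], u))
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest final state.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total


# Hypothetical costs. A unit is (label, source utterance id, position).
def demo_unit_cost(target, unit):
    # Zero cost when the unit's label matches the requested target.
    return 0.0 if unit[0] == target else 1.0


def demo_join_cost(left, right):
    # Zero cost when the two units were adjacent in the same recording.
    contiguous = left[1] == right[1] and left[2] + 1 == right[2]
    return 0.0 if contiguous else 1.0
```

With costs like these, units that were contiguous in the recorded corpus join for free, so the search naturally favors longer stretches of original speech. This is one simple way to obtain variable-length units from a sub-word inventory.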

doi: 10.21437/ICSLP.1998-575

Cite as: Yi, J.R.W., Glass, J.R. (1998) Natural-sounding speech synthesis using variable-length units. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 1151, doi: 10.21437/ICSLP.1998-575

  author={Jon R. W. Yi and James R. Glass},
  title={{Natural-sounding speech synthesis using variable-length units}},
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 1151},