ISCA Archive Eurospeech 1999
Toshiba English text-to-speech synthesizer (TESS)

Chang K. Suh, Takehiko Kagoshima, Masahiro Morita, Shigenobu Seto, Masami Akamine

The Toshiba English Text-to-Speech Synthesizer (TESS) uses several new techniques to produce synthesized speech that is more natural-sounding and intelligible than that of conventional synthesizers. The closed-loop training method analytically solves an equation that minimizes the distortion between target units and training data, which yields synthesis units that most closely resemble the training data and are the least susceptible to prosodic distortion noise. The pitch contour model creates a codebook of representative word-based F0 contours by first clustering the training data according to word stress and number of syllables. Within each cluster, the training data is divided into groups using lexical and phonological attributes of each word. In each group, a representative contour is created using approximate error estimation. The resulting approximate errors are then used to predict the offset level for each contour. These techniques have significantly improved the prosodic quality, naturalness and intelligibility of the resulting synthesized speech.
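The first stage of the pitch contour model can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_codebook`, the input layout, and the use of a pointwise mean as the representative contour are all assumptions made for illustration. The abstract's further split by lexical and phonological attributes, its approximate error estimation, and the offset level prediction are not reproduced here. Contours are assumed to be already resampled to a common length within each cluster.

```python
from collections import defaultdict
from statistics import fmean


def build_codebook(words):
    """Build a toy codebook of representative F0 contours.

    `words` is an iterable of (stress_index, n_syllables, contour) tuples,
    where `contour` is a list of F0 values in Hz. This layout is hypothetical;
    all contours that share a key are assumed to have the same length.
    """
    # Step 1: cluster the training words by stress position and syllable count.
    clusters = defaultdict(list)
    for stress_index, n_syllables, contour in words:
        clusters[(stress_index, n_syllables)].append(contour)

    # Step 2: one representative contour per cluster. The pointwise mean is
    # used here because it minimizes the summed squared error to the cluster
    # members; the paper's own estimation procedure is not specified in the
    # abstract and may differ.
    codebook = {}
    for key, contours in clusters.items():
        codebook[key] = [fmean(points) for points in zip(*contours)]
    return codebook


def squared_error(contour_a, contour_b):
    """Summed squared difference between two equal-length contours."""
    return sum((a - b) ** 2 for a, b in zip(contour_a, contour_b))
```

A lookup at synthesis time would then select `codebook[(stress_index, n_syllables)]` for each word, with `squared_error` serving as one possible way to measure how well a representative contour fits a training contour.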

doi: 10.21437/Eurospeech.1999-469

Cite as: Suh, C.K., Kagoshima, T., Morita, M., Seto, S., Akamine, M. (1999) Toshiba English text-to-speech synthesizer (TESS). Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2111-2114, doi: 10.21437/Eurospeech.1999-469

  author={Chang K. Suh and Takehiko Kagoshima and Masahiro Morita and Shigenobu Seto and Masami Akamine},
  title={{Toshiba English text-to-speech synthesizer (TESS)}},
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},