ISCA Archive Interspeech 2005

Symbolic prosody driven unit selection for highly natural synthetic speech

Daniel Tihelka

In the effort to obtain synthetic speech of near-natural quality and, potentially, to enable expressive synthesis, the unit selection approach has become very important. To increase the naturalness of our native TTS system ARTIC, we employed a specific version of this approach. It is driven by a high-level symbolic prosody description, defined according to the phenomena of prosodic synonymy and homonymy. The concrete prosody of a synthesized phrase is not set explicitly, but emerges from the target and concatenation costs. Although this is our first treatment, which required some simplification, and only the basics of the synonymy/homonymy phenomena are defined, the first results already show a significant shift towards high quality. Listening tests comparing speech from the single-instance version of ARTIC with speech from the selection-based version showed a clear preference for the selection-based version. In addition, the level of naturalness was, on average, assessed as "close to natural".
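The idea that prosody "emerges" from target and concatenation costs can be illustrated with a generic unit selection search. The following is a minimal sketch under stated assumptions, not the ARTIC implementation: the `Unit` fields, the symbolic tags ("rise", "fall"), the 0/1 target cost, and the F0-based concatenation cost are all hypothetical choices made for illustration. The abstract does not specify how the costs are defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str        # unit identity, e.g. a diphone label (hypothetical)
    prosody: str     # symbolic prosody tag of the recorded unit (hypothetical)
    f0_start: float  # F0 at the left edge of the unit, in Hz
    f0_end: float    # F0 at the right edge of the unit, in Hz


def target_cost(wanted_tag, unit):
    """Assumed cost: 0 if the candidate carries the requested tag, else 1."""
    return 0.0 if unit.prosody == wanted_tag else 1.0


def concat_cost(left, right):
    """Assumed cost: F0 jump at the join, scaled so 100 Hz costs 1.0."""
    return abs(left.f0_end - right.f0_start) / 100.0


def select_units(targets, inventory):
    """Viterbi search for the cheapest candidate sequence.

    targets:   list of (unit_name, symbolic_tag) pairs
    inventory: dict mapping unit_name -> list of Unit candidates
    Returns (selected units, total cost).
    """
    if not targets:
        return [], 0.0

    # best[i][j] = (cumulative cost, backpointer) for candidate j at position i
    best = []
    for i, (name, tag) in enumerate(targets):
        column = []
        for cand in inventory[name]:
            tc = target_cost(tag, cand)
            if i == 0:
                column.append((tc, None))
            else:
                prev_cands = inventory[targets[i - 1][0]]
                cost, back = min(
                    (best[i - 1][k][0] + concat_cost(prev, cand), k)
                    for k, prev in enumerate(prev_cands)
                )
                column.append((cost + tc, back))
        best.append(column)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(inventory[targets[i][0]][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

In this sketch no F0 contour is ever prescribed. The search only asks for symbolic tags, and the resulting pitch pattern is whatever the selected recorded units happen to carry, which is the sense in which the concrete prosody emerges from the two costs.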


doi: 10.21437/Interspeech.2005-786

Cite as: Tihelka, D. (2005) Symbolic prosody driven unit selection for highly natural synthetic speech. Proc. Interspeech 2005, 2525-2528, doi: 10.21437/Interspeech.2005-786

@inproceedings{tihelka05_interspeech,
  author={Daniel Tihelka},
  title={{Symbolic prosody driven unit selection for highly natural synthetic speech}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2525--2528},
  doi={10.21437/Interspeech.2005-786}
}