Interspeech'2005 - Eurospeech
In the effort to obtain synthetic speech of near-natural quality and, potentially, to build expressive synthesis, the unit selection approach has become very important. To increase the naturalness of our in-house TTS system ARTIC, we employed a specific version of this approach. It is driven by a high-level symbolic prosody description, defined according to the phenomena of prosodic synonymy and homonymy. The concrete prosody of a synthesized phrase is not set explicitly; instead, it emerges from the target and concatenation costs. Although this first treatment required some simplifications, and only the basics of the synonymy/homonymy phenomena are defined, the first results already show a significant shift towards high quality. Listening tests comparing speech from the single-instance and selection-based versions of ARTIC showed a clear preference for the selection-based version. In addition, the level of naturalness was, on average, assessed as "close to natural".
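The abstract does not describe ARTIC's actual cost functions or search procedure, so the following is only a generic sketch of how unit selection commonly turns target and concatenation costs into a unit sequence: a Viterbi-style dynamic-programming search that picks, for each target position, the candidate unit minimizing the total of target costs (how well a unit matches its target, here assumed to be a symbolic prosody label) and concatenation costs (how smoothly adjacent units join). The name `select_units` and the toy cost functions in the usage note are hypothetical stand-ins, not the paper's method. The point it illustrates is that no prosody contour is set explicitly; the prosody is a by-product of which units win the search.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Unit = TypeVar("Unit")      # a recorded speech unit (hypothetical representation)
Target = TypeVar("Target")  # a target specification, e.g. a symbolic prosody label


def select_units(
    targets: Sequence[Target],
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[Target, Unit], float],
    concat_cost: Callable[[Unit, Unit], float],
) -> Tuple[List[Unit], float]:
    """Return the unit sequence with the lowest summed target and
    concatenation cost, found by dynamic programming (Viterbi search).

    Generic sketch only; the cost functions are supplied by the caller.
    """
    if not targets or len(targets) != len(candidates):
        raise ValueError("need exactly one candidate list per target")
    if any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[i][j]: lowest cost of any path ending in candidate j at position i
    best: List[List[float]] = [[target_cost(targets[0], u) for u in candidates[0]]]
    # back[i][j]: index of the predecessor candidate on that cheapest path
    back: List[List[int]] = [[-1] * len(candidates[0])]

    for i in range(1, len(targets)):
        prev_units = candidates[i - 1]
        row: List[float] = []
        ptr: List[int] = []
        for u in candidates[i]:
            # cheapest predecessor, counting the cost of joining it to u
            k = min(
                range(len(prev_units)),
                key=lambda k: best[i - 1][k] + concat_cost(prev_units[k], u),
            )
            row.append(
                best[i - 1][k]
                + concat_cost(prev_units[k], u)
                + target_cost(targets[i], u)
            )
            ptr.append(k)
        best.append(row)
        back.append(ptr)

    # pick the cheapest final candidate and trace the path backwards
    j = min(range(len(best[-1])), key=lambda j: best[-1][j])
    total = best[-1][j]
    path: List[Unit] = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total
```

As a usage example, with units modelled as hypothetical `(label, pitch_hz)` tuples, a target cost of 0 for a matching label and 1 otherwise, and a concatenation cost proportional to the pitch jump, `select_units(["A", "B"], [[("A", 100), ("X", 120)], [("B", 150), ("B", 105)]], ...)` prefers `("B", 105)` over `("B", 150)` because it joins more smoothly to `("A", 100)`, even though both match their target label equally well.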
Bibliographic reference. Tihelka, Daniel (2005): "Symbolic prosody driven unit selection for highly natural synthetic speech", In INTERSPEECH-2005, 2525-2528.