EUROSPEECH 2003 - INTERSPEECH 2003
The quality of the synthetic speech produced by concatenative speech systems depends heavily on how accurately the different characteristics of the speech segments are modeled. Moreover, the relative weight given to each feature in the unit selection process is key to how well the synthetic speech matches human perception. In this paper we propose a new method for optimizing these weights, in which the parts of the cost function are trained separately according to their nature: the features that describe the phonetic context of the units, and the features related to their prosodic characteristics. This work focuses mainly on the target cost function.
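As an illustration only, a target cost split into the two feature groups named above could be sketched as follows. The abstract does not give the form of the cost function or the training procedure, so the weighted-sum form, the feature names (left_context, f0, and so on), the numeric values, and the helper names are all assumptions made for this example rather than the authors' method.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TargetCostWeights:
    """Two weight sets kept apart so each group can be tuned on its own.

    The grouping follows the abstract; the feature names are hypothetical.
    """
    phonetic: Dict[str, float]
    prosodic: Dict[str, float]


def weighted_sum(weights: Dict[str, float], subcosts: Dict[str, float]) -> float:
    """Sum of weight * sub-cost over the features in one group."""
    return sum(w * subcosts[name] for name, w in weights.items())


def target_cost(weights: TargetCostWeights,
                phonetic_subcosts: Dict[str, float],
                prosodic_subcosts: Dict[str, float]) -> float:
    """Assumed form: phonetic-context term plus prosodic term."""
    return (weighted_sum(weights.phonetic, phonetic_subcosts)
            + weighted_sum(weights.prosodic, prosodic_subcosts))


# Hypothetical weights and sub-costs for one candidate unit.
weights = TargetCostWeights(
    phonetic={"left_context": 0.5, "right_context": 0.5},
    prosodic={"f0": 0.5, "duration": 0.25, "energy": 0.25},
)
phonetic_subcosts = {"left_context": 0.0, "right_context": 1.0}
prosodic_subcosts = {"f0": 0.2, "duration": 0.4, "energy": 0.0}

cost = target_cost(weights, phonetic_subcosts, prosodic_subcosts)
print(cost)
```

Keeping the two weight dictionaries separate means that changing the prosodic weights leaves the phonetic-context term untouched, which is the property a separate training of each group would rely on.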
Bibliographic reference. Diaz, Francisco Campillo / Banga, Eduardo R. (2003): "On the design of cost functions for unit-selection speech synthesis", in EUROSPEECH-2003, 289-292.