8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Optimizing Integrated Cost Function for Segment Selection in Concatenative Speech Synthesis Based on Perceptual Evaluations

Tomoki Toda, Hisashi Kawai, Minoru Tsuzaki

ATR-SLT, Japan

This paper describes the optimization of a cost function for segment selection in concatenative text-to-speech synthesis based on perceptual characteristics. As the integrated cost function for a segment sequence, we use a norm of the local costs of the individual segments, which accounts for both the degradation of naturalness over the entire synthetic utterance and local degradation. The cost function is optimized by adjusting both the power coefficient of the norm and the weights for the sub-costs, so that the integrated cost corresponds better to perceptual scores obtained in perceptual experiments. The results show that optimizing the weights and the power coefficient jointly improves this correspondence more than optimizing either one alone. However, the correspondence remains insufficient even after the integrated cost function is optimized.
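The idea of a norm-based integrated cost can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the forms below are assumptions: the local cost is taken to be a weighted sum of sub-costs, and the integrated cost a power mean of the local costs normalized by the number of segments. The names `local_cost`, `integrated_cost`, `weights`, and `p` are hypothetical and chosen only for illustration.

```python
def local_cost(sub_costs, weights):
    """Assumed form: weighted sum of the sub-costs of one segment."""
    if len(sub_costs) != len(weights):
        raise ValueError("sub_costs and weights must have the same length")
    return sum(w * c for w, c in zip(weights, sub_costs))


def integrated_cost(local_costs, p):
    """Assumed form: power mean (normalized p-norm) of the local costs.

    p = 1 gives the plain average over the sequence; larger p lets the
    worst segments dominate, approaching the maximum local cost.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    n = len(local_costs)
    return (sum(c ** p for c in local_costs) / n) ** (1.0 / p)


# Illustrative values only, not data from the paper.
weights = [0.5, 0.5]
segments = [[0.2, 0.2], [0.2, 0.2], [0.6, 1.0]]
costs = [local_cost(s, weights) for s in segments]

average_cost = integrated_cost(costs, p=1)   # overall degradation
peaky_cost = integrated_cost(costs, p=10)    # emphasizes the worst segment
```

Under these assumptions, a small power coefficient reflects the average degradation over the whole utterance, while a large one emphasizes the single worst segment, which is the trade-off the optimization against perceptual scores is meant to settle.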

Bibliographic reference. Toda, Tomoki / Kawai, Hisashi / Tsuzaki, Minoru (2003): "Optimizing integrated cost function for segment selection in concatenative speech synthesis based on perceptual evaluations", in EUROSPEECH-2003, 297-300.