INTERSPEECH 2004 - ICSLP
The quality of corpus-based text-to-speech systems depends on the accuracy of the unit selection process, which in turn relies on the weights of the cost function. This paper defines a new framework for tuning these weights. We propose a technique that incorporates the subjective perception of speech into the selection process by means of Interactive Genetic Algorithms. In addition, we introduce a CART-based method for unit clustering. Both techniques are applied to weight tuning based on diphone pairs. The experiments analyze the feasibility of each proposal separately.
Bibliographic reference. Alias, Francesc / Llora, Xavier / Iriondo, Ignasi / Socoro, Joan Claudi / Sevillano, Xavier / Formiga, Lluis (2004): "Perception-guided and phonetic clustering weight tuning based on diphone pairs for unit selection TTS", in INTERSPEECH-2004, 1221-1224.