INTERSPEECH 2004 - ICSLP
8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Perception-Guided and Phonetic Clustering Weight Tuning Based on Diphone Pairs for Unit Selection TTS

Francesc Alias (1), Xavier Llora (2), Ignasi Iriondo (1), Joan Claudi Socoro (1), Xavier Sevillano (1), Lluis Formiga (1)

(1) Ramon Llull University, Spain
(2) University of Illinois at Urbana-Champaign, USA

The quality of corpus-based text-to-speech systems depends on the accuracy of the unit selection process, which in turn relies on the weights of the cost function. This paper defines a new framework for tuning these weights. We propose a technique that takes the subjective perception of speech into account in the selection process by means of Interactive Genetic Algorithms. Moreover, we introduce a CART-based method for unit clustering. Both techniques are applied to weight tuning based on diphone pairs. The experiments analyze the feasibility of each proposal separately.
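To illustrate why the weights matter, the following minimal sketch assumes the common formulation in which a candidate unit's cost is a weighted sum of sub-costs. The abstract does not specify the cost function used in the paper, so the function names, the three sub-costs (read here as pitch, energy and duration mismatch) and all numeric values are hypothetical.

```python
from typing import Sequence


def weighted_cost(subcosts: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of sub-costs for one candidate unit (assumed formulation)."""
    if len(subcosts) != len(weights):
        raise ValueError("subcosts and weights must have the same length")
    return sum(w * c for w, c in zip(weights, subcosts))


def select_unit(candidates: Sequence[Sequence[float]], weights: Sequence[float]) -> int:
    """Return the index of the candidate with the lowest weighted cost."""
    costs = [weighted_cost(c, weights) for c in candidates]
    return min(range(len(costs)), key=costs.__getitem__)


# Hypothetical sub-cost vectors (pitch, energy, duration) for two candidates.
candidates = [(0.2, 0.9, 0.1), (0.6, 0.1, 0.3)]

# Two hypothetical weight settings: one emphasizes pitch, the other energy.
pitch_heavy = (1.0, 0.1, 0.1)
energy_heavy = (0.1, 1.0, 0.1)

print(select_unit(candidates, pitch_heavy))
print(select_unit(candidates, energy_heavy))
```

With these made-up numbers the two weight settings pick different candidates, which is the sense in which the selected unit depends on the weight values. A tuning procedure such as the one the abstract describes would search over the weight vector rather than fix it by hand.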

Bibliographic reference. Alias, Francesc / Llora, Xavier / Iriondo, Ignasi / Socoro, Joan Claudi / Sevillano, Xavier / Formiga, Lluis (2004): "Perception-guided and phonetic clustering weight tuning based on diphone pairs for unit selection TTS", in INTERSPEECH-2004, 1221-1224.