Speech Prosody 2008

Campinas, Brazil
May 6-9, 2008

A New Clustering Approach for JEMA

Pablo Daniel Agüero (1), Juan Carlos Tulli (1), Antonio Bonafonte (2)

(1) Communications Lab, University of Mar del Plata, Argentina
(2) TALP Research Center, Universitat Politècnica de Catalunya, Spain

This paper focuses on the training process of intonation models for text-to-speech (TTS) synthesis. In previous papers we concentrated on two key points of intonation modelling: interpolation of the fundamental frequency contour in unvoiced segments and sentence-by-sentence parameter extraction. We proposed an alternative approach for model training, named JEMA (Joint Extraction and Modeling Approach), using CART. Here we propose a new way to obtain the mapping function that relates the linguistic features available in TTS to the space of fundamental frequency contours. A clustering algorithm that uses a distance measure over a feature vector space of variable dimension partitions the space of fundamental frequency contours in the training data. In this way we seek important groups of features whose specific values explain the shape of the fundamental frequency contours. In the experimental results, the proposed technique shows improvements over CART.

Bibliographic reference. Agüero, Pablo Daniel / Tulli, Juan Carlos / Bonafonte, Antonio (2008): "A new clustering approach for JEMA", In SP-2008, 83-86.