Speech Prosody 2008
This paper focuses on the training process of intonation models for text-to-speech synthesis. In previous papers we concentrated on two key points of intonation modelling: interpolation of the fundamental frequency contour in unvoiced segments and sentence-by-sentence parameter extraction. We proposed an alternative approach to model training named JEMA (Joint Extraction and Modeling Approach) using CART. Here we propose a new alternative for obtaining the mapping function that relates the linguistic features available in TTS to the fundamental frequency contour space. A clustering algorithm using a distance measure over a feature vector space of variable dimension is used to partition the space of fundamental frequency contours in the training data. In this way we seek important groups of features with specific values that explain the shape of fundamental frequency contours. The proposed technique shows improvements over CART in the experimental results.
Bibliographic reference. Agüero, Pablo Daniel / Tulli, Juan Carlos / Bonafonte, Antonio (2008): "A new clustering approach for JEMA", In SP-2008, 83-86.