INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Feedback Loop for Prosody Prediction in Concatenative Speech Synthesis

Javier Latorre (1), Sergio Gracia (2), Masami Akamine (1)

(1) Toshiba Corporate R&D Center, Japan
(2) Universitat Politècnica de Catalunya, Spain

We propose a method for concatenative speech synthesis that improves the match between the logF0 and duration predicted by the prosody module and the waveform generation back-end. The method builds on our previous multilevel parametric F0 model and Toshiba’s plural unit selection and fusion synthesizer. It adds a feedback loop from the back-end into the prosody module so that the prosodic information of the selected units is used to re-estimate the prosody values. The feedback loop defines a frame-level prosody model consisting of the mean and variance of the duration and logF0 of the selected units. The log-likelihood defined by this model is added to the log-likelihood of the prosody model. Maximizing this total log-likelihood yields the prosody values that give the optimum compromise between the distortion introduced by F0 discontinuities and the distortion created by the prosody-adjusting signal processing.
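The idea of adding the two log-likelihoods can be illustrated with a deliberately simplified sketch. It assumes that both the prosody model and the feedback model reduce to independent per-frame Gaussians over logF0, which is not the multilevel parametric model used in the paper. Under that assumption, the value maximizing the summed log-likelihood is the precision-weighted mean of the two predictions. The function name and all argument names are hypothetical.

```python
def combine_frame_predictions(mu_model, var_model, mu_units, var_units):
    """Per-frame maximizer of the sum of two Gaussian log-likelihoods.

    Assumption (not from the paper): each frame is treated independently
    and both models are univariate Gaussians over logF0.
    """
    combined = []
    for m_mod, v_mod, m_uni, v_uni in zip(mu_model, var_model, mu_units, var_units):
        # Precision (inverse variance) acts as the weight of each model.
        w_mod = 1.0 / v_mod
        w_uni = 1.0 / v_uni
        # Setting the derivative of the summed log-likelihood to zero
        # gives the precision-weighted mean.
        combined.append((w_mod * m_mod + w_uni * m_uni) / (w_mod + w_uni))
    return combined


# Hypothetical logF0 values for two frames.
predicted = combine_frame_predictions(
    mu_model=[5.0, 5.0], var_model=[1.0, 1.0],
    mu_units=[5.2, 5.4], var_units=[1.0, 0.25],
)
```

In this sketch, a frame whose selected units agree closely (small variance) pulls the result toward the units' mean, while a frame with widely scattered units leaves the original prediction almost unchanged.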

Bibliographic reference.  Latorre, Javier / Gracia, Sergio / Akamine, Masami (2009): "Feedback loop for prosody prediction in concatenative speech synthesis", In INTERSPEECH-2009, 2067-2070.