12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Adaptation of Prosody in Speech Synthesis by Changing Command Values of the Generation Process Model of Fundamental Frequency

Keikichi Hirose, Keiko Ochi, Ryusuke Mihara, Hiroya Hashimoto, Daisuke Saito, Nobuaki Minematsu

University of Tokyo, Japan

A method was developed to adapt prosody to a new speaker or style in speech synthesis. It is based on predicting the differences between the target and original speakers/styles and applying them to the original. Differences in fundamental frequency (F0) contours are represented in the framework of the generation process model, as differences in command magnitudes/amplitudes. While modeling the original speaker/style requires a certain amount of training corpus, the corpus for training the command differences can be small. Furthermore, in the case of style adaptation, the adaptation corpus need not be uttered by the same speaker as the original style. Speech was synthesized using an HMM-based speech synthesis system, with prosody controlled by the method. Listening experiments on synthetic speech for both style adaptation and voice conversion showed the validity of the method.
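As an illustration of the idea of changing command values, the following is a minimal sketch rather than the authors' implementation. It uses the standard formulation of the generation process model (often called the Fujisaki model), in which log F0 is the sum of a baseline, phrase components, and accent components. The constants ALPHA, BETA, and GAMMA are commonly used default values, and the function names, the additive form of the differences, and the assumption that command timing is left unchanged are all hypothetical choices made for this example. How the differences are predicted is not covered here; they are simply passed in as numbers.

```python
import math

# Commonly used default constants (assumed here, not taken from the paper).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling level of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, base_hz, phrases, accents):
    """Log F0 at time t (seconds).

    phrases: list of (onset, magnitude)
    accents: list of (onset, offset, amplitude)
    """
    value = math.log(base_hz)
    for onset, magnitude in phrases:
        value += magnitude * phrase_response(t - onset)
    for onset, offset, amplitude in accents:
        value += amplitude * (accent_response(t - onset)
                              - accent_response(t - offset))
    return value


def adapt_commands(phrases, accents, phrase_diffs, accent_diffs):
    """Add per-command differences to the original command values.

    Assumption for this sketch: differences are additive and command
    timing is kept as in the original.
    """
    new_phrases = [(onset, magnitude + d)
                   for (onset, magnitude), d in zip(phrases, phrase_diffs)]
    new_accents = [(onset, offset, amplitude + d)
                   for (onset, offset, amplitude), d in zip(accents, accent_diffs)]
    return new_phrases, new_accents


# Hypothetical original commands and hypothetical predicted differences.
original_phrases = [(0.0, 0.5)]
original_accents = [(0.3, 0.6, 0.4)]
target_phrases, target_accents = adapt_commands(
    original_phrases, original_accents, [0.1], [-0.1])

# Adapted F0 contour in Hz, sampled every 10 ms over one second.
contour_hz = [math.exp(log_f0(i * 0.01, 120.0, target_phrases, target_accents))
              for i in range(100)]
```

In this sketch, only the command magnitudes and amplitudes change between the original and the adapted contour, which mirrors the abstract's point that the adaptation data only needs to support estimating the differences.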


Bibliographic reference. Hirose, Keikichi / Ochi, Keiko / Mihara, Ryusuke / Hashimoto, Hiroya / Saito, Daisuke / Minematsu, Nobuaki (2011): "Adaptation of prosody in speech synthesis by changing command values of the generation process model of fundamental frequency", In INTERSPEECH-2011, 2793-2796.