A method was developed to adapt prosody to a new speaker or style in speech synthesis. It predicts the differences between the target and original speakers/styles and applies them to the original one. Differences in fundamental frequency (F0) contours are represented in the framework of the generation process model, as differences in the command magnitudes/amplitudes. While the original speaker/style requires a certain amount of training corpus, the corpus for training command differences can be small. Furthermore, in the case of style adaptation, the corpus need not be uttered by the same speaker as the original style. Speech synthesis was conducted using an HMM-based speech synthesis system, where prosody was controlled by the method. Listening experiments on synthetic speech with style adaptation and voice conversion both showed the validity of the method.
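As an illustration of the idea, the following is a minimal sketch and not the authors' implementation. It assumes the generation process model in its commonly cited form (phrase and accent commands added to a baseline in the log-F0 domain), the frequently used constants alpha = 3.0, beta = 20.0 and gamma = 0.9, and that predicted differences are simply added to the original command values while command timing is kept. The function names and all numeric command values are hypothetical, and the predictor that would supply the differences is not shown.

```python
import math

# Commonly used constants of the generation process model (assumed here).
ALPHA, BETA, GAMMA = 3.0, 20.0, 0.9


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, fb, phrases, accents):
    """Log F0 at time t (seconds).

    phrases: list of (onset, magnitude)
    accents: list of (onset, offset, amplitude)
    """
    value = math.log(fb)
    for t0, ap in phrases:
        value += ap * phrase_response(t - t0)
    for t1, t2, aa in accents:
        value += aa * (accent_response(t - t1) - accent_response(t - t2))
    return value


def adapt_commands(phrases, accents, d_phrase, d_accent):
    """Add differences to the original command values (assumed additive).

    Command timing is left unchanged; only magnitudes/amplitudes move.
    """
    new_phrases = [(t0, ap + d) for (t0, ap), d in zip(phrases, d_phrase)]
    new_accents = [(t1, t2, aa + d)
                   for (t1, t2, aa), d in zip(accents, d_accent)]
    return new_phrases, new_accents


# Hypothetical original commands and hypothetical predicted differences.
orig_phrases = [(0.0, 0.5)]
orig_accents = [(0.3, 0.6, 0.4)]
tgt_phrases, tgt_accents = adapt_commands(
    orig_phrases, orig_accents, d_phrase=[0.1], d_accent=[0.2]
)

# Compare original and adapted F0 (Hz) at one point inside the accent.
f0_orig = math.exp(log_f0(0.5, 100.0, orig_phrases, orig_accents))
f0_adapted = math.exp(log_f0(0.5, 100.0, tgt_phrases, tgt_accents))
print(f0_orig, f0_adapted)
```

Because only the command values change, the adapted contour keeps the timing structure of the original one, which is consistent with the small-corpus argument above: only the differences need to be learned from target data.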
Bibliographic reference. Hirose, Keikichi / Ochi, Keiko / Mihara, Ryusuke / Hashimoto, Hiroya / Saito, Daisuke / Minematsu, Nobuaki (2011): "Adaptation of prosody in speech synthesis by changing command values of the generation process model of fundamental frequency", In INTERSPEECH-2011, 2793-2796.