Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Corpus-Based Generation of Fundamental Frequency Contours Using Generation Process Model and Considering Emotional Focuses

Keikichi Hirose, Yasufumi Asano, Nobuaki Minematsu

University of Tokyo, Japan

We previously conducted emotional speech synthesis using our corpus-based method of generating fundamental frequency (F0) contours from text. The method predicts the command values of the F0 contour generation process model instead of directly predicting the F0 value of each time frame. Better control of F0 contours was realized by taking the emotional level of each bunsetsu into account, that is, by adding information on which bunsetsu(s) the emotion is especially placed on to the command predictor inputs. In the case of anger, adding emotional levels yielded F0 contours closer to the target contours. Speech was synthesized with F0 contours generated in two ways: from commands predicted with emotional levels taken into account, and from commands predicted without them. The results of a perceptual experiment indicated that the emotion was conveyed well when emotional levels were added.

Bibliographic reference. Hirose, Keikichi / Asano, Yasufumi / Minematsu, Nobuaki (2006): "Corpus-based generation of fundamental frequency contours using generation process model and considering emotional focuses", in INTERSPEECH-2006, paper 1902-Mon2A3O.3.