12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

A Perceptual Expressivity Modeling Technique for Speech Synthesis Based on Multiple-Regression HSMM

Takashi Nose, Takao Kobayashi

Tokyo Institute of Technology, Japan

This paper describes a technique for modeling and controlling the emotional expressivity of speech in HMM-based speech synthesis. A problem with conventional HMM-based emotional speech synthesis is that the intensity of an emotional expression in the synthetic speech depends entirely on the database used for model training. To take into account the emotional expressivity that listeners actually perceive, perceptual expressivity scores are introduced into a style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). Objective and subjective evaluation results show that the proposed technique works well when there is a large bias of emotional expressivity in the training data.
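
As a rough illustration of the multiple-regression idea behind MRHSMM, the sketch below computes a state's output mean as a linear function of a style vector, mu = H * [1, v_1, ..., v_L]. The abstract does not give this formula, so the form is an assumption taken from the general multiple-regression HSMM formulation. The function name `mrhsmm_mean`, the matrix `H`, and all numeric values are hypothetical and chosen only for illustration; here the single style component stands in for a perceptual expressivity score.

```python
def mrhsmm_mean(H, style):
    """Return mu = H * xi for one state, where xi = [1, v_1, ..., v_L].

    H is a regression matrix given as a list of rows, one row per
    feature dimension; style is the list of style components
    (assumed here to be perceptual expressivity scores).
    """
    xi = [1.0] + list(style)  # prepend the bias term
    if any(len(row) != len(xi) for row in H):
        raise ValueError("each row of H needs len(style) + 1 entries")
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


# Hypothetical regression matrix: 2 feature dimensions, 1 style dimension.
H = [[1.0, 0.5],
     [2.0, -0.25]]

neutral = mrhsmm_mean(H, [0.0])  # score 0: mean equals the bias column
strong = mrhsmm_mean(H, [2.0])   # larger score shifts the mean linearly
print(neutral, strong)
```

Under this assumed formulation, changing the style vector at synthesis time moves the output means continuously, which is what makes the intensity of the expression controllable.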


Bibliographic reference. Nose, Takashi / Kobayashi, Takao (2011): "A perceptual expressivity modeling technique for speech synthesis based on multiple-regression HSMM", In INTERSPEECH-2011, 109-112.