9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

An Estimation Technique of Style Expressiveness for Emotional Speech Using Model Adaptation Based on Multiple-Regression HSMM

Takashi Nose, Yoichi Kato, Makoto Tachibana, Takao Kobayashi

Tokyo Institute of Technology, Japan

This paper describes a technique for estimating style expressiveness in an arbitrary speaker's emotional speech. In the proposed technique, the style expressiveness, which represents how much the emotions and/or speaking styles affect the acoustic features, is estimated based on a multiple-regression hidden semi-Markov model (MRHSMM). In model training, we first train an average voice model using multiple speakers' neutral-style speech. Then, speaker- and style-adapted HSMMs are obtained by linear transformation of the average voice model using a small amount of the target speaker's data. Finally, the MRHSMM of the target speaker is obtained from the adapted models. For given input emotional speech, the style expressiveness is estimated based on the maximum likelihood criterion. The experimental results show that the estimated values correspond well to perceptual ratings.
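To illustrate the maximum likelihood estimation step, here is a minimal sketch under strong simplifying assumptions that the abstract does not spell out: a single Gaussian whose mean is a linear function of the style vector, `mean = b + A v`, with known `A`, `b`, and covariance, and no HSMM state alignment or duration modeling. The function name `estimate_style_vector` and all variable names are hypothetical and are not taken from the paper.

```python
import numpy as np

def estimate_style_vector(obs, A, b, cov):
    """ML estimate of v under the assumed model o_t ~ N(b + A v, cov).

    obs: (T, dim) observation frames
    A:   (dim, style_dim) regression matrix (assumed known)
    b:   (dim,) bias, i.e. the mean at v = 0 (assumed known)
    cov: (dim, dim) covariance matrix (assumed known)
    """
    prec = np.linalg.inv(cov)
    num_frames = obs.shape[0]
    # Setting the gradient of the log-likelihood to zero gives the
    # normal equations: (T * A^T P A) v = A^T P * sum_t (o_t - b)
    lhs = num_frames * (A.T @ prec @ A)
    rhs = A.T @ prec @ (obs - b).sum(axis=0)
    return np.linalg.solve(lhs, rhs)

# Toy data: one style dimension with a true expressiveness of 0.7.
rng = np.random.default_rng(0)
A = np.array([[1.0], [2.0], [-1.0]])
b = np.array([0.5, -0.5, 0.0])
cov = np.diag([1.0, 0.5, 2.0])
true_v = np.array([0.7])
noise = rng.multivariate_normal(np.zeros(3), cov, size=500)
obs = b + A @ true_v + noise

v_hat = estimate_style_vector(obs, A, b, cov)
print("estimated style vector:", v_hat)
```

In this toy setting the estimate approaches the true value as the number of frames grows. A full MRHSMM would accumulate the same kind of statistics over states, weighted by state occupancy.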


Bibliographic reference. Nose, Takashi / Kato, Yoichi / Tachibana, Makoto / Kobayashi, Takao (2008): "An estimation technique of style expressiveness for emotional speech using model adaptation based on multiple-regression HSMM", in INTERSPEECH-2008, 2759-2762.