14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Robust Estimation of Multiple-Regression HMM Parameters for Dimension-Based Expressive Dialogue Speech Synthesis

Tomohiro Nagata (1), Hiroki Mori (1), Takashi Nose (2)

(1) Utsunomiya University, Japan
(2) Tokyo Institute of Technology, Japan

This paper describes spontaneous dialogue speech synthesis based on the multiple-regression hidden semi-Markov model (MRHSMM), which enables users to specify the paralinguistic information of synthesized speech with a dimensional representation. Paralinguistic aspects of synthesized speech are controlled by multiple regression models whose explanatory variables are abstract dimensions such as pleasant-unpleasant and aroused-sleepy. For robust estimation of the regression matrices of the MRHSMM from unbalanced spontaneous dialogue speech samples, re-estimation formulae were derived in the framework of maximum a posteriori (MAP) estimation. A perceptual experiment confirmed that applying MAP estimation to the regression matrices improved the naturalness of the synthesized speech. In addition, a high correlation (R > 0.7) was observed between given and perceived paralinguistic information, which implies that the proposed method successfully reflected the intended paralinguistic messages in the synthesized speech.
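
As a rough illustration of the multiple-regression idea described above, the sketch below computes a feature mean vector as a linear function of the specified dimensions. It assumes the common formulation in which the mean is the product of a regression matrix and a vector holding a bias term followed by the dimension values; the abstract does not spell out this form. The function name `regression_mean`, the matrix `H`, and all numbers are hypothetical, and the MAP re-estimation itself is not shown.

```python
def regression_mean(H, style):
    """Return mu = H * [1, v_1, ..., v_L] using plain Python lists.

    H is a list of rows (one per feature dimension); style holds the
    values of the abstract dimensions, e.g. (pleasantness, arousal).
    This linear form is an assumption, not taken from the abstract.
    """
    xi = [1.0] + list(style)  # bias term followed by the style dimensions
    if any(len(row) != len(xi) for row in H):
        raise ValueError("each row of H needs 1 + len(style) entries")
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


# Hypothetical regression matrix for a 2-dimensional feature mean;
# columns are: bias, pleasantness weight, arousal weight.
H = [
    [5.0, 0.5, 1.0],
    [1.0, -0.2, 0.3],
]

neutral = regression_mean(H, [0.0, 0.0])  # only the bias column contributes
aroused = regression_mean(H, [0.0, 2.0])  # arousal raised, pleasantness neutral
print(neutral, aroused)
```

In this picture, changing the requested pleasant-unpleasant or aroused-sleepy value shifts the mean along the corresponding column of the regression matrix, which is why poorly estimated columns (for example, from sparsely covered regions of the dimension space) could degrade the output.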

Bibliographic reference.  Nagata, Tomohiro / Mori, Hiroki / Nose, Takashi (2013): "Robust estimation of multiple-regression HMM parameters for dimension-based expressive dialogue speech synthesis", In INTERSPEECH-2013, 1549-1553.