This paper describes spontaneous dialogue speech synthesis based on multiple-regression hidden semi-Markov model (MRHSMM), which enables users to specify paralinguistic information of synthesized speech with a dimensional representation. Paralinguistic aspects of synthesized speech are controlled by multiple regression models whose explanatory variables are abstract dimensions such as pleasant-unpleasant and aroused-sleepy. For robust estimation of the regression matrices of the MRHSMM with unbalanced spontaneous dialogue speech samples, the re-estimation formulae were derived in the framework of the maximum a posteriori (MAP) estimation. The result of a perceptual experiment confirmed that the naturalness of synthesized speech was improved by applying the MAP estimation for regression matrices. In addition a high correlation (R 3; 0.7) was observed between given and perceived paralinguistic information, which implies that the proposed method could successfully reflect intended paralinguistic messages on the synthesized speech.
Bibliographic reference. Nagata, Tomohiro / Mori, Hiroki / Nose, Takashi (2013): "Robust estimation of multiple-regression HMM parameters for dimension-based expressive dialogue speech synthesis", In INTERSPEECH-2013, 1549-1553.