ISCA Archive Interspeech 2013

Robust estimation of multiple-regression HMM parameters for dimension-based expressive dialogue speech synthesis

Tomohiro Nagata, Hiroki Mori, Takashi Nose

This paper describes spontaneous dialogue speech synthesis based on a multiple-regression hidden semi-Markov model (MRHSMM), which enables users to specify the paralinguistic information of synthesized speech with a dimensional representation. Paralinguistic aspects of synthesized speech are controlled by multiple regression models whose explanatory variables are abstract dimensions such as pleasant-unpleasant and aroused-sleepy. For robust estimation of the regression matrices of the MRHSMM from unbalanced spontaneous dialogue speech samples, the re-estimation formulae were derived in the framework of maximum a posteriori (MAP) estimation. The result of a perceptual experiment confirmed that the naturalness of synthesized speech was improved by applying MAP estimation to the regression matrices. In addition, a high correlation (R > 0.7) was observed between given and perceived paralinguistic information, which implies that the proposed method could successfully reflect intended paralinguistic messages in the synthesized speech.
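
As a rough illustration of the regression idea in the abstract, the sketch below computes a state's mean vector as a regression matrix applied to a vector of paralinguistic dimensions with a bias term in front. The function name `regression_mean`, the matrix `H`, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch does not cover the MAP re-estimation of the regression matrices.

```python
def regression_mean(H, dims):
    """Return the mean vector H * [1, dims...] for a single model state.

    H    : regression matrix as a list of rows, one row per acoustic feature
    dims : values of the abstract dimensions, e.g. (pleasantness, arousal)
    """
    xi = [1.0] + list(dims)  # augmented vector: bias term, then dimension values
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


# Hypothetical regression matrix for a 2-dimensional acoustic feature.
# Columns: bias, weight for pleasant-unpleasant, weight for aroused-sleepy.
H = [
    [5.0, 0.5, 0.25],
    [1.0, -0.25, 0.75],
]

neutral = regression_mean(H, (0.0, 0.0))  # both dimensions at the origin
aroused = regression_mean(H, (0.0, 2.0))  # same pleasantness, higher arousal
```

With both dimensions at zero the mean reduces to the bias column, and changing a dimension value shifts each feature's mean linearly by that dimension's weight. This linearity is what lets a user steer the synthesized speech by choosing a point in the dimensional space.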


doi: 10.21437/Interspeech.2013-392

Cite as: Nagata, T., Mori, H., Nose, T. (2013) Robust estimation of multiple-regression HMM parameters for dimension-based expressive dialogue speech synthesis. Proc. Interspeech 2013, 1549-1553, doi: 10.21437/Interspeech.2013-392

@inproceedings{nagata13_interspeech,
  author={Tomohiro Nagata and Hiroki Mori and Takashi Nose},
  title={{Robust estimation of multiple-regression HMM parameters for dimension-based expressive dialogue speech synthesis}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={1549--1553},
  doi={10.21437/Interspeech.2013-392}
}