ISCA Archive Interspeech 2009

HMM-based speaker characteristics emphasis using average voice model

Takashi Nose, Junichi Adada, Takao Kobayashi

This paper presents a technique for controlling and emphasizing the speaker characteristics of synthetic speech. The key idea comes from the way professional impersonators imitate voices: they effectively exaggerate a target speaker's voice characteristics. To model and control the degree of speaker characteristics, we use a speech synthesis framework based on the multiple-regression hidden semi-Markov model (MRHSMM). In the MRHSMM, mean parameters are given by multiple regression on a low-dimensional control vector. The control vector represents how much the target speaker's model parameters differ from those of the average voice model. By changing the control vector during speech synthesis, we can control the degree of the target speaker's voice characteristics. Results of subjective experiments show that emphasizing speaker characteristics improves the speaker reproducibility of synthetic speech.
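The regression idea in the abstract can be illustrated with a small sketch. It assumes the common MRHSMM form in which each state mean is mu = H xi, with xi = [1, v_1, ..., v_L] the control vector augmented by a bias term. The one-dimensional construction below is hypothetical and chosen only for illustration: v = 0 reproduces the average voice mean, v = 1 reproduces the target speaker mean, and v > 1 extrapolates past the target, which is one way to read "emphasis". In an actual MRHSMM the regression matrices are estimated from training data rather than built by hand, and the helper names here (`regression_mean`, `build_regression_matrix`, `emphasize`) are not from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def regression_mean(H: Matrix, v: Vector) -> Vector:
    """Mean vector mu = H xi, where xi = [1, v_1, ..., v_L] (assumed MRHSMM form)."""
    xi = [1.0] + list(v)  # augment the control vector with a bias term
    if any(len(row) != len(xi) for row in H):
        raise ValueError("each row of H must have len(v) + 1 entries")
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


def build_regression_matrix(avg_mean: Vector, target_mean: Vector) -> Matrix:
    """Hypothetical 1-D regression matrix: bias column = average voice mean,
    slope column = (target - average), so v = 0 -> average, v = 1 -> target."""
    return [[a, t - a] for a, t in zip(avg_mean, target_mean)]


def emphasize(v_target: Vector, alpha: float) -> Vector:
    """Scale the target's control vector; alpha > 1 exaggerates the deviation
    from the average voice (illustrative assumption, not the paper's rule)."""
    return [alpha * x for x in v_target]


if __name__ == "__main__":
    # Toy 2-dimensional means for one state (made-up numbers).
    H = build_regression_matrix([1.0, 2.0], [1.5, 1.0])
    print(regression_mean(H, [0.0]))                  # average voice
    print(regression_mean(H, [1.0]))                  # target speaker
    print(regression_mean(H, emphasize([1.0], 1.5)))  # → [1.75, 0.5]
```

Because the mean is linear in the control vector, moving v along the direction from the average voice toward the target speaker scales the speaker-specific deviation by the same factor, which is what makes a low-dimensional control of "how much" speaker character is present possible.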

doi: 10.21437/Interspeech.2009-492

Cite as: Nose, T., Adada, J., Kobayashi, T. (2009) HMM-based speaker characteristics emphasis using average voice model. Proc. Interspeech 2009, 2631-2634, doi: 10.21437/Interspeech.2009-492

  author={Takashi Nose and Junichi Adada and Takao Kobayashi},
  title={{HMM-based speaker characteristics emphasis using average voice model}},
  booktitle={Proc. Interspeech 2009},