This paper presents a technique for controlling and emphasizing the speaker characteristics of synthetic speech. The key idea comes from the way professional impersonators imitate voices: they effectively exaggerate a target speaker's voice characteristics. To model and control the degree of speaker characteristics, we use a speech synthesis framework based on the multiple-regression hidden semi-Markov model (MRHSMM). In MRHSMM, the mean parameters are given by multiple regression of a low-dimensional control vector. The control vector represents how much the target speaker's model parameters differ from those of the average voice model. By changing the control vector at synthesis time, we can control the degree of the target speaker's voice characteristics. Results of subjective experiments show that emphasizing speaker characteristics improves the speaker reproducibility of synthetic speech.
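The relation between the control vector and the mean parameters can be illustrated with a small sketch. This is not the paper's implementation: it assumes a one-dimensional control vector `v`, a regression matrix `H` whose first column is the average-voice mean and whose second column is the target speaker's offset from it, and made-up numbers. The names `regress_mean`, `mu_average`, and `mu_target` are hypothetical. Under these assumptions, `v = 0` gives the average voice, `v = 1` gives the target speaker, and `v > 1` exaggerates the target speaker's characteristics.

```python
def regress_mean(H, v):
    """Return mu = H @ [1, v], a multiple-regression form of a state mean."""
    # Augmented control vector; the leading 1 selects the bias column of H.
    xi = [1.0] + list(v)
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


# Hypothetical 3-dimensional mean vectors for one state (illustrative only).
mu_average = [1.0, 2.0, 3.0]
mu_target = [1.5, 1.0, 3.0]

# Assumed regression matrix: column 0 is the average-voice mean,
# column 1 is the offset of the target speaker from the average voice.
H = [[a, t - a] for a, t in zip(mu_average, mu_target)]

average = regress_mean(H, [0.0])     # v = 0: average voice
target = regress_mean(H, [1.0])      # v = 1: target speaker
emphasized = regress_mean(H, [1.5])  # v > 1: exaggerated characteristics
```

In this toy setup, emphasis amounts to extrapolating along the direction from the average voice to the target speaker. How the regression matrix is estimated in the actual framework is outside the scope of the sketch.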
Cite as: Nose, T., Adada, J., Kobayashi, T. (2009) HMM-based speaker characteristics emphasis using average voice model. Proc. Interspeech 2009, 2631-2634, doi: 10.21437/Interspeech.2009-492
@inproceedings{nose09_interspeech,
  author={Takashi Nose and Junichi Adada and Takao Kobayashi},
  title={{HMM-based speaker characteristics emphasis using average voice model}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2631--2634},
  doi={10.21437/Interspeech.2009-492}
}