10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

HMM-Based Speaker Characteristics Emphasis Using Average Voice Model

Takashi Nose, Junichi Adachi, Takao Kobayashi

Tokyo Institute of Technology, Japan

This paper presents a technique for controlling and emphasizing the speaker characteristics of synthetic speech. The key idea comes from the way professional impersonators imitate voices: when imitating a voice, they effectively exaggerate the target speaker's voice characteristics. To model and control the degree of speaker characteristics, we use a speech synthesis framework based on the multiple-regression hidden semi-Markov model (MRHSMM). In MRHSMM, the mean parameters are given by multiple regression of a low-dimensional control vector. The control vector represents how much the target speaker's model parameters differ from those of the average voice model. By changing the control vector at synthesis time, we can control the degree to which the target speaker's voice characteristics are expressed. Results of subjective experiments show that emphasizing speaker characteristics improves the speaker reproducibility of synthetic speech.
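The multiple-regression idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the exact parameterization, so the sketch assumes a one-dimensional control vector, a regression matrix `H` whose bias column holds the average voice mean and whose second column holds the target-minus-average difference, and made-up three-dimensional mean values. The names `regress_mean`, `average_mean`, and `target_mean` are hypothetical.

```python
def regress_mean(H, control):
    """Return mu = H * [1, control...]^T for one state's mean vector.

    H is a list of rows; each row holds a bias term followed by one
    regression coefficient per control dimension (an assumed layout).
    """
    xi = [1.0] + list(control)  # augmented control vector
    if any(len(row) != len(xi) for row in H):
        raise ValueError("each row of H must have 1 + len(control) entries")
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


# Hypothetical mean vectors, chosen only for illustration.
average_mean = [1.0, 2.0, 4.0]   # average voice model
target_mean = [1.5, 1.0, 4.25]   # target speaker

# Assumed regression matrix: bias column = average voice,
# slope column = how far the target departs from the average.
H = [[a, t - a] for a, t in zip(average_mean, target_mean)]

for v in (0.0, 1.0, 1.5):
    # v = 0 reproduces the average voice, v = 1 the target speaker,
    # and v > 1 extrapolates past the target (emphasis).
    print(v, regress_mean(H, [v]))
```

Under these assumptions, a control value above 1 pushes every mean parameter further from the average voice along the direction of the target speaker, which is one way to read the exaggeration described in the abstract.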

