Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Model Adaptation and Adaptive Training Using ESAT Algorithm for HMM-Based Speech Synthesis

Juri Isogai, Junichi Yamagishi, Takao Kobayashi

Tokyo Institute of Technology, Japan

In speaker adaptation for HMM-based speech synthesis, model adaptation and adaptive training techniques play key roles. To reduce dependency on the initial model and to adapt the model to a wide range of target speakers, we propose speaker adaptation and adaptive training algorithms based on the ESAT algorithm for HMM-based speech synthesis. The ESAT algorithm estimates the contribution rates of several given initial models and combines them according to the likelihood of the adaptation data for the target speaker. In this study, we incorporate the ESAT algorithm into the framework of the hidden semi-Markov model (HSMM) so that both state output and state duration distributions are adapted, converting both voice characteristics and prosodic features. The results of subjective tests show that the ESAT algorithm lessens the dependence of synthetic speech quality on the initial model and has the potential to cover a wider range of target speakers.
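The abstract does not give the ESAT equations, so the following is only a toy sketch of the general idea it describes: weighting several fixed initial models by how well each explains the adaptation data, then combining them. It assumes one-dimensional Gaussian "models" with a shared variance and uses a plain EM update for mixture weights, which is a stand-in technique and not the authors' algorithm. The names `estimate_weights` and `combine_means` are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def estimate_weights(data, means, var=1.0, iters=20):
    """Estimate one weight per initial model by maximum likelihood (EM).

    Assumption: each initial model is a fixed 1-D Gaussian; only the
    weights are re-estimated from the adaptation data.
    """
    k = len(means)
    weights = [1.0 / k] * k
    for _ in range(iters):
        counts = [0.0] * k
        for x in data:
            # E-step: posterior of each model given x (log-sum-exp for stability).
            logs = [
                math.log(max(w, 1e-300)) + gaussian_logpdf(x, m, var)
                for w, m in zip(weights, means)
            ]
            top = max(logs)
            post = [math.exp(v - top) for v in logs]
            total = sum(post)
            for i in range(k):
                counts[i] += post[i] / total
        # M-step: new weights are the average posteriors.
        weights = [c / len(data) for c in counts]
    return weights


def combine_means(weights, means):
    """Combine the initial models' means using the estimated weights."""
    return sum(w * m for w, m in zip(weights, means))


if __name__ == "__main__":
    initial_means = [0.0, 5.0, -5.0]          # hypothetical initial models
    adaptation_data = [0.1, -0.2, 0.05, 0.3]  # hypothetical target-speaker data
    w = estimate_weights(adaptation_data, initial_means)
    print("weights:", w)
    print("combined mean:", combine_means(w, initial_means))
```

In this sketch the initial model closest to the adaptation data receives most of the weight, so the combined model depends less on any single choice of initial model. The paper itself works with HSMM state output and duration distributions rather than scalar Gaussians.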


Bibliographic reference.  Isogai, Juri / Yamagishi, Junichi / Kobayashi, Takao (2005): "Model adaptation and adaptive training using ESAT algorithm for HMM-based speech synthesis", In INTERSPEECH-2005, 2597-2600.