In speaker adaptation for HMM-based speech synthesis, model adaptation and adaptive training techniques play key roles. To reduce dependence on the initial model and to adapt the model to a wide range of target speakers, we propose speaker adaptation and adaptive training algorithms based on the ESAT algorithm for HMM-based speech synthesis. The ESAT algorithm estimates the contribution rates of several given initial models and combines them according to the likelihood of the adaptation data for the target speaker. In this study, we incorporate the ESAT algorithm into the hidden semi-Markov model (HSMM) framework so that both the state output and duration distributions are adapted, converting both voice characteristics and prosodic features. The results of subjective tests show that the ESAT algorithm lessens the dependence of synthetic speech quality on the initial model and has the potential to handle a wider range of target speakers.
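The idea of weighting several initial models by how well they explain the adaptation data can be illustrated with a toy sketch. This is not the ESAT algorithm from the paper: it assumes each initial model is a single one-dimensional Gaussian and uses a plain EM update for mixture weights as a stand-in for the contribution-rate estimate. The names `estimate_weights` and `combine_means`, the model parameters, and the data values are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def estimate_weights(data, models, iterations=50):
    """Estimate one weight per initial model by EM on the adaptation data.

    Assumption: each model is a (mean, variance) pair, and the weights are
    ordinary mixture weights that maximise the likelihood of `data`.
    """
    k = len(models)
    weights = [1.0 / k] * k  # start from equal contributions
    for _ in range(iterations):
        totals = [0.0] * k
        for x in data:
            # Weighted likelihood of this sample under each initial model.
            joint = [w * gaussian_pdf(x, m, v) for w, (m, v) in zip(weights, models)]
            norm = sum(joint)
            if norm <= 0.0:
                continue  # sample is numerically impossible under every model
            for i in range(k):
                totals[i] += joint[i] / norm  # posterior responsibility
        total = sum(totals)
        if total == 0.0:
            break  # no usable samples; keep the current weights
        weights = [t / total for t in totals]
    return weights


def combine_means(weights, models):
    """Combine the initial models' means using the estimated weights."""
    return sum(w * m for w, (m, _) in zip(weights, models))


# Two hypothetical initial models and adaptation data close to the second one.
initial_models = [(0.0, 1.0), (5.0, 1.0)]
adaptation_data = [4.8, 5.1, 5.3, 4.9, 5.0]
weights = estimate_weights(adaptation_data, initial_models)
combined_mean = combine_means(weights, initial_models)
```

With these inputs, nearly all of the weight should move to the second model, so the combined mean ends up close to that model's mean. An HSMM-based system would apply the same kind of weighting to state output and duration distributions rather than to scalar means.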
Cite as: Isogai, J., Yamagishi, J., Kobayashi, T. (2005) Model adaptation and adaptive training using ESAT algorithm for HMM-based speech synthesis. Proc. Interspeech 2005, 2597-2600, doi: 10.21437/Interspeech.2005-804
@inproceedings{isogai05_interspeech,
  author={Juri Isogai and Junichi Yamagishi and Takao Kobayashi},
  title={{Model adaptation and adaptive training using ESAT algorithm for HMM-based speech synthesis}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2597--2600},
  doi={10.21437/Interspeech.2005-804}
}