To build a speech synthesis system that can produce diverse voices, we have been developing a speaker-independent approach to HMM-based speech synthesis in which statistical average voice models are adapted to a target speaker using a small amount of speech data. In this paper, we incorporate the high-quality speech vocoding method STRAIGHT and a parameter generation algorithm considering global variance (GV) into the system to improve the quality of synthetic speech. Furthermore, we introduce a feature-space speaker adaptive training algorithm and a gender-mixed modeling technique to further normalize the average voice model. We build an English text-to-speech system using these techniques and report its performance.
Cite as: Yamagishi, J., Kobayashi, T., Renals, S., King, S., Zen, H., Toda, T., Tokuda, K. (2007) Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV. Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6), 125-130
@inproceedings{yamagishi07_ssw,
  author={Junichi Yamagishi and Takao Kobayashi and Steve Renals and Simon King and Heiga Zen and Tomoki Toda and Keiichi Tokuda},
  title={{Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV}},
  year=2007,
  booktitle={Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6)},
  pages={125--130}
}