INTERSPEECH 2006 - ICSLP
This paper describes the combined use of linear regression and ex-post maximum a posteriori (MAP) methods in an HMM-based, average-voice-based speech synthesis system. To generate more natural-sounding speech with this system when a large amount of training data is available, we apply ex-post MAP estimation after linear-transformation-based adaptation. We investigate how the amount of data used to train the average voice model and the tying topology affect the naturalness of the synthetic speech. The results of the evaluation tests show that the adapted average voice model, trained on a large amount of data, can generate more natural-sounding speech than the speaker-dependent model.
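As a rough illustration of the two-stage idea in the abstract (linear transformation first, MAP modification second), the sketch below adapts a single Gaussian mean vector. It is not the paper's implementation: the function names, the prior weight `tau`, and the assumption that the transform `A`, `b` and the frame occupancies `gammas` are already given are all hypothetical simplifications, and the MAP step is the standard conjugate-prior mean update rather than anything specific to the HSMM setting.

```python
import numpy as np

def linear_transform_mean(mean, A, b):
    """Stage 1: apply an affine (linear regression) transform to a mean vector."""
    return A @ mean + b

def map_update_mean(prior_mean, frames, gammas, tau):
    """Stage 2: MAP re-estimate of a mean, using prior_mean as the prior.

    frames: (T, D) adaptation observations assigned to this Gaussian.
    gammas: (T,) occupancy weights of those observations.
    tau:    prior weight (hypothetical hyperparameter); larger values
            keep the result closer to prior_mean.
    """
    frames = np.asarray(frames, dtype=float)
    gammas = np.asarray(gammas, dtype=float)
    weighted_sum = (gammas[:, None] * frames).sum(axis=0)
    return (tau * prior_mean + weighted_sum) / (tau + gammas.sum())

def adapt_mean(avg_mean, A, b, frames, gammas, tau=10.0):
    """Linear transformation followed by MAP modification of one mean."""
    transformed = linear_transform_mean(np.asarray(avg_mean, dtype=float), A, b)
    return map_update_mean(transformed, frames, gammas, tau)
```

With little adaptation data the result stays near the linearly transformed mean; as the total occupancy grows relative to `tau`, it moves toward the occupancy-weighted average of the target speaker's frames, which is why the MAP step is mainly useful when a large amount of data is available.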
Bibliographic reference. Ogata, Katsumi / Tachibana, Makoto / Yamagishi, Junichi / Kobayashi, Takao (2006): "Acoustic model training based on linear transformation and MAP modification for HSMM-based speech synthesis", In INTERSPEECH-2006, paper 1787-Tue3BuP.9.