Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Acoustic Model Training Based on Linear Transformation and MAP Modification for HSMM-Based Speech Synthesis

Katsumi Ogata, Makoto Tachibana, Junichi Yamagishi, Takao Kobayashi

Tokyo Institute of Technology, Japan

This paper describes the combined use of linear regression and ex-post MAP methods in an average-voice-based speech synthesis system based on HMMs. To generate more natural-sounding speech with the average-voice-based system when a large amount of training data is available, we apply ex-post MAP estimation after linear-transformation-based adaptation. We investigate how the amount of data used to train the average voice model and the tying topology affect the naturalness of synthetic speech. The results of evaluation tests show that the adapted average voice model trained on a large amount of data can generate more natural-sounding speech than the speaker-dependent model.
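
The two-stage idea in the abstract (a linear transformation of the average voice model, followed by a MAP update) can be illustrated for a single Gaussian mean. This is only a minimal sketch under stated assumptions, not the paper's algorithm: it uses the textbook affine mean transform and the textbook MAP mean update, ignores covariances and the duration modeling of an HSMM, and every name and value (`linear_transform`, `map_update`, `tau`, the toy frames and occupancies) is hypothetical.

```python
def linear_transform(mean, A, b):
    """Affine transform A @ mean + b of an average-voice mean vector."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
            for row, b_i in zip(A, b)]


def map_update(prior_mean, frames, gammas, tau):
    """Textbook MAP mean update with prior_mean as the prior.

    frames: target-speaker observation vectors
    gammas: state occupancy weight of each frame
    tau:    prior weight (larger values trust the prior more)
    """
    total = sum(gammas)
    dim = len(prior_mean)
    weighted = [sum(g * f[d] for g, f in zip(gammas, frames))
                for d in range(dim)]
    return [(tau * prior_mean[d] + weighted[d]) / (tau + total)
            for d in range(dim)]


# Toy values, assumed purely for illustration.
avg_mean = [0.0, 1.0]                 # mean from the average voice model
A = [[1.0, 0.0], [0.0, 1.0]]          # regression matrix (identity here)
b = [0.5, -0.5]                       # bias vector

# Stage 1: linear-transformation-based adaptation.
adapted = linear_transform(avg_mean, A, b)

# Stage 2: MAP modification, using the adapted mean as the prior.
frames = [[1.0, 1.0], [2.0, 0.0]]
gammas = [1.0, 1.0]
refined = map_update(adapted, frames, gammas, tau=2.0)
```

With little adaptation data the occupancy sum is small and the result stays close to the linearly transformed mean; as data grows, the update moves toward the data average, which is the usual motivation for applying MAP after a transformation-based step.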

Bibliographic reference.  Ogata, Katsumi / Tachibana, Makoto / Yamagishi, Junichi / Kobayashi, Takao (2006): "Acoustic model training based on linear transformation and MAP modification for HSMM-based speech synthesis", In INTERSPEECH-2006, paper 1787-Tue3BuP.9.