ISCA Archive Interspeech 2009

An improved minimum generation error based model adaptation for HMM-based speech synthesis

Yi-Jian Wu, Long Qin, Keiichi Tokuda

A minimum generation error (MGE) criterion has been proposed for model training in HMM-based speech synthesis. In this paper, we apply the MGE criterion to model adaptation for HMM-based speech synthesis and introduce an MGE linear regression (MGELR) based model adaptation algorithm, in which the regression matrices used to transform the source models are optimized to minimize the generation errors of the adaptation data. In addition, we incorporate two recent improvements to the MGE criterion into MGELR-based model adaptation: state alignment under the MGE criterion, and the use of log spectral distortion (LSD) instead of Euclidean distance as the spectral distortion measure. Experimental results show that adaptation performance improved after incorporating these two techniques, and formal listening tests showed that the quality and speaker similarity of speech synthesized after MGELR-based adaptation were significantly better than after the original MLLR-based adaptation.
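The core MGELR idea, choosing a mean transform by the error of the *generated* trajectory rather than by likelihood, can be illustrated with a minimal numpy sketch. It rests on strong simplifying assumptions that are not from the paper: a single one-dimensional static feature plus one delta stream, a fixed frame-to-state alignment, diagonal variances left untransformed, one global 2x3 transform shared by all states, and squared Euclidean distance on static features as the generation error (not LSD). The function names (`delta_window_matrix`, `generate`, `apply_transform`, `generation_error`, `estimate_mgelr`) and the delta window 0.5(c[t+1] - c[t-1]) are hypothetical choices for illustration. Because maximum-likelihood parameter generation is linear in the means when the variances are fixed, this error is quadratic in the transform entries, so the sketch solves it in closed form by least squares; this is not necessarily the optimization procedure the authors used.

```python
import numpy as np


def delta_window_matrix(T):
    """Return the 2T x T matrix mapping a static trajectory c to interleaved
    [static, delta] features, with delta[t] = 0.5 * (c[t+1] - c[t-1]) and
    edge frames clamped (a hypothetical window choice)."""
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                      # static row
        prev, nxt = max(t - 1, 0), min(t + 1, T - 1)
        W[2 * t + 1, nxt] += 0.5               # delta row
        W[2 * t + 1, prev] -= 0.5
    return W


def generate(means, variances, W):
    """ML parameter generation: solve (W^T U^-1 W) c = W^T U^-1 M for c,
    where M and U hold the interleaved per-frame means and diagonal variances."""
    U_inv = np.diag(1.0 / variances)
    return np.linalg.solve(W.T @ U_inv @ W, W.T @ U_inv @ means)


def apply_transform(A, frame_means):
    """Apply mu' = A [1, mu] to each frame's (static, delta) mean.
    frame_means has shape (T, 2); returns the interleaved 2T mean vector."""
    xi = np.hstack([np.ones((len(frame_means), 1)), frame_means])  # (T, 3)
    return (xi @ A.T).reshape(-1)


def generation_error(c, target):
    """Squared Euclidean distance between generated and natural static features."""
    return float(np.sum((c - target) ** 2))


def estimate_mgelr(frame_means, frame_vars, target, W):
    """Estimate a 2x3 transform A minimizing generation_error(c(A), target).
    With variances fixed, c(A) is linear in the entries of A, so build its
    Jacobian one basis transform at a time and solve by least squares."""
    variances = frame_vars.reshape(-1)
    cols = []
    for k in range(6):
        basis = np.zeros(6)
        basis[k] = 1.0
        cols.append(generate(apply_transform(basis.reshape(2, 3), frame_means),
                             variances, W))
    J = np.stack(cols, axis=1)                 # (T, 6)
    a, *_ = np.linalg.lstsq(J, target, rcond=None)
    return a.reshape(2, 3)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 40
    W = delta_window_matrix(T)
    # Hypothetical source-model means after a fixed alignment: 8 states x 5 frames.
    frame_means = np.repeat(rng.normal(size=(8, 2)), 5, axis=0)
    frame_vars = np.tile([0.1, 0.05], (T, 1))
    # Hypothetical "natural" static trajectory of the target speaker.
    target = rng.normal(size=T).cumsum() * 0.1
    identity = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    A = estimate_mgelr(frame_means, frame_vars, target, W)
    for name, M in [("no adaptation", identity), ("MGELR sketch", A)]:
        c = generate(apply_transform(M, frame_means), frame_vars.reshape(-1), W)
        print(name, generation_error(c, target))
```

Since the least-squares solution minimizes the error over all 2x3 transforms, it can never do worse on the adaptation data than leaving the models unadapted (the identity transform). Replacing the Euclidean error with LSD would make the objective non-quadratic in the transform, so an iterative optimizer would be needed; the sketch covers neither that case nor MGE-based state alignment.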


doi: 10.21437/Interspeech.2009-150

Cite as: Wu, Y.-J., Qin, L., Tokuda, K. (2009) An improved minimum generation error based model adaptation for HMM-based speech synthesis. Proc. Interspeech 2009, 1787-1790, doi: 10.21437/Interspeech.2009-150

@inproceedings{wu09_interspeech,
  author={Yi-Jian Wu and Long Qin and Keiichi Tokuda},
  title={{An improved minimum generation error based model adaptation for HMM-based speech synthesis}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1787--1790},
  doi={10.21437/Interspeech.2009-150}
}