An extension of HMM-based speech synthesis is proposed to reproduce natural speech sounds. When large amounts of speech must be compressed, speech synthesizers have an advantage in the size of the compressed data. However, the quality of synthetic speech is often inferior to that of speech compressed by general-purpose speech codecs such as CELP, which reproduce prosodic features more accurately. We therefore propose adding complementary information to reproduce natural prosody. In the proposed method, inappropriate HMM state feature vectors selected by the conventional speech synthesis method are replaced with other vectors bound to the decision trees. The experimental results indicated that substituting 20% of the state feature vectors reduces the root mean squared error (RMSE) of log F0 to 0.3 semitones, approximately 15% of the RMSE without substitution.
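The abstract does not say how the substituted vectors are chosen or encoded, so the following is only a minimal sketch under loudly stated assumptions: each HMM state is reduced to a single natural-log F0 mean, the "vectors bound to the decision trees" are stood in for by a flat pool of candidate log F0 values, and states are picked greedily by how much their error shrinks. The function names `rmse_semitones` and `substitute_states` are hypothetical. The one fixed fact it relies on is the unit conversion: a difference of one semitone equals ln(2)/12 in natural-log F0, so an error in nats is multiplied by 12/ln(2) to express it in semitones.

```python
import math
from typing import List, Sequence

# Hypothetical helper constant: 1 nat of log F0 corresponds to 12 / ln(2) semitones.
SEMITONES_PER_NAT = 12.0 / math.log(2.0)


def rmse_semitones(pred: Sequence[float], target: Sequence[float]) -> float:
    """RMSE between two natural-log F0 sequences, expressed in semitones."""
    if len(pred) != len(target) or not pred:
        raise ValueError("sequences must be non-empty and of equal length")
    sq = [((p - t) * SEMITONES_PER_NAT) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(sq) / len(sq))


def substitute_states(default: Sequence[float], target: Sequence[float],
                      candidates: Sequence[float], ratio: float) -> List[float]:
    """Hypothetical greedy substitution of up to `ratio` of the per-state values.

    default:    log F0 mean of each state as chosen by conventional synthesis
    target:     log F0 of the natural speech, averaged per state (assumed known)
    candidates: pool of alternative log F0 means (stand-in for tree-bound vectors)
    """
    k = int(round(ratio * len(default)))
    scored = []
    for i, (d, t) in enumerate(zip(default, target)):
        # Candidate closest to the natural value for this state.
        c = min(candidates, key=lambda v: abs(v - t))
        # Error reduction obtained by substituting this state.
        gain = abs(d - t) - abs(c - t)
        scored.append((gain, i, c))
    scored.sort(reverse=True)  # largest error reduction first
    out = list(default)
    for gain, i, c in scored[:k]:
        if gain <= 0:  # substitution would not help; stop early
            break
        out[i] = c
    return out


if __name__ == "__main__":
    default = [math.log(f) for f in (100, 120, 150, 130, 110)]
    target = [math.log(f) for f in (105, 200, 150, 130, 110)]
    pool = [math.log(f) for f in (100, 200, 150)]
    new = substitute_states(default, target, pool, ratio=0.2)
    print(rmse_semitones(default, target), rmse_semitones(new, target))
```

In a real codec the transmitted side information would be the indices of the substituted states and of the chosen candidates rather than the values themselves; the sketch returns values only to keep the error computation easy to follow.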
Index Terms: HMM-based speech synthesis, vector substitution, speech data compression
Cite as: Nishizawa, N., Kato, T. (2010) Substitution of state distributions to reproduce natural prosody on HMM-based speech synthesizers. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 167-172
@inproceedings{nishizawa10_ssw, author={Nobuyuki Nishizawa and Tsuneo Kato}, title={{Substitution of state distributions to reproduce natural prosody on HMM-based speech synthesizers}}, year=2010, booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)}, pages={167--172} }