The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis

Kyoto, Japan
September 22-24, 2010

Substitution of State Distributions to Reproduce Natural Prosody on HMM-Based Speech Synthesizers

Nobuyuki Nishizawa, Tsuneo Kato

KDDI R&D Laboratories Inc., Japan

An extension of HMM-based speech synthesis is proposed to reproduce natural speech sounds. When large amounts of speech must be compressed, speech synthesizers have an advantage in the size of the compressed data. However, the quality of synthetic speech is often inferior to that of speech compressed by general-purpose speech codecs such as CELP, which reproduce prosodic features more accurately. We therefore propose adding complementary information to reproduce natural prosody. In the proposed method, inappropriate state feature vectors of HMMs determined by the conventional speech synthesis method are replaced with other vectors bound to the decision trees. The experimental results indicated that substituting 20% of the state feature vectors reduced the root mean squared error (RMSE) in log F0 to 0.3 semitones, approximately 15% of the RMSE without substitution.
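As an illustration of the reported metric, the following is a minimal sketch of how an RMSE in log F0 can be expressed in semitones. It is not taken from the paper: the function name `f0_rmse_semitones`, the per-frame F0 lists in Hz, and the choice to skip frames that are unvoiced (F0 = 0) in either contour are all assumptions made for this example. The last lines work out the baseline implied by the figures above: if 0.3 semitones is approximately 15% of the RMSE without substitution, that RMSE is roughly 2 semitones.

```python
import math


def f0_rmse_semitones(f0_ref_hz, f0_syn_hz):
    """RMSE between two F0 contours on a log scale, in semitones.

    Hypothetical helper for illustration; the paper's exact evaluation
    procedure (e.g. its handling of unvoiced frames) is not given here.
    """
    if len(f0_ref_hz) != len(f0_syn_hz):
        raise ValueError("contours must have the same number of frames")

    # One semitone is 1/12 of an octave, so the per-frame difference
    # in semitones is 12 * log2(f_syn / f_ref).
    # Assumption: frames with F0 <= 0 in either contour are unvoiced
    # and are skipped.
    diffs = [
        12.0 * math.log2(syn / ref)
        for ref, syn in zip(f0_ref_hz, f0_syn_hz)
        if ref > 0 and syn > 0
    ]
    if not diffs:
        raise ValueError("no frames voiced in both contours")

    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Baseline implied by the abstract's figures (both are approximate):
rmse_with_substitution = 0.3   # semitones, after substituting 20% of vectors
fraction_of_baseline = 0.15    # "approximately 15%" of the original RMSE
implied_baseline_rmse = rmse_with_substitution / fraction_of_baseline
```

For example, a synthetic contour that is exactly one octave above the reference in every voiced frame gives an RMSE of 12 semitones, so the reported 0.3 semitones corresponds to a much smaller average deviation in pitch.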

Index Terms: HMM-based speech synthesis, vector substitution, speech data compression

Bibliographic reference. Nishizawa, Nobuyuki / Kato, Tsuneo (2010): "Substitution of state distributions to reproduce natural prosody on HMM-based speech synthesizers", in SSW7-2010, 167-172.