Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Improving the Naturalness of Synthetic Speech by Utilizing the Prosody of Natural Speech

Toshimitsu Minowa, Ryo Mochizuki, Hirofumi Nishimura

Multimedia Solution Laboratories, Matsushita Communication Industrial Co., Ltd., Japan

The quality of synthetic speech improves greatly when the prosody of natural speech is adopted in place of rule-based prosody. To exploit this effect in arbitrary-word synthesis, the authors propose a new prosody control method. A listening test showed that rhythm can be controlled independently of pitch and power, whereas pitch and power should be controlled jointly. It therefore seems that pitch and power control should be derived from the same speech. In practical embedded arbitrary-word synthesis, however, memory is so limited that there is little room for redundant data. The authors therefore systematically derived the prosody (pitch interval and pitch waveform amplitude) from continuously uttered monosyllable speech covering most Japanese accent types. The syllables were chosen according to categories of Japanese consonants, classified by manner and place of articulation. With this method, the synthetic speech achieved almost the same preference score for naturalness as synthetic speech that copied the prosody of natural speech.



Bibliographic reference.  Minowa, Toshimitsu / Mochizuki, Ryo / Nishimura, Hirofumi (2000): "Improving the naturalness of synthetic speech by utilizing the prosody of natural speech", In ICSLP-2000, vol.1, 609-612.