ISCA Archive ICSLP 2000

Improving the naturalness of synthetic speech by utilizing the prosody of natural speech

Toshimitsu Minowa, Ryo Mochizuki, Hirofumi Nishimura

The quality of synthetic speech improves greatly when the prosody of natural speech is adopted instead of rule-based prosody. To apply this effect to arbitrary word synthesis, the authors proposed a new prosody control method. A listening test showed that rhythm can be controlled independently of pitch and power, whereas pitch and power should be controlled interdependently. It therefore seems that the pitch and power controls should be derived from the same speech. However, in embedded implementations of practical arbitrary word synthesis, memory is so limited that there is little room for redundant data. The authors therefore systematically derived the prosody (pitch interval and pitch waveform amplitude) from continuously uttered monosyllable speech covering most Japanese accent types. Syllables were chosen according to the categories of Japanese consonants, which are classified by manner and place of articulation. With this method, the naturalness of the synthetic speech achieved almost the same preference score as speech that copied the prosody of natural speech.


Cite as: Minowa, T., Mochizuki, R., Nishimura, H. (2000) Improving the naturalness of synthetic speech by utilizing the prosody of natural speech. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 609-612

@inproceedings{minowa00_icslp,
  author={Toshimitsu Minowa and Ryo Mochizuki and Hirofumi Nishimura},
  title={{Improving the naturalness of synthetic speech by utilizing the prosody of natural speech}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={1},
  pages={609--612}
}