ISCA Archive ICSLP 2000

A design method of speech corpus for text-to-speech synthesis taking account of prosody

Hisashi Kawai, Seiichi Yamamoto, Norio Higuchi, Tohru Shimizu

This paper proposes a method for designing a set of sentences for recording that takes prosody into account. The method is based on a coverage measure that incorporates two factors: (1) the distribution of voice fundamental frequency (F0) and phoneme duration predicted by the prosody generation module of a TTS system; and (2) the perceptual damage to naturalness caused by prosody modification. A set of 500 sentences with a predicted coverage of 82.6% was designed with this method and used to collect a speech corpus. The resulting speech corpus achieved 88% of the predicted coverage. Compared with a general-purpose corpus designed without taking prosody into account, the data size was reduced to 49% in terms of the number of sentences (89% in terms of the number of phonemes).
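The abstract does not give the exact form of the coverage measure or of the sentence selection procedure. The sketch below shows one possible way to combine the two stated factors, under explicit assumptions: each target prosodic unit is a phoneme with a predicted F0 and duration and a probability weight; the hypothetical `damage` function scores modification cost as a log-ratio distance scaled by made-up tolerances (`f0_tol`, `dur_tol`); and sentences are chosen by a greedy marginal-gain search. None of these names, formulas, or parameter values come from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """A phoneme with the F0 (Hz) and duration (ms) predicted for it by a TTS prosody module."""
    phoneme: str
    f0: float
    dur: float


def damage(src: Unit, tgt: Unit, f0_tol: float = 0.25, dur_tol: float = 0.4) -> float:
    """Hypothetical perceptual damage of modifying src's prosody to match tgt.

    Assumption: absolute log-ratios of F0 and duration, each scaled by an
    illustrative tolerance. 0 means no modification; >= 1 is treated as unusable.
    """
    if src.phoneme != tgt.phoneme:
        return math.inf
    d_f0 = abs(math.log(tgt.f0 / src.f0)) / f0_tol
    d_dur = abs(math.log(tgt.dur / src.dur)) / dur_tol
    return max(d_f0, d_dur)


def coverage(selected: Sequence[Unit], targets: Sequence[Tuple[Unit, float]]) -> float:
    """Weighted coverage of a predicted target distribution (assumed form).

    Each (target, probability) pair contributes probability * (1 - damage),
    using the least-damaging selected unit and clipping at 0.
    """
    total = 0.0
    for tgt, prob in targets:
        best = min((damage(s, tgt) for s in selected), default=math.inf)
        total += prob * max(0.0, 1.0 - best)
    return total


def greedy_select(sentences: Dict[str, List[Unit]],
                  targets: Sequence[Tuple[Unit, float]],
                  n: int) -> List[str]:
    """Pick up to n sentences, each time adding the one with the largest coverage gain."""
    chosen: List[str] = []
    pool: List[Unit] = []
    for _ in range(min(n, len(sentences))):
        base = coverage(pool, targets)
        best_name, best_gain = None, -1.0
        for name, units in sentences.items():
            if name in chosen:
                continue
            gain = coverage(pool + units, targets) - base
            if gain > best_gain:
                best_name, best_gain = name, gain
        chosen.append(best_name)
        pool.extend(sentences[best_name])
    return chosen


# Toy usage with invented numbers: three target units and three candidate sentences.
targets = [(Unit("a", 120, 80), 0.5), (Unit("a", 180, 120), 0.3), (Unit("i", 130, 60), 0.2)]
sentences = {
    "s1": [Unit("a", 120, 80)],
    "s2": [Unit("a", 175, 115), Unit("i", 130, 60)],
    "s3": [Unit("i", 300, 60)],
}
picked = greedy_select(sentences, targets, 2)
pool = [u for name in picked for u in sentences[name]]
print(picked, round(coverage(pool, targets), 3))
```

In this toy run the greedy search picks `s1` first (exact match for the most probable target) and then `s2`, whose units are close enough to the remaining targets to be modified with little assumed damage; `s3` adds nothing because its F0 is too far from any target.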


Cite as: Kawai, H., Yamamoto, S., Higuchi, N., Shimizu, T. (2000) A design method of speech corpus for text-to-speech synthesis taking account of prosody. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 420-425

@inproceedings{kawai00_icslp,
  author={Hisashi Kawai and Seiichi Yamamoto and Norio Higuchi and Tohru Shimizu},
  title={{A design method of speech corpus for text-to-speech synthesis taking account of prosody}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={3},
  pages={420--425}
}