This paper proposes a method for designing the set of sentences to be read aloud for a speech corpus, taking prosody into account. The method is based on a coverage measure that incorporates two factors: (1) the distribution of voice fundamental frequency (F0) and phoneme duration predicted by the prosody generation module of a text-to-speech (TTS) system; and (2) the perceptual damage to naturalness caused by prosody modification. Using this method, a set of 500 sentences with a predicted coverage of 82.6% was designed and then used to collect a speech corpus. The resulting corpus achieved 88% of the predicted coverage. Compared with a general-purpose corpus designed without taking prosody into account, the data size was reduced to 49% in terms of the number of sentences (89% in terms of the number of phonemes).
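The abstract does not give the exact form of the coverage measure. The following is a toy Python sketch of a measure of this general kind, under loudly stated assumptions. Each predicted target is a (phoneme, F0, duration) triple. A hypothetical `damage` function scores the cost of modifying a recorded unit to reach that target, using log-ratios of F0 and duration with made-up scales. Coverage is the mean, over targets, of the best `1 - damage` score (floored at 0) from any same-phoneme unit. Sentences are then picked greedily by coverage gain. The damage function, its scales, the per-target scoring, and the greedy selection are all illustrative assumptions, not the paper's formulation.

```python
import math
from typing import Dict, List, Tuple

# A prosodic unit or target: (phoneme label, F0 in Hz, duration in ms).
Unit = Tuple[str, float, float]


def damage(target: Unit, unit: Unit, f0_scale: float = 0.2, dur_scale: float = 0.4) -> float:
    """HYPOTHETICAL perceptual-damage score for modifying `unit` to match `target`.

    Grows with the absolute log-ratio of F0 and of duration; 0 means no change.
    The scale values are illustrative assumptions, not taken from the paper.
    """
    f0_term = abs(math.log(target[1] / unit[1])) / f0_scale
    dur_term = abs(math.log(target[2] / unit[2])) / dur_scale
    return f0_term + dur_term


def unit_score(target: Unit, units: List[Unit]) -> float:
    """Best score in [0, 1] with which any same-phoneme unit covers `target`."""
    best = 0.0
    for u in units:
        if u[0] == target[0]:
            best = max(best, max(0.0, 1.0 - damage(target, u)))
    return best


def coverage(targets: List[Unit], units: List[Unit]) -> float:
    """Toy prosody-aware coverage: mean best score over the predicted targets."""
    if not targets:
        return 0.0
    return sum(unit_score(t, units) for t in targets) / len(targets)


def greedy_select(targets: List[Unit], sentences: Dict[str, List[Unit]], n: int) -> List[str]:
    """Greedily pick up to n sentences, each maximizing the resulting coverage."""
    chosen: List[str] = []
    pool: List[Unit] = []
    remaining = dict(sentences)
    for _ in range(n):
        if not remaining:
            break
        # max() keeps the first sentence in iteration order when scores tie.
        best_id = max(remaining, key=lambda s: coverage(targets, pool + remaining[s]))
        chosen.append(best_id)
        pool.extend(remaining.pop(best_id))
    return chosen


# Toy data (invented for illustration): targets as a TTS prosody module might predict them.
targets: List[Unit] = [("a", 120.0, 80.0), ("a", 200.0, 80.0), ("k", 110.0, 60.0)]
sentences: Dict[str, List[Unit]] = {
    "s1": [("a", 120.0, 80.0)],
    "s2": [("a", 200.0, 80.0), ("k", 110.0, 60.0)],
    "s3": [("k", 110.0, 60.0)],
}

selected = greedy_select(targets, sentences, 2)
print(selected)
```

In this toy run, "s2" is chosen first because it covers two of the three targets. "s1" then supplies the low-F0 /a/ that "s2" cannot reach without heavy modification. Greedy selection is a common heuristic for set-cover-like corpus design, but it is used here only as an assumption.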
Cite as: Kawai, H., Yamamoto, S., Higuchi, N., Shimizu, T. (2000) A design method of speech corpus for text-to-speech synthesis taking account of prosody. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 420-425
@inproceedings{kawai00_icslp, author={Hisashi Kawai and Seiichi Yamamoto and Norio Higuchi and Tohru Shimizu}, title={{A design method of speech corpus for text-to-speech synthesis taking account of prosody}}, year=2000, booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)}, pages={vol. 3, 420-425} }