Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

A Design Method of Speech Corpus for Text-To-Speech Synthesis Taking Account of Prosody

Hisashi Kawai (1,3), Seiichi Yamamoto (1), Norio Higuchi (2), Tohru Shimizu (2)

(1) ATR Spoken Language Translation Research Laboratories, Seika-cho, Soraku-gun, Kyoto, Japan
(2) KDD R&D Laboratories Inc., Kamifukuoka-shi, Saitama, Japan
(3) Hisashi Kawai was at KDD R&D Laboratories at the time of this research.

This paper proposes a method for designing a set of sentences to be uttered for a speech corpus, taking prosody into account. The method is based on a coverage measure that incorporates two factors: (1) the distribution of voice fundamental frequency (F0) and phoneme duration predicted by the prosody generation module of a TTS system; and (2) the perceptual damage to naturalness caused by prosody modification. A set of 500 sentences with a predicted coverage of 82.6% was designed by this method and used to collect a speech corpus. The resulting speech corpus achieved 88% of the predicted coverage. Compared with a general-purpose corpus designed without taking prosody into account, the data size was reduced to 49% in number of sentences (89% in number of phonemes).
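
The idea of a prosody-aware coverage measure can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state how sentences are selected, so a greedy search is assumed here. The unit representation (phoneme, F0 bin, duration bin), the names `coverage` and `greedy_select`, and the single `tol` parameter, which stands in for the perceptual tolerance to prosody modification, are all hypothetical. Target weights play the role of the distribution predicted by the prosody generation module.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical unit: (phoneme, quantised F0 bin, quantised duration bin).
Unit = Tuple[str, int, int]


def is_covered(target: Unit, have: Set[Unit], tol: int) -> bool:
    """A target unit counts as covered if the corpus holds the same phoneme
    within `tol` bins of F0 and duration (assumed stand-in for the range in
    which prosody modification does not noticeably hurt naturalness)."""
    phoneme, f0, dur = target
    return any(
        p == phoneme and abs(f - f0) <= tol and abs(d - dur) <= tol
        for p, f, d in have
    )


def coverage(have: Set[Unit], target: Dict[Unit, float], tol: int = 0) -> float:
    """Weighted fraction of the predicted prosodic distribution that is covered."""
    total = sum(target.values())
    if total == 0:
        return 0.0
    covered = sum(w for u, w in target.items() if is_covered(u, have, tol))
    return covered / total


def greedy_select(
    sentences: Dict[str, Set[Unit]],
    target: Dict[Unit, float],
    n: int,
    tol: int = 0,
) -> List[str]:
    """Pick up to `n` sentence ids, each time taking the sentence that raises
    coverage the most (assumed greedy strategy, not taken from the paper)."""
    chosen: List[str] = []
    have: Set[Unit] = set()
    remaining = dict(sentences)
    for _ in range(n):
        base = coverage(have, target, tol)
        best_id, best_gain = None, 0.0
        for sid, units in remaining.items():
            gain = coverage(have | units, target, tol) - base
            if gain > best_gain:
                best_id, best_gain = sid, gain
        if best_id is None:  # no remaining sentence improves coverage
            break
        chosen.append(best_id)
        have |= remaining.pop(best_id)
    return chosen
```

Calling `greedy_select(sentences, target, n=500)` on a candidate pool would return an ordered list of sentence ids, and `coverage` applied to the units of the recorded speech would give the achieved figure to compare against the predicted one. A larger `tol` models a listener who tolerates more prosody modification, so fewer sentences are needed for the same coverage.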

