Sixth International Conference on Spoken Language Processing
This paper proposes a method for designing the set of sentences to be uttered for a speech corpus, taking prosody into account. The method is based on a coverage measure that incorporates two factors: (1) the distribution of voice fundamental frequency and phoneme duration predicted by the prosody generation module of a text-to-speech (TTS) system; and (2) the perceptual damage to naturalness caused by prosody modification. A set of 500 sentences with a predicted coverage of 82.6% was designed with this method and used to collect a speech corpus. The resulting speech corpus yielded 88% of the predicted coverage. Compared to a general-purpose corpus designed without taking prosody into account, the data size was reduced to 49% in terms of number of sentences (89% in terms of number of phonemes).
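As a rough illustration of coverage-driven sentence selection, the sketch below greedily picks sentences that add the most frequency-weighted coverage of prosodic units. It is not the authors' procedure: the abstract does not state the selection algorithm, so greedy selection is an assumption, as are the name `greedy_select`, the representation of a unit as a (phoneme, F0 bin, duration bin) tuple, and the use of pool frequencies in place of the distribution predicted by a TTS prosody module. The perceptual-damage factor is omitted entirely.

```python
from collections import Counter


def greedy_select(sentences, n_select):
    """Greedily choose up to n_select sentences by marginal coverage gain.

    sentences: dict mapping a sentence id to a list of hashable units,
    e.g. (phoneme, f0_bin, duration_bin) tuples (hypothetical encoding).
    Returns (chosen_ids, coverage), with coverage in [0, 1].
    """
    # Assumption: unit frequencies over the candidate pool stand in for
    # the prosody distribution predicted by a TTS front end.
    freq = Counter(u for units in sentences.values() for u in units)
    total = sum(freq.values())

    covered = set()
    chosen = []
    remaining = dict(sentences)

    def gain(sid):
        # Frequency mass of the units this sentence would newly cover.
        return sum(freq[u] for u in set(remaining[sid]) - covered)

    while remaining and len(chosen) < n_select:
        # Sorting the ids makes tie-breaking deterministic.
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break  # no remaining sentence adds coverage
        chosen.append(best)
        covered |= set(remaining.pop(best))

    coverage = sum(freq[u] for u in covered) / total if total else 0.0
    return chosen, coverage
```

Calling `greedy_select(pool, 500)` on a candidate pool would return the selected sentence ids together with the fraction of the pool's unit frequency mass they cover. A fuller version would weight each unit by how much naturalness is lost when a nearby unit has to be modified to substitute for it.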
Bibliographic reference. Kawai, Hisashi / Yamamoto, Seiichi / Higuchi, Norio / Shimizu, Tohru (2000): "A design method of speech corpus for text-to-speech synthesis taking account of prosody", In ICSLP-2000, vol.3, 420-425.