Fifth ISCA ITRW on Speech Synthesis

June 14-16, 2004
Pittsburgh, PA, USA

Using 5 ms Segments in Concatenative Speech Synthesis

Toshio Hirai, Seiichi Tenpaku

Arcadia Inc., Minoh, Osaka, Japan

A concatenative speech synthesis system has more potential to generate natural speech when it uses shorter speech segments, since the variety of possible concatenations becomes greater. In this paper, we propose the use of very short speech segments (5 ms, one pitch period at a pitch of 200 Hz) for concatenative speech synthesis. The proposed method is applied to the CMU ARCTIC speech database, and 100 sentences are synthesized. Though the synthesized speech maintains the speaker's identity and is natural enough, it also contains some noise caused by inappropriate unit selection, and the formant changes are awkward in some vowel regions.
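The relation between the 5 ms segment length and a 200 Hz pitch can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 16 kHz sampling rate is an assumption not stated in the abstract, and the fixed-length split merely shows the segment granularity, not the unit selection procedure used in the paper.

```python
def pitch_period_ms(f0_hz: float) -> float:
    """Return the duration of one pitch period in milliseconds."""
    return 1000.0 / f0_hz


def samples_per_segment(segment_ms: float, sample_rate_hz: int) -> int:
    """Return the number of samples in a segment of the given duration."""
    return round(segment_ms * sample_rate_hz / 1000.0)


def split_into_segments(samples, segment_len):
    """Cut a waveform into consecutive fixed-length segments.

    Trailing samples that do not fill a whole segment are dropped.
    """
    last_start = len(samples) - segment_len
    return [samples[i:i + segment_len]
            for i in range(0, last_start + 1, segment_len)]


# Assumed sampling rate (not given in the abstract).
SAMPLE_RATE_HZ = 16000

period = pitch_period_ms(200.0)                        # one period at 200 Hz
seg_len = samples_per_segment(period, SAMPLE_RATE_HZ)  # samples per 5 ms segment
segments = split_into_segments(list(range(200)), seg_len)
```

Under the assumed sampling rate, each 5 ms segment holds 80 samples, so one second of speech yields 200 candidate units.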

