A concatenative speech synthesis system has greater potential to generate natural speech when it uses more short speech segments, since the variety of possible concatenations becomes greater. In this paper, we propose the use of very short speech segments (5 ms, one pitch period at a pitch of 200 Hz) for concatenative speech synthesis. The proposed method is applied to the CMU ARCTIC speech database, and 100 sentences are synthesized. Although the synthesized speech maintains the speaker's identity and is natural enough, it also contains some noise caused by inappropriate unit selection, and the formant changes are awkward in some vowel regions.
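The 5 ms figure follows from the pitch: one period of a 200 Hz voice lasts 1/200 s. The short sketch below checks that arithmetic and converts the segment length to a sample count. The function names are illustrative, and the 16 kHz sampling rate is an assumption made for the example, not a value stated in the abstract.

```python
def pitch_period_ms(f0_hz: float) -> float:
    """Return the duration of one pitch period in milliseconds."""
    return 1000.0 / f0_hz


def samples_per_segment(segment_ms: float, sample_rate_hz: int) -> int:
    """Return the number of samples in a segment of the given duration."""
    return round(segment_ms * sample_rate_hz / 1000.0)


# One period at 200 Hz, the pitch quoted in the abstract.
period_ms = pitch_period_ms(200.0)

# Assumed sampling rate (hypothetical for this example): 16 kHz.
n_samples = samples_per_segment(period_ms, 16000)

print(f"{period_ms} ms per segment, {n_samples} samples at 16 kHz")
```

Under that assumed rate each segment holds only a few dozen samples, which gives a sense of how fine-grained the unit inventory is compared with phone- or diphone-sized units.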
Cite as: Hirai, T., Tenpaku, S. (2004) Using 5 ms segments in concatenative speech synthesis. Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5), 37-42
@inproceedings{hirai04_ssw,
  author={Toshio Hirai and Seiichi Tenpaku},
  title={{Using 5 ms segments in concatenative speech synthesis}},
  year=2004,
  booktitle={Proc. 5th ISCA Workshop on Speech Synthesis (SSW 5)},
  pages={37--42}
}