This paper presents a new approach to synthesizing fast speech in unit selection synthesis. After recording two inventories - one at normal and one at fast speech rate articulated as accurately as possible - speech was synthesized from both corpora independently. Since fast speech differs from normal-rate speech in terms of acoustic characteristics, the concept of multi-phone (phoxsy) units [1] was implemented and used to synthesize speech at both speaking rates again. A perceptual evaluation showed that phoxsy units significantly enhanced the intelligibility of fast speech synthesis.
Index Terms: fast speech, unit selection, phoxsy units
[1] Breuer, S., Abresch, J. "Phoxsy: Multi-phone segments for unit selection speech synthesis." In Interspeech-2004 (ICSLP).
Cite as: Moers, D., Jauk, I., Möbius, B., Wagner, P. (2010) Synthesizing fast speech by implementing multi-phone units in unit selection speech synthesis. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 355-358
@inproceedings{moers10_ssw, author={Donata Moers and Igor Jauk and Bernd Möbius and Petra Wagner}, title={{Synthesizing fast speech by implementing multi-phone units in unit selection speech synthesis}}, year=2010, booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)}, pages={355--358} }