Fifth ISCA ITRW on Speech Synthesis
June 14-16, 2004
This paper describes a new concatenative TTS system under development at ATR. The system, named XIMERA, is based on corpus-based technologies, as were the preceding TTS systems from ATR, ν-talk and CHATR. The prominent features of XIMERA are (1) large corpora (a 110-hour corpus of a Japanese male speaker, a 60-hour corpus of a Japanese female speaker, and a 20-hour corpus of a Chinese female speaker), (2) HMM-based generation of prosodic parameters, and (3) a cost function for segment selection optimized based on perceptual experiments. A perception test that evaluated the naturalness of synthetic speech from XIMERA and 10 TTS products, including CHATR, showed that XIMERA outperformed the other ten.
Bibliographic reference. Kawai, Hisashi / Toda, Tomoki / Ni, Jinfu / Tsuzaki, Minoru / Tokuda, Keiichi (2004): "XIMERA: a new TTS from ATR based on corpus-based technologies", In SSW5-2004, 179-184.