Fifth ISCA ITRW on Speech Synthesis

June 14-16, 2004
Pittsburgh, PA, USA

XIMERA: A New TTS from ATR Based on Corpus-Based Technologies

Hisashi Kawai (1), Tomoki Toda (1,2), Jinfu Ni (1), Minoru Tsuzaki (1), Keiichi Tokuda (1,2)

(1) ATR Spoken Language Translation Research Laboratories (ATR-SLT), Japan
(2) Graduate School of Engineering, Nagoya Institute of Technology, Japan

This paper describes a new concatenative TTS system under development at ATR. The system, named XIMERA, is based on corpus-based technologies, as were ATR's preceding TTS systems, ν-talk and CHATR. The prominent features of XIMERA are (1) large corpora (a 110-hour corpus of a Japanese male speaker, a 60-hour corpus of a Japanese female speaker, and a 20-hour corpus of a Chinese female speaker), (2) HMM-based generation of prosodic parameters, and (3) a cost function for segment selection optimized on the basis of perceptual experiments. A perception test evaluating the naturalness of synthetic speech from XIMERA and 10 TTS products, including CHATR, showed that XIMERA outperformed the other ten.
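The abstract names segment selection driven by a cost function but does not define that function. As a generic illustration only, and not XIMERA's perceptually optimized cost, the sketch below shows the common form of concatenative segment selection: a dynamic-programming (Viterbi) search over candidate segments that minimizes a weighted sum of target costs and concatenation costs. Representing each segment by a single number (for example an F0 value), the absolute-difference cost functions, and the names `select_segments`, `w_target` and `w_concat` are all hypothetical assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def select_segments(
    targets: Sequence[float],
    candidates: Sequence[Sequence[float]],
    target_cost: Callable[[float, float], float],
    concat_cost: Callable[[float, float], float],
    w_target: float = 1.0,
    w_concat: float = 1.0,
) -> Tuple[List[int], float]:
    """Hypothetical Viterbi segment selection (generic, not XIMERA's cost).

    Assumes every position has at least one candidate. Returns the index of
    the chosen candidate at each position and the total weighted cost.
    """
    n = len(targets)
    if n == 0:
        return [], 0.0
    # best[i][j]: minimum accumulated cost of a path ending at candidate j of position i
    best = [[w_target * target_cost(targets[0], c) for c in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, n):
        row: List[float] = []
        ptr: List[int] = []
        for c in candidates[i]:
            tc = w_target * target_cost(targets[i], c)
            # Pick the predecessor minimizing accumulated cost plus join cost
            k, acc = min(
                (
                    (k, best[i - 1][k] + w_concat * concat_cost(p, c))
                    for k, p in enumerate(candidates[i - 1])
                ),
                key=lambda kv: kv[1],
            )
            row.append(acc + tc)
            ptr.append(k)
        best.append(row)
        back.append(ptr)
    # Backtrack from the cheapest final candidate
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, total


if __name__ == "__main__":
    diff = lambda a, b: abs(a - b)  # hypothetical cost: absolute difference
    path, total = select_segments(
        [100, 110, 120], [[90, 100], [110, 130], [118, 140]], diff, diff
    )
    print(path, total)  # [1, 0, 0] 20.0
```

In this toy run the search trades a small target mismatch at the last position (118 instead of 120) for smooth joins. A system such as the one described would instead tune the form and weights of the cost against listener judgments, which this sketch does not attempt.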


Bibliographic reference.  Kawai, Hisashi / Toda, Tomoki / Ni, Jinfu / Tsuzaki, Minoru / Tokuda, Keiichi (2004): "XIMERA: a new TTS from ATR based on corpus-based technologies", In SSW5-2004, 179-184.