7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Reducing the Footprint of the IBM Trainable Speech Synthesis System

Dan Chazan, Ron Hoory, Zvi Kons, Dorel Silberstein, Alexander Sorin

IBM Haifa Labs, Israel

This paper presents a novel approach to concatenative speech synthesis that reduces the dataset size of a concatenative text-to-speech system, namely the IBM trainable speech synthesis system, by more than an order of magnitude. A speech representation based on spectral acoustic features is used both for computing the cost function during segment selection and for speech generation. Initial results indicate that even with a dataset of only a few megabytes, it is possible to achieve quality significantly higher than that of existing small-footprint formant-based synthesizers.


Bibliographic reference. Chazan, Dan / Hoory, Ron / Kons, Zvi / Silberstein, Dorel / Sorin, Alexander (2002): "Reducing the footprint of the IBM trainable speech synthesis system", in ICSLP-2002, 2381-2384.