7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper presents a novel approach to concatenative speech synthesis that reduces the dataset size of a concatenative text-to-speech system, namely the IBM trainable speech synthesis system, by more than an order of magnitude. A speech representation based on spectral acoustic features is used both for computing the cost function during segment selection and for speech generation. Initial results indicate that, even with a dataset of only a few megabytes, it is possible to achieve quality significantly higher than that of existing small-footprint formant-based synthesizers.
Bibliographic reference. Chazan, Dan / Hoory, Ron / Kons, Zvi / Silberstein, Dorel / Sorin, Alexander (2002): "Reducing the footprint of the IBM trainable speech synthesis system", in ICSLP-2002, 2381-2384.