Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Small Footprint Concatenative Text-to-Speech Synthesis System Using Complex Spectral Envelope Modeling

Dan Chazan, Ron Hoory, Zvi Kons, Ariel Sagi, Slava Shechtman, Alexander Sorin

IBM Haifa Labs, Israel

This paper presents a speech modeling method and its use in IBM's small-footprint concatenative text-to-speech system. The method is based on frequency-domain modeling of the complex spectral envelope, in which the phase component plays a crucial role in attaining high-quality speech synthesis. The modeling scheme enables low-bit-rate compression of the amplitude and phase information, and low-complexity reconstruction of high-quality speech with wide-range pitch modification. Listening tests conducted on the overall text-to-speech system show a major improvement in mean opinion score (MOS) compared with a previous MFCC-based system.

Bibliographic reference. Chazan, Dan / Hoory, Ron / Kons, Zvi / Sagi, Ariel / Shechtman, Slava / Sorin, Alexander (2005): "Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling", in INTERSPEECH-2005, 2569-2572.