Interspeech'2005 - Eurospeech
In this paper we present a method for speech modeling and its use in IBM's small-footprint concatenative text-to-speech system. The method is based on frequency-domain modeling of the complex spectral envelope, in which the phase component plays a crucial role in attaining high-quality speech synthesis. The modeling scheme enables low-bit-rate compression of the amplitude and phase information and low-complexity reconstruction of high-quality speech with wide-range pitch modification. Listening tests conducted on the overall text-to-speech system show a major improvement in mean opinion score (MOS) compared with a previous MFCC-based system.
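The abstract does not specify the model, so the following is only a toy sketch of the general idea behind pitch modification with a complex spectral envelope, not the authors' method. It assumes a frame can be approximated as a sum of harmonics whose amplitudes and phases are read off a complex envelope at multiples of the target pitch. The names `toy_envelope`, `harmonic_frequencies` and `synthesize_frame`, the single-resonance envelope, and all numeric values are hypothetical.

```python
import cmath
import math


def toy_envelope(freq_hz):
    """Hypothetical complex envelope: a single resonance near 700 Hz.

    The value is complex, so it carries both amplitude and phase.
    """
    center_hz, bandwidth_hz = 700.0, 150.0  # assumed values, for illustration only
    return 1.0 / complex(bandwidth_hz, freq_hz - center_hz)


def harmonic_frequencies(f0_hz, sample_rate):
    """Multiples of the pitch f0 that lie strictly below the Nyquist frequency."""
    nyquist = sample_rate / 2.0
    count = int(nyquist // f0_hz)
    return [k * f0_hz for k in range(1, count + 1) if k * f0_hz < nyquist]


def synthesize_frame(envelope, f0_hz, sample_rate=16000, num_samples=320):
    """Sum of cosines; amplitude and phase of each harmonic come from the envelope."""
    # Sample the envelope once per harmonic of the requested pitch.
    partials = [(f, envelope(f)) for f in harmonic_frequencies(f0_hz, sample_rate)]
    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        sample = sum(
            abs(c) * math.cos(2.0 * math.pi * f * t + cmath.phase(c))
            for f, c in partials
        )
        frame.append(sample)
    return frame


# Same envelope, two different pitches: only the sampling points on the envelope change.
low_pitch_frame = synthesize_frame(toy_envelope, f0_hz=100.0)
high_pitch_frame = synthesize_frame(toy_envelope, f0_hz=200.0)
```

In this sketch, changing the pitch only changes where the envelope is sampled, which is why an envelope-based representation lends itself to pitch modification. A real system would also need to handle framing, overlap-add, unvoiced speech, and envelope estimation, none of which is shown here.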
Bibliographic reference. Chazan, Dan / Hoory, Ron / Kons, Zvi / Sagi, Ariel / Shechtman, Slava / Sorin, Alexander (2005): "Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling", In INTERSPEECH-2005, 2569-2572.