ISCA Archive Interspeech 2005

Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling

Dan Chazan, Ron Hoory, Zvi Kons, Ariel Sagi, Slava Shechtman, Alexander Sorin

In this paper we present a method for speech modeling and its use in IBM's small-footprint concatenative text-to-speech system. The method is based on frequency-domain, complex spectral envelope modeling, in which the phase component plays a crucial role in attaining high-quality speech synthesis. The modeling scheme enables low-bit-rate compression of the amplitude and phase information and low-complexity reconstruction of high-quality speech with wide-range pitch modification. Listening tests conducted on the overall text-to-speech system show a major improvement in MOS compared to a previous MFCC-based system.


doi: 10.21437/Interspeech.2005-797

Cite as: Chazan, D., Hoory, R., Kons, Z., Sagi, A., Shechtman, S., Sorin, A. (2005) Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling. Proc. Interspeech 2005, 2569-2572, doi: 10.21437/Interspeech.2005-797

@inproceedings{chazan05_interspeech,
  author={Dan Chazan and Ron Hoory and Zvi Kons and Ariel Sagi and Slava Shechtman and Alexander Sorin},
  title={{Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2569--2572},
  doi={10.21437/Interspeech.2005-797}
}