Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

An Enhanced ABS/OLA Sinusoidal Model for Waveform Synthesis in TTS

Michael W. Macon (1), Mark A. Clements (2)

(1) Dept. of ECE, Oregon Graduate Institute, Portland, OR, USA
(2) Dept. of ECE, Georgia Institute of Technology, Atlanta, GA, USA

This paper describes a method for text-to-speech waveform synthesis based on the Analysis-by-Synthesis/Overlap-Add (ABS/OLA) sinusoidal model. Previous work has shown this model to be a useful framework for pitch and time-scale modification of both speech and music signals. This paper explores extensions of the original ABS/OLA formulation that attempt to overcome specific artifacts. These extensions include a phase dithering approach for unvoiced speech synthesis and an improved pitch modification method that compensates for undesirable energy modulation effects. The implementation of the model within a text-to-speech (TTS) synthesis system is described, and the results of a listener evaluation of the method are discussed.

Bibliographic reference. Macon, Michael W. / Clements, Mark A. (1999): "An enhanced ABS/OLA sinusoidal model for waveform synthesis in TTS", in EUROSPEECH'99, 2327-2330.