Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Concatenative Text-to-Speech Synthesis Based on Prototype Waveform Interpolation (A Time Frequency Approach)

Edmilson S. Morais (1), Paul Taylor (1), Fábio Violaro (2)

(1) CSTR / University of Edinburgh, UK
(2) Department of Electrical Engineering, Universidade Estadual de Campinas, Brazil

This paper presents preliminary methods for applying the Time-Frequency Interpolation (TFI) technique [3] to concatenative text-to-speech synthesis. The TFI technique described here is a pitch-synchronous time-frequency formulation of the well-known Prototype-Waveform Interpolation (PWI) technique [2]. The basic concepts of representing the speech signal in the time-frequency domain are described, together with techniques for performing time-scale and pitch-scale modifications. Exploiting the flexibility of the TFI technique for spectral smoothing, a method was developed to minimize the spectral mismatch at the boundaries of the Synthesis Units (SUs). The proposed system was evaluated using SUs (diphones) and prosodic modifications generated by the Festival system [1]. In an informal subjective test comparing the proposed TFI system with the standard TD-PSOLA system, the proposed system was judged to be of superior quality.
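
As a rough illustration of the interpolation idea behind PWI/TFI, the sketch below linearly interpolates the harmonic (Fourier-series) coefficients and the pitch period between two pitch cycles, for example the last cycle of one synthesis unit and the first cycle of the next. It is a minimal sketch under stated assumptions, not the system evaluated in the paper: the two cycles are assumed to be already phase-aligned, the interpolation is plain linear interpolation of complex coefficients, and the function names `harmonics` and `interpolate_cycles` are hypothetical.

```python
import numpy as np


def harmonics(cycle):
    """Length-normalised Fourier-series coefficients of one pitch cycle."""
    cycle = np.asarray(cycle, dtype=float)
    return np.fft.rfft(cycle) / len(cycle)


def interpolate_cycles(cycle_a, cycle_b, n_steps):
    """Return n_steps pitch cycles that move gradually from cycle_a to cycle_b.

    Assumption: both cycles are already phase-aligned. Both the pitch period
    and the complex harmonic coefficients are interpolated linearly.
    """
    ha, hb = harmonics(cycle_a), harmonics(cycle_b)
    frames = []
    for k in range(n_steps):
        alpha = k / (n_steps - 1) if n_steps > 1 else 0.0
        # Interpolated pitch period, in samples.
        period = int(round((1 - alpha) * len(cycle_a) + alpha * len(cycle_b)))
        # Keep only harmonics present in both cycles and below Nyquist.
        n_harm = min(len(ha), len(hb), period // 2 + 1)
        coeffs = (1 - alpha) * ha[:n_harm] + alpha * hb[:n_harm]
        # Undo the length normalisation and return to the time domain.
        frames.append(np.fft.irfft(coeffs * period, n=period))
    return frames


if __name__ == "__main__":
    n_a, n_b = np.arange(80), np.arange(100)
    a = np.sin(2 * np.pi * n_a / 80)          # 80-sample pitch cycle
    b = 0.5 * np.sin(2 * np.pi * n_b / 100)   # 100-sample pitch cycle
    for frame in interpolate_cycles(a, b, 5):
        print(len(frame), round(float(np.max(np.abs(frame))), 3))
```

Because the harmonics are stored independently of the cycle length, the same representation can be resynthesised at a different pitch period, which is what makes pitch-scale modification and boundary smoothing convenient in this kind of scheme. Harmonics that exist in only one of the two cycles are dropped in this sketch, so the longer cycle is reproduced exactly only when it is band-limited to the shared harmonics.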


  1. A. Black, P. Taylor, R. Caley. The Festival Speech Synthesis. Available at, 4(5), Sept. 1996.
  2. B. Kleijn, K. Paliwal, eds. Speech Coding and Synthesis. Elsevier, Amsterdam, 1998.
  3. Y. Shoham. High-quality Speech Coding at 2.4 to 4.0 kbps Based on Time-Frequency Interpolation. IEEE Proc. ICASSP '93, II.167-170, April 1993.

Bibliographic reference.  Morais, Edmilson S. / Taylor, Paul / Violaro, Fábio (2000): "Concatenative text-to-speech synthesis based on prototype waveform interpolation (a time frequency approach)", In ICSLP-2000, vol.2, 387-390.