Sixth International Conference on Spoken Language Processing
October 16-20, 2000
Concatenative Text-to-Speech Synthesis Based on Prototype Waveform Interpolation (A Time Frequency Approach)
Edmilson S. Morais (1), Paul Taylor (1), Fábio Violaro (2)
(1) CSTR / University of Edinburgh, UK
This paper presents preliminary methods for applying the
Time-Frequency Interpolation (TFI) technique to concatenative
text-to-speech synthesis. The TFI technique described here is a
pitch-synchronous time-frequency formulation of the well-known
Prototype-Waveform Interpolation (PWI) technique. The basic
concepts of representing the speech signal in the time-frequency
domain are described, together with techniques for performing
time-scale and pitch-scale modifications. Exploiting the
flexibility of TFI for spectral smoothing, a method was developed
to minimize the spectral mismatch at the boundaries of the
Synthesis Units (SUs). The proposed system was evaluated using
SUs (diphones) and prosodic modifications generated by the
Festival system. An informal subjective test comparing the
proposed TFI system with the standard TD-PSOLA system indicated
superior quality for the proposed system.
(2) Department of Electrical Engineering,
Universidade Estadual de Campinas, Brazil
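
As a rough illustration of the interpolation idea summarized in the abstract, the sketch below linearly interpolates between two prototype pitch cycles after resampling them to a common length. It is a minimal toy example and not the authors' method: the function names, the fixed normalized length of 64 samples, the linear resampling, and the synthetic sine prototypes are all assumptions made for illustration, and the sketch omits the cycle alignment and spectral processing that a real PWI or TFI system would need.

```python
import math

def resample_cycle(cycle, length):
    """Linearly resample one pitch cycle to `length` samples, treating it as periodic."""
    n = len(cycle)
    out = []
    for k in range(length):
        pos = k * n / length          # fractional index into the original cycle
        i = int(pos)
        frac = pos - i
        nxt = cycle[(i + 1) % n]      # wrap around: the cycle is assumed periodic
        out.append((1.0 - frac) * cycle[i] + frac * nxt)
    return out

def interpolate_prototypes(proto_a, proto_b, num_cycles, length=64):
    """Return `num_cycles` cycles evolving linearly from proto_a to proto_b.

    Both prototypes are first normalized to `length` samples (a hypothetical
    choice) so they can be mixed sample by sample.
    """
    a = resample_cycle(proto_a, length)
    b = resample_cycle(proto_b, length)
    cycles = []
    for m in range(num_cycles):
        # alpha runs from 0 (pure proto_a) to 1 (pure proto_b)
        alpha = m / (num_cycles - 1) if num_cycles > 1 else 0.0
        cycles.append([(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)])
    return cycles

# Two synthetic prototypes with different pitch periods (80 and 50 samples).
proto_a = [math.sin(2 * math.pi * k / 80) for k in range(80)]
proto_b = [0.5 * math.sin(2 * math.pi * k / 50) for k in range(50)]
cycles = interpolate_prototypes(proto_a, proto_b, num_cycles=5)
```

In this toy setting, changing `num_cycles` changes how many cycles are generated between the same two prototypes, which is a loose analogue of time-scale modification; resampling each output cycle to a different length before concatenation would be a loose analogue of pitch-scale modification.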
- A. Black, P. Taylor, R. Caley. The Festival Speech Synthesis.
Available at http://www.cstr.ed.ac.uk/projects/festival.html, 4(5), Sept.
- B. Kleijn, K. Paliwal, eds. Speech Coding and Synthesis. Elsevier,
- Y. Shoham. High-quality Speech Coding at 2.4 to 4.0 kbps Based
on Time-Frequency Interpolation. IEEE Proc. ICASSP ‘93, II.167-170, April,
Morais, Edmilson S. / Taylor, Paul / Violaro, Fábio (2000):
"Concatenative text-to-speech synthesis based on prototype waveform interpolation (a time frequency approach)",
In ICSLP-2000, vol.2, 387-390.