We review, within a common framework, several recently proposed algorithms for improving the voice quality of diphone-based speech synthesis [1-3]. These algorithms use a pitch-synchronous overlap-add (PSOLA) approach to modify speech prosody and to concatenate diphone waveforms. The speech signal is modified either in the frequency domain (FD-PSOLA), using the Fast Fourier Transform, or directly in the time domain (TD-PSOLA), depending on the length of the window used in the synthesis process. The frequency-domain approach offers great flexibility in modifying the spectral characteristics of the speech signal, while the time-domain approach provides very efficient solutions for the real-time implementation of synthesis systems. We also discuss the kinds of distortion introduced by each of these algorithms.
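As an illustration of the time-domain variant, the following is a minimal sketch of pitch modification by pitch-synchronous overlap-add. It is not the authors' implementation. It assumes that the analysis pitch marks are already known, that each short-time signal is cut with a raised-cosine window spanning the two neighbouring pitch periods, and that each synthesis mark reuses the nearest analysis segment so that duration is roughly preserved. The function name `td_psola` and its parameters are hypothetical.

```python
import math


def td_psola(signal, marks, pitch_factor):
    """Sketch of TD-PSOLA pitch scaling (assumed setup, see lead-in).

    signal       -- list of float samples
    marks        -- increasing list of analysis pitch-mark sample indices
    pitch_factor -- > 1 raises the pitch, < 1 lowers it
    """
    out = [0.0] * len(signal)
    interior = range(1, len(marks) - 1)   # marks with a neighbour on each side
    t = float(marks[1])                   # first synthesis mark
    while t <= marks[-2]:
        # Pick the analysis mark closest to the current synthesis mark.
        k = min(interior, key=lambda i: abs(marks[i] - t))
        left = marks[k] - marks[k - 1]    # local period before the mark
        right = marks[k + 1] - marks[k]   # local period after the mark
        center = int(round(t))
        # Overlap-add the windowed two-period segment at the synthesis mark.
        for j in range(-left, right + 1):
            half = left if j < 0 else right
            w = 0.5 * (1.0 + math.cos(math.pi * j / half))
            src, dst = marks[k] + j, center + j
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                out[dst] += w * signal[src]
        # Advance by the local period, shortened or stretched by the factor.
        t += right / pitch_factor
    return out


# Usage: a 100-sample-period sine with one pitch mark per period.
period = 100
sine = [math.sin(2.0 * math.pi * n / period) for n in range(1000)]
pitch_marks = list(range(0, 1000, period))
unchanged = td_psola(sine, pitch_marks, 1.0)  # factor 1: interior is preserved
raised = td_psola(sine, pitch_marks, 2.0)     # segments repeat every 50 samples
```

With a factor of 1 and evenly spaced marks, the overlapping windows sum to one, so the signal between the first and last interior marks is reproduced. With other factors the segments are re-spaced in time without any spectral processing, which is what keeps the time-domain approach inexpensive.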
Bibliographic reference. Charpentier, Francis / Moulines, Eric (1989): "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones", In EUROSPEECH-1989, 2013-2019.