Third ESCA/COCOSDA Workshop on Speech Synthesis
November 26-29, 1998
A basic problem in concatenative speech synthesis is the discontinuity at the concatenation points. Units that are produced by different (independent) articulatory movements differ in their spectral characteristics even if their phonetic context is carefully chosen.
This paper describes a wavelet transform of the spectrum of the speech concatenated within the PSOLA algorithm.
This multiresolution analysis separates the following perceptually important spectral characteristics: the intrinsic pitch, which results in a fine ripple of the spectrum; articulatory movements, which typically result in formant structures; and the global spectral tilt.
In the wavelet domain, each of these characteristics can be analysed and manipulated separately in a consistent and completely non-parametric way. Optimised concatenation points can easily be located. Remaining spectral irregularities can be adjusted efficiently, resulting in clear and natural-sounding synthetic speech.
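The abstract does not state which criterion is used to locate the concatenation points. Purely as an illustration, the following sketch assumes that each unit is available as a sequence of per-frame log-magnitude spectra and that a good join is one where the coarse-scale (formant and tilt) content of the two frames is most similar. The function names `coarse_profile` and `best_join`, the block size, and the Euclidean cost are all hypothetical choices, not taken from the paper.

```python
import numpy as np

def coarse_profile(log_spectrum, block=8):
    """Block-average a log spectrum (length must be a multiple of `block`).

    Up to a constant scale factor this equals the low-pass output of a
    Haar filter cascade, so fine pitch ripple is largely averaged out.
    """
    s = np.asarray(log_spectrum, dtype=float)
    return s.reshape(-1, block).mean(axis=1)

def best_join(left_frames, right_frames, block=8):
    """Return (i, j, cost) for the frame pair with the smallest coarse-scale distance.

    Assumption: Euclidean distance between coarse profiles is used as the
    mismatch measure; the paper may use a different criterion.
    """
    left = np.array([coarse_profile(f, block) for f in left_frames])
    right = np.array([coarse_profile(f, block) for f in right_frames])
    # cost[i, j] = distance between left frame i and right frame j
    cost = np.linalg.norm(left[:, None, :] - right[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmin(cost), cost.shape)
    return int(i), int(j), float(cost[i, j])
```

Restricting the comparison to coarse scales is a design choice of this sketch: it keeps pitch-related ripple from dominating the cost when the two units were recorded at different fundamental frequencies.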
Dyadic filter banks are a computationally efficient implementation of the presented transform.
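To make the idea of a dyadic filter bank concrete, here is a minimal sketch that uses the Haar wavelet on a sampled log-magnitude spectrum. The choice of the Haar wavelet, the function names `haar_analysis` and `haar_synthesis`, and the number of levels are assumptions made for illustration; the abstract does not specify which wavelet the paper uses. Each level halves the resolution, so fine detail bands would carry pitch ripple while coarser bands and the residue would carry formant structure and tilt.

```python
import numpy as np

def haar_analysis(x, levels):
    """Dyadic Haar analysis; len(x) must be divisible by 2**levels.

    Returns the coarse residue and a list of detail bands, finest first.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # high-pass branch, downsampled by 2
        approx = (even + odd) / np.sqrt(2.0)         # low-pass branch, downsampled by 2
    return approx, details

def haar_synthesis(approx, details):
    """Invert haar_analysis (perfect reconstruction)."""
    approx = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def remove_fine_scales(x, levels, n_fine):
    """Zero the n_fine finest detail bands and resynthesise a smoothed spectrum."""
    approx, details = haar_analysis(x, levels)
    kept = [np.zeros_like(d) if k < n_fine else d for k, d in enumerate(details)]
    return haar_synthesis(approx, kept)
```

Because every stage works on half as many samples as the previous one, the total cost of the cascade stays linear in the number of spectral bins. Manipulating one band (for example, zeroing or rescaling it) before calling `haar_synthesis` changes only the corresponding scale of the spectrum.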
Bibliographic reference. Holzapfel, Martin / Hoffmann, Rüdiger / Höge, Harald (1998): "A Wavelet-Domain PSOLA Approach", In SSW3-1998, 283-286.