We report on research in which we increased the degree of spectral control in concatenative synthesis by controlling the formant frequencies of the synthetic speech, as well as the energies in four spectral bands. In addition, we replaced "points" of concatenation with "regions" of concatenation by cross-fading between the end of one speech segment and the beginning of the next in each concatenation operation. We hypothesized that these approaches would decrease the frequency and severity of audible discontinuities in the synthetic speech and thus increase its perceived quality. A listening test showed that stimuli created with the proposed methods were of significantly higher perceived quality.
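The idea of a concatenation "region" can be illustrated with a minimal sketch. The abstract does not say in which domain the cross-fade is applied or what shape the fade has, so the linear weights, the sample-domain blending, and the function name `crossfade` below are all assumptions made for illustration, not the authors' method.

```python
def crossfade(left, right, overlap):
    """Join two sequences, blending the last `overlap` values of `left`
    with the first `overlap` values of `right` (hypothetical helper).

    Assumption: a linear fade applied directly to the values; the paper
    may use a different fade shape or operate on spectral parameters.
    """
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both segments")

    split = len(left) - overlap
    head = list(left[:split])        # part of `left` before the region
    tail = list(right[overlap:])     # part of `right` after the region

    blended = []
    for i in range(overlap):
        # Weight of `right` rises linearly and stays strictly inside (0, 1).
        w = (i + 1) / (overlap + 1)
        blended.append((1.0 - w) * left[split + i] + w * right[i])

    return head + blended + tail


if __name__ == "__main__":
    a = [1.0, 1.0, 1.0, 1.0]
    b = [0.0, 0.0, 0.0, 0.0]
    # With overlap=0 the join is a hard "point" of concatenation;
    # with overlap>0 the step between the segments is spread over a region.
    print(crossfade(a, b, 0))
    print(crossfade(a, b, 2))
```

With `overlap=0` the function reduces to plain point concatenation, which makes the contrast with a region-based join easy to see on a step between two constant segments.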
Cite as: Kain, A.B., Miao, Q., van Santen, J.P.H. (2007) Spectral control in concatenative speech synthesis. Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6), 11-16.
@inproceedings{kain07_ssw,
  author={Alexander B. Kain and Qi Miao and Jan P. H. van Santen},
  title={{Spectral control in concatenative speech synthesis}},
  year=2007,
  booktitle={Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6)},
  pages={11--16}
}