Current unit-selection speech synthesis voices cannot produce emphasis or interrogative contours because the recorded speech database lacks the necessary prosodic variation. A method of designing the recording script is proposed to address this shortcoming. Appropriate components were added to the target cost function of the Festival Multisyn engine, and a perceptual evaluation showed a clear preference for the modified system over the baseline.
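To illustrate what adding prosodic components to a target cost can look like, here is a minimal sketch in Python. It is not the Festival Multisyn implementation: the feature names (`emphasized`, `contour`), the weights, and the functions `target_cost` and `best_candidate` are all hypothetical, and the cost is assumed to be a normalized weighted sum of feature mismatches. A real unit-selection engine also uses a join cost and a search over whole unit sequences, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitFeatures:
    """Linguistic context of a target position or a database unit."""
    stressed: bool
    phrase_final: bool
    emphasized: bool   # hypothetical added component: emphasis
    contour: str       # hypothetical added component: "declarative" or "interrogative"


# Illustrative weights only; the values used in the paper are not given in the abstract.
WEIGHTS = {
    "stressed": 1.0,
    "phrase_final": 1.0,
    "emphasized": 2.0,
    "contour": 2.0,
}


def target_cost(target, candidate, weights=WEIGHTS):
    """Return the weighted fraction of mismatching features, in [0, 1]."""
    total = sum(weights.values())
    mismatch = sum(
        w for name, w in weights.items()
        if getattr(target, name) != getattr(candidate, name)
    )
    return mismatch / total


def best_candidate(target, candidates):
    """Pick the candidate with the lowest target cost (join cost ignored)."""
    return min(candidates, key=lambda c: target_cost(target, c))


if __name__ == "__main__":
    target = UnitFeatures(True, False, True, "interrogative")
    neutral = UnitFeatures(True, False, False, "declarative")
    expressive = UnitFeatures(False, False, True, "interrogative")
    print(best_candidate(target, [neutral, expressive]))
```

Under these assumed weights, a candidate recorded with the right emphasis and contour wins even when its stress does not match. That only helps if the database contains such units, which is why the recording script has to elicit them.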
Cite as: Strom, V., Clark, R.A.J., King, S. (2006) Expressive prosody for unit-selection speech synthesis. Proc. Interspeech 2006, paper 1522-Tue3BuP.1, doi: 10.21437/Interspeech.2006-381
@inproceedings{strom06_interspeech,
  author={Volker Strom and Robert A. J. Clark and Simon King},
  title={{Expressive prosody for unit-selection speech synthesis}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1522-Tue3BuP.1},
  doi={10.21437/Interspeech.2006-381}
}