Synthesizing desired emotions using concatenative algorithms relies on the collection of large databases. This paper focuses on the development and assessment of a simple algorithm that interpolates the intended vocal effort in existing databases in order to create new databases with intermediate levels of vocal effort. Three German diphone databases with soft, modal, and loud voice qualities are processed with a spectral interpolation algorithm. A listening test is performed to evaluate the intended vocal effort in the original databases as well as in the interpolated ones. The results show that the interpolation algorithm can create the intended intermediate levels of vocal effort from the original databases, independent of the language background of the subjects.
Cite as: Turk, O., Schröder, M., Bozkurt, B., Arslan, L.M. (2005) Voice quality interpolation for emotional text-to-speech synthesis. Proc. Interspeech 2005, 797-800, doi: 10.21437/Interspeech.2005-377
@inproceedings{turk05_interspeech,
  author={Oytun Turk and Marc Schröder and Baris Bozkurt and Levent M. Arslan},
  title={{Voice quality interpolation for emotional text-to-speech synthesis}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={797--800},
  doi={10.21437/Interspeech.2005-377}
}