Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Voice Quality Interpolation for Emotional Text-to-Speech Synthesis

Oytun Turk (1), Marc Schröder (2), Baris Bozkurt (3), Levent M. Arslan (1)

(1) Sestek Inc., Turkey; (2) DFKI GmbH, Saarbrücken, Germany; (3) Faculté Polytechnique de Mons, Belgium

Synthesizing desired emotions with concatenative algorithms relies on the collection of large databases. This paper focuses on the development and assessment of a simple algorithm that interpolates vocal effort in existing databases to create new databases with intermediate levels of vocal effort. Three German diphone databases with soft, modal, and loud voice qualities are processed with a spectral interpolation algorithm. A listening test is performed to evaluate how the intended vocal effort is perceived in the original databases as well as in the interpolated ones. The results show that, given the original databases, the interpolation algorithm can create the intended intermediate levels of vocal effort, independent of the language background of the subjects.
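The abstract does not state the interpolation rule itself. Purely as an illustration of the idea of "intermediate levels of vocal effort", the sketch below blends the magnitude spectra of two time-aligned frames (e.g. one from the soft and one from the loud database) with a weight `alpha`. The function name `interpolate_log_spectra`, the log-domain weighting, and the assumption of pre-aligned frames with equal bin counts are all hypothetical choices, not the authors' method.

```python
import math


def interpolate_log_spectra(soft, loud, alpha):
    """Blend two magnitude spectra of time-aligned frames (hypothetical sketch).

    alpha = 0.0 reproduces `soft`, alpha = 1.0 reproduces `loud`;
    values in between give intermediate spectra. Weighting is applied
    to log magnitudes, which is an assumption made for this example.
    """
    if len(soft) != len(loud):
        raise ValueError("spectra must have the same number of bins")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    eps = 1e-12  # guards against log(0) for empty bins
    return [
        math.exp((1.0 - alpha) * math.log(s + eps) + alpha * math.log(l + eps))
        for s, l in zip(soft, loud)
    ]


if __name__ == "__main__":
    soft_frame = [1.0, 2.0, 4.0]    # toy magnitudes, not real data
    loud_frame = [4.0, 8.0, 16.0]
    # Halfway in the log domain gives the per-bin geometric mean.
    print(interpolate_log_spectra(soft_frame, loud_frame, 0.5))
```

Interpolating in the log domain makes the halfway point a per-bin geometric mean, so a bin that is 4 times louder in the loud frame ends up 2 times louder than in the soft one; a linear-magnitude blend would be an equally plausible alternative under a different assumption.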


Bibliographic reference. Turk, Oytun / Schröder, Marc / Bozkurt, Baris / Arslan, Levent M. (2005): "Voice quality interpolation for emotional text-to-speech synthesis", In INTERSPEECH-2005, 797-800.