Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Informed Blending of Databases for Emotional Speech Synthesis

Gregor O. Hofer, Korin Richmond, Robert A. J. Clark

University of Edinburgh, UK

The goal of this project was to build a unit selection voice that could portray emotions with varying intensities. A suitable definition of an emotion was developed, along with a descriptive framework that supported the work carried out. A single speaker was recorded portraying happy and angry speaking styles, and a neutral database was also recorded. A target cost function was implemented that chose units according to the emotion mark-up in the database. The Dictionary of Affect supported the emotional target cost function by providing an emotion rating for words in the target utterance. If a word was particularly ‘emotional’, units from that emotion were favoured. In addition, intensity could be varied, which resulted in a bias towards selecting a greater number of emotional units. A perceptual evaluation was carried out, and subjects were able to reliably recognise emotions with varying amounts of emotional units present in the target utterance.
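
The emotion-aware target cost described above can be illustrated with a minimal sketch. This is not the implementation from the paper: the names `Unit`, `WORD_RATING`, `emotion_target_cost`, `select_unit` and `count_emotional_units` are hypothetical, the word ratings are invented placeholders rather than Dictionary of Affect values, and the linear cost formula is an assumption chosen only to show how a word rating and an intensity setting could jointly bias selection towards emotional units. Join costs and all other target cost components of a real unit selection system are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A candidate unit tagged with the database it was recorded in."""
    name: str
    emotion: str  # "neutral", "happy" or "angry"


# Placeholder emotion ratings in [0, 1]. These numbers are invented for
# illustration and are NOT taken from the Dictionary of Affect.
WORD_RATING = {"what": 0.3, "a": 0.1, "wonderful": 0.9, "day": 0.6}


def emotion_target_cost(unit, word, target_emotion, intensity):
    """Assumed cost: lower is better.

    'demand' grows with both the word's rating and the requested
    intensity (both in [0, 1]). High demand makes units from the target
    emotion cheap and neutral units expensive; low demand does the opposite.
    """
    demand = WORD_RATING.get(word, 0.0) * intensity
    if unit.emotion == target_emotion:
        return 1.0 - demand
    if unit.emotion == "neutral":
        return demand
    return 1.0  # unit from a different emotion: always the maximum cost


def select_unit(candidates, word, target_emotion, intensity):
    """Pick the candidate with the lowest emotion target cost."""
    return min(
        candidates,
        key=lambda u: emotion_target_cost(u, word, target_emotion, intensity),
    )


def count_emotional_units(words, candidates, target_emotion, intensity):
    """Count how many words end up with a unit from the target emotion."""
    return sum(
        select_unit(candidates, w, target_emotion, intensity).emotion
        == target_emotion
        for w in words
    )


if __name__ == "__main__":
    candidates = [
        Unit("n1", "neutral"),
        Unit("h1", "happy"),
        Unit("a1", "angry"),
    ]
    words = ["what", "a", "wonderful", "day"]
    for intensity in (0.0, 0.6, 1.0):
        n = count_emotional_units(words, candidates, "happy", intensity)
        print(f"intensity={intensity}: {n} happy unit(s)")
```

With these placeholder ratings, raising the intensity from 0.0 to 0.6 to 1.0 increases the number of happy units selected from 0 to 1 to 2, which mirrors the bias towards emotional units that the abstract describes. Words with low ratings keep their neutral units throughout.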


Bibliographic reference.  Hofer, Gregor O. / Richmond, Korin / Clark, Robert A. J. (2005): "Informed blending of databases for emotional speech synthesis", In INTERSPEECH-2005, 501-504.