ISCA Archive Interspeech 2005

Informed blending of databases for emotional speech synthesis

Gregor O. Hofer, Korin Richmond, Robert A. J. Clark

The goal of this project was to build a unit selection voice that could portray emotions with varying intensities. A suitable definition of an emotion was developed, along with a descriptive framework that supported the work carried out. A single speaker was recorded portraying happy and angry speaking styles, and a neutral database was also recorded. A target cost function was implemented that chose units according to emotion mark-up in the database. The Dictionary of Affect supported the emotional target cost function by providing an emotion rating for words in the target utterance. If a word was particularly ‘emotional’, units from that emotion were favoured. In addition, intensity could be varied, which resulted in a bias to select a greater number of emotional units. A perceptual evaluation was carried out, and subjects were able to reliably recognise emotions with varying amounts of emotional units present in the target utterance.
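The emotion-aware target cost described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the rating table, the 0 to 1 rating scale, the thresholding rule, and the names `AFFECT_RATING`, `Unit`, `emotion_target_cost`, and `select_unit` are all assumptions made for illustration. The sketch only shows the general idea that words with a high emotion rating favour units from the emotional database, and that raising the intensity makes more words qualify.

```python
from dataclasses import dataclass

# Hypothetical word ratings standing in for the Dictionary of Affect.
# The values and the 0..1 scale are invented for this illustration.
AFFECT_RATING = {"wonderful": 0.9, "terrible": 0.8, "table": 0.2}


@dataclass
class Unit:
    unit_id: str
    emotion: str  # emotion mark-up: "neutral", "happy" or "angry"


def emotion_target_cost(unit, word, target_emotion, intensity):
    """Return 0.0 for a favoured unit and 1.0 for a disfavoured one.

    intensity is assumed to lie in [0, 1]; a higher value lowers the
    rating a word needs before emotional units are favoured for it.
    """
    rating = AFFECT_RATING.get(word.lower(), 0.0)  # unknown words rate 0
    threshold = 1.0 - intensity
    wanted = target_emotion if rating >= threshold else "neutral"
    return 0.0 if unit.emotion == wanted else 1.0


def select_unit(candidates, word, target_emotion, intensity):
    """Pick the candidate with the lowest emotion target cost."""
    return min(
        candidates,
        key=lambda u: emotion_target_cost(u, word, target_emotion, intensity),
    )


candidates = [Unit("n1", "neutral"), Unit("h1", "happy")]
strong_word = select_unit(candidates, "wonderful", "happy", 0.5)
weak_word = select_unit(candidates, "table", "happy", 0.5)
weak_word_high_intensity = select_unit(candidates, "table", "happy", 0.9)
```

In a full unit selection system this term would presumably be one weighted component among several target and join costs; here it is isolated so the effect of the word rating and the intensity setting is easy to see.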

doi: 10.21437/Interspeech.2005-326

Cite as: Hofer, G.O., Richmond, K., Clark, R.A.J. (2005) Informed blending of databases for emotional speech synthesis. Proc. Interspeech 2005, 501-504, doi: 10.21437/Interspeech.2005-326

  author={Gregor O. Hofer and Korin Richmond and Robert A. J. Clark},
  title={{Informed blending of databases for emotional speech synthesis}},
  booktitle={Proc. Interspeech 2005},