Speech Prosody 2010
Chicago, IL, USA
Affective speech synthesis is important for applications such as storytelling, speech-based user interfaces, and computer games. However, some studies have revealed that Text-To-Speech (TTS) systems tend not to convey suitable emotional expressivity in their output. Owing to the recent convergence of several analytical studies on affect and human speech, this problem can now be tackled from a new angle, one that has at its core an appropriate prosodic parameterization based on intelligent detection of the affective cues in the input text. This, combined with recent findings in affective speech analysis, allows a suitable assignment of pitch accents, other prosodic parameters, and signal properties that adhere to F0 and match the optimal parameterization for the emotion detected in the input text. Such an approach allows the input text to be enriched with meta-information that efficiently assists the TTS system. Furthermore, the output of the TTS system is post-processed to enhance its affective content. Several preliminary tests confirm the validity of our approach and encourage us to continue exploring it.
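As a rough illustration of the idea of enriching input text with prosodic meta-information, the following minimal sketch tags a sentence with a detected emotion and wraps it in an SSML `<prosody>` element. Everything in it is an assumption made for illustration and is not taken from the paper: the keyword lexicon is a toy stand-in for the linguistic affect analysis, the prosody values are hypothetical rather than the reported parameterization, and SSML is only one possible carrier for the meta-information.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Prosody:
    pitch: str   # SSML pitch value (relative change or label)
    rate: str    # SSML speaking rate
    volume: str  # SSML volume label


# Hypothetical emotion-to-prosody table; the values are illustrative only.
PROSODY_BY_EMOTION = {
    "happy": Prosody(pitch="+15%", rate="110%", volume="loud"),
    "sad": Prosody(pitch="-10%", rate="85%", volume="soft"),
    "neutral": Prosody(pitch="medium", rate="100%", volume="medium"),
}

# Toy keyword lexicon standing in for a real affect-sensing component.
AFFECT_LEXICON = {
    "wonderful": "happy",
    "delighted": "happy",
    "lost": "sad",
    "cried": "sad",
}


def detect_emotion(sentence: str) -> str:
    """Return the emotion of the first lexicon word found, else 'neutral'."""
    for word in sentence.lower().split():
        emotion = AFFECT_LEXICON.get(word.strip(".,!?;:\"'"))
        if emotion is not None:
            return emotion
    return "neutral"


def annotate(sentence: str) -> str:
    """Wrap the sentence in an SSML prosody element for its detected emotion."""
    p = PROSODY_BY_EMOTION[detect_emotion(sentence)]
    return (
        f'<prosody pitch="{p.pitch}" rate="{p.rate}" volume="{p.volume}">'
        f"{escape(sentence)}</prosody>"
    )


if __name__ == "__main__":
    print(annotate("What a wonderful day!"))
```

The annotated string could then be passed to any TTS engine that accepts SSML input; the post-processing of the synthesized signal mentioned in the abstract is outside the scope of this sketch.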
Index Terms: speech synthesis, intelligent text processing, affect sensing, prosody
Bibliographic reference. Shaikh, Mostafa Al Masum / Rebordao, Antonio Rui Ferreira / Hirose, Keikichi (2010): "Improving TTS synthesis for emotional expressivity by a prosodic parameterization of affect based on linguistic analysis", In SP-2010, paper 970.