9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Assigning Suitable Phrasal Tones and Pitch Accents by Sensing Affective Information from Text to Synthesize Human-Like Speech

Mostafa Al Masum Shaikh, Md. Khademul Islam Molla, Keikichi Hirose

University of Tokyo, Japan

We have carried out several perceptual and objective experiments showing that present Text-To-Speech (TTS) systems are weak in their use of prosody and segmental spectrum to characterize and express emotions. Since a speaker's emotional state usually alters the way s/he speaks, TTS systems need to be improved to generate human-like pitch accents that express the subtle features of emotions. This paper describes a pitch accent assignment technique which places appropriate pitch accents on the elements of an utterance that require particular emphasis or stress. Our pitch accenting technique uses a commonsense knowledge base and a linguistic tool to recognize the emotion conveyed through the text itself. From these it determines whether the content of the utterance connotes a particular emotion (e.g., happiness, sadness, surprise), good or bad concepts, praiseworthy or blameworthy actions, or common or vital information. It can then assign an appropriate pitch accent to one word in each prosodic phrase. The TTS component then determines the appropriate syllable to be accented in that word. Our approach can thus support a TTS system's synthesis, allowing the system to generate an affective version of the spoken text.
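The assignment step described above (one accented word per prosodic phrase, chosen from affective cues in the text) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy lexicon, the salience scores, the ToBI-style accent labels, the category-to-accent mapping, and the fallback to the last word of a phrase are all assumptions introduced here for illustration; the paper's commonsense knowledge base and linguistic tool are replaced by a simple dictionary lookup.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon standing in for the knowledge base:
# word -> (affect category, salience score). Values are invented.
TOY_LEXICON: Dict[str, Tuple[str, float]] = {
    "wonderful": ("happy", 0.9),
    "lost": ("sad", 0.7),
    "suddenly": ("surprise", 0.6),
    "urgent": ("vital", 0.8),
}

# Hypothetical mapping from affect category to a ToBI-style accent label.
ACCENT_FOR_CATEGORY: Dict[str, str] = {
    "happy": "L+H*",
    "sad": "L*",
    "surprise": "L+H*",
    "vital": "H*",
}

# Assumed default when no word in the phrase carries affective information.
DEFAULT_ACCENT = "H*"


def assign_pitch_accent(phrase: List[str]) -> Optional[Tuple[int, str]]:
    """Pick one word in a prosodic phrase and return (word index, accent)."""
    if not phrase:
        return None
    best_index: Optional[int] = None
    best_score = 0.0
    best_category = ""
    for index, word in enumerate(phrase):
        entry = TOY_LEXICON.get(word.lower())
        # Keep the most salient affect-bearing word seen so far.
        if entry is not None and entry[1] > best_score:
            best_category, best_score = entry
            best_index = index
    if best_index is None:
        # Assumption: with no affective cue, accent the last word.
        return len(phrase) - 1, DEFAULT_ACCENT
    return best_index, ACCENT_FOR_CATEGORY[best_category]


def accent_utterance(phrases: List[List[str]]) -> List[Optional[Tuple[int, str]]]:
    """Assign one pitch accent per prosodic phrase of an utterance."""
    return [assign_pitch_accent(phrase) for phrase in phrases]
```

For example, calling accent_utterance on the phrases ["what", "a", "wonderful", "day"] and ["see", "you", "tomorrow"] accents "wonderful" in the first phrase and falls back to the last word in the second. Choosing the accented syllable within the selected word is left to the TTS component, as in the description above.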

Bibliographic reference. Shaikh, Mostafa Al Masum / Molla, Md. Khademul Islam / Hirose, Keikichi (2008): "Assigning suitable phrasal tones and pitch accents by sensing affective information from text to synthesize human-like speech", in INTERSPEECH-2008, 326-329.