Interspeech'2005 - Eurospeech
Recent studies in our lab show that emotions in speech are manifested not only as suprasegmental trends but also as distinct variations in phoneme-level prosodic and spectral parameters. In this paper, we further investigate the significance of this finding in the context of emotional speech synthesis. Specifically, we study phoneme-level manipulation of signal properties as a means of transforming the emotional information conveyed in a speech utterance. We analyze the effect of individual and combined modifications of F0, duration, energy and spectrum using data recorded by a professional actress with happy, angry, sad and neutral expressiveness. We use content-matched source-target pairs and apply TD-PSOLA for prosody modifications and LPC for spectrum modifications, directly extracting the required parameters from the target speech. Listening tests conducted with 10 naive raters show that modifying prosody or spectral envelope parameters alone is not sufficient. However, when applied together, spectrum and prosody modifications at the phoneme level give successful results for most emotion pairs, except conversion to happy targets. We also observe that, at the phoneme level, spectral envelope modifications are more effective than local prosodic modifications, and that duration modifications are more effective than pitch modifications. The results confirm our hypothesis that phoneme-level modifications can be used to fine-tune the ensuing suprasegmental-parameter-based modifications to improve the overall quality of synthesized emotions.
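As a rough illustration of what extracting modification parameters from content-matched source-target pairs might look like at the phoneme level, the following sketch computes per-phoneme duration, F0 and energy scale factors. It is only a sketch under stated assumptions: the `Phone` record, its field names, the `phone_level_factors` helper and all numeric values are hypothetical and are not taken from the paper. It does not implement TD-PSOLA or LPC; it only derives the ratios that such a modification stage could consume.

```python
from dataclasses import dataclass


@dataclass
class Phone:
    """Hypothetical per-phoneme measurements (not the paper's data format)."""
    label: str
    duration_s: float   # segment duration in seconds, assumed > 0
    mean_f0_hz: float   # mean F0 over the segment; 0.0 marks an unvoiced phone
    rms_energy: float   # RMS amplitude of the segment, assumed > 0


def phone_level_factors(source, target):
    """Return one dict of target/source scale factors per aligned phoneme."""
    if [p.label for p in source] != [p.label for p in target]:
        raise ValueError("source and target must be content matched")
    factors = []
    for s, t in zip(source, target):
        # F0 is only rescaled when both segments are voiced.
        voiced = s.mean_f0_hz > 0 and t.mean_f0_hz > 0
        factors.append({
            "label": s.label,
            "duration": t.duration_s / s.duration_s,
            "f0": t.mean_f0_hz / s.mean_f0_hz if voiced else 1.0,
            "energy": t.rms_energy / s.rms_energy,
        })
    return factors


# Illustrative values only: a two-phone "neutral" source and "angry" target.
neutral = [Phone("s", 0.10, 0.0, 0.02), Phone("iy", 0.20, 200.0, 0.10)]
angry = [Phone("s", 0.05, 0.0, 0.04), Phone("iy", 0.10, 300.0, 0.30)]

factors = phone_level_factors(neutral, angry)
for f in factors:
    print(f)
```

In this sketch, the duration and F0 ratios would drive a time-scale and pitch-scale modification step, and the energy ratio would act as a per-phoneme gain; spectral envelope conversion would need a separate analysis stage.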
Bibliographic reference. Bulut, Murtaza / Busso, Carlos / Yildirim, Serdar / Kazemzadeh, Abe / Lee, Chul Min / Lee, Sungbok / Narayanan, Shrikanth (2005): "Investigating the role of phoneme-level modifications in emotional speech resynthesis", In INTERSPEECH-2005, 801-804.