The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis

Kyoto, Japan
September 22-24, 2010

High Level Emotional Speech Morphing Using STRAIGHT

Dong-Yan Huang, Susanto Rahardja, Ee Ping Ong

Institute for Infocomm Research, A*STAR, Singapore

This paper presents high-level strategies for controlling emotional speech morphing algorithms. Emotion morphing is realized by representing the acoustic features in the time-frequency plane, which is warped and modified to generate natural-sounding morphed emotional speech. Ideally, these acoustic features are decomposed into a multidimensional space and are mutually orthogonal. After these acoustic features of the speech are matched, a morph smoothly interpolates their variations both in the time domain and in their amplitudes in the frequency domain, describing a new emotional speech in the same perceptual space. Finally, these descriptors are synthesized to produce the morphed speech waveform. This paper describes representations of acoustic features, techniques for matching them, and algorithms for interpolating and morphing acoustic features such as duration, spectral envelope, and pitch contour, using STRAIGHT [1] as an example. Subjective listening tests are reported for emotional speech morphing; the quality and naturalness of the morphed speech were comparable to those of natural speech samples.

Index Terms: emotional speech morphing, acoustic features, warping, matching, interpolation
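
The interpolation step summarized in the abstract can be illustrated with a minimal sketch. It is not the STRAIGHT implementation: the function names (morph_frame, morph_duration), the single morphing rate, and the choice to interpolate pitch and spectral envelope in the log domain are assumptions made here for illustration, and the two input frames are assumed to be already matched in time and frequency, and voiced.

```python
import math


def morph_frame(f0_a, f0_b, env_a, env_b, rate):
    """Interpolate one pair of already-aligned frames (hypothetical helper).

    rate = 0.0 returns speaker/emotion A, rate = 1.0 returns B.
    Assumes both frames are voiced (f0 > 0) and that the spectral
    envelopes hold positive amplitudes on the same frequency grid.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    if len(env_a) != len(env_b):
        raise ValueError("envelopes must share one frequency grid")

    # Assumption: pitch is interpolated on a logarithmic scale.
    f0 = math.exp((1.0 - rate) * math.log(f0_a) + rate * math.log(f0_b))

    # Assumption: envelope amplitudes are interpolated bin by bin
    # in the log-amplitude domain.
    env = [
        math.exp((1.0 - rate) * math.log(a) + rate * math.log(b))
        for a, b in zip(env_a, env_b)
    ]
    return f0, env


def morph_duration(dur_a, dur_b, rate):
    """Linearly interpolate the duration of a matched segment (seconds)."""
    return (1.0 - rate) * dur_a + rate * dur_b


if __name__ == "__main__":
    f0, env = morph_frame(100.0, 400.0, [1.0, 4.0], [4.0, 1.0], 0.5)
    print(f0, env, morph_duration(1.0, 2.0, 0.25))
```

With a rate of 0.5 the morphed pitch is the geometric mean of the two input pitches, which is what log-domain interpolation implies; a linear scale would give the arithmetic mean instead.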

