This paper presents high-level strategies for controlling emotional speech morphing algorithms. Emotion morphing is realized by representing the acoustic features in their time-frequency plane, where they are warped and modified to generate natural morphed emotional speech. Ideally, these acoustic features should be decomposable into a multidimensional space whose dimensions are orthogonal. After the acoustic features of the source utterances are matched, a morph smoothly interpolates their variations, not only along the time axis but also in their amplitudes in the frequency domain, to describe a new emotional utterance in the same perceptual space. Finally, these descriptors are synthesized to produce the morphed speech waveform. This paper describes representations of acoustic features, techniques for matching them, and algorithms for interpolating and morphing acoustic features such as duration, spectral envelope and pitch contour, using STRAIGHT [1] as an example. A subjective listening test is reported in which the quality and naturalness of the morphed emotional speech were judged comparable to those of natural speech samples.
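The matching, warping and interpolation steps above can be illustrated with a minimal NumPy sketch. It is not the paper's method and does not call STRAIGHT. It assumes that the per-frame F0 contours and spectral envelopes of two utterances (A and B) have already been extracted at a fixed frame period, and that matched anchor times (for example, phoneme boundaries) are available for both. The names `morph_features` and `resample_frames`, the 5 ms frame period, and the choice of linear interpolation in the log domain are all illustrative assumptions. Only time-axis warping is shown. Frequency-axis warping of the spectral envelope and waveform synthesis are left out.

```python
import numpy as np


def resample_frames(track, times, frame_period):
    """Linearly resample a (frames x bins) feature matrix at arbitrary times in seconds."""
    frame_times = np.arange(track.shape[0]) * frame_period
    return np.stack(
        [np.interp(times, frame_times, track[:, k]) for k in range(track.shape[1])],
        axis=1,
    )


def morph_features(f0_a, spec_a, anchors_a, f0_b, spec_b, anchors_b,
                   rate, frame_period=0.005):
    """Hypothetical helper: morph two utterances' F0 contours and spectral envelopes.

    rate = 0 reproduces utterance A and rate = 1 reproduces utterance B.
    anchors_a / anchors_b are matched, strictly increasing times in seconds,
    starting at 0 in both utterances (an assumption of this sketch).
    """
    anchors_a = np.asarray(anchors_a, dtype=float)
    anchors_b = np.asarray(anchors_b, dtype=float)

    # Duration: the morphed anchor times interpolate the two timings linearly.
    anchors_m = (1.0 - rate) * anchors_a + rate * anchors_b
    n_out = int(round(anchors_m[-1] / frame_period)) + 1
    t_m = np.arange(n_out) * frame_period

    # Time warping: piecewise-linear map from the morphed time axis back onto
    # each source utterance's own time axis, segment by segment between anchors.
    t_a = np.interp(t_m, anchors_m, anchors_a)
    t_b = np.interp(t_m, anchors_m, anchors_b)

    # Pitch contour: interpolate log F0. Assumes F0 > 0 in every frame,
    # i.e. unvoiced gaps were already filled in before morphing.
    frames_a = np.arange(len(f0_a)) * frame_period
    frames_b = np.arange(len(f0_b)) * frame_period
    log_f0_a = np.interp(t_a, frames_a, np.log(f0_a))
    log_f0_b = np.interp(t_b, frames_b, np.log(f0_b))
    f0_m = np.exp((1.0 - rate) * log_f0_a + rate * log_f0_b)

    # Spectral envelope: interpolate log amplitude bin by bin.
    log_spec_a = resample_frames(np.log(spec_a), t_a, frame_period)
    log_spec_b = resample_frames(np.log(spec_b), t_b, frame_period)
    spec_m = np.exp((1.0 - rate) * log_spec_a + rate * log_spec_b)

    return f0_m, spec_m
```

For example, morphing a 0.5 s utterance at a constant 100 Hz with a 1.0 s utterance at a constant 200 Hz at `rate=0.5` yields a 0.75 s result whose F0 is the geometric mean, about 141 Hz. Interpolating in the log domain keeps the morphed pitch and spectral levels on a scale closer to perception than interpolating raw Hz or raw amplitude would.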
Index Terms: emotional speech morphing, acoustic features, warping, matching, interpolation
Cite as: Huang, D.-Y., Rahardja, S., Ong, E.P. (2010) High level emotional speech morphing using STRAIGHT. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 345-350
@inproceedings{huang10c_ssw, author={Dong-Yan Huang and Susanto Rahardja and Ee Ping Ong}, title={{High level emotional speech morphing using STRAIGHT}}, year=2010, booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)}, pages={345--350} }