We propose a simple method for modifying emotional speech sounds, aimed at a real-time emotional expression transformation system based on STRAIGHT. We developed mapping functions for spectra, fundamental frequencies (F0), and vowel durations from a statistical analysis of 1500 expressive speech sounds in an emotional speech database. The spectral mapping parameters are initially extracted at the centers of vowels and interpolated with bilinear functions. The spectral frequency-warping functions are designed manually. The F0 and duration mapping functions simply transform the average values on log-frequency and linear time scales. We demonstrate that the spectral distortion is sufficiently small when 'Neutral' speech sounds are transformed into expressive speech sounds (i.e. 'Bright', 'Excited', 'Angry', and 'Raging' speech sounds).
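The abstract states that the F0 and duration mappings transform average values on log-frequency and linear time scales. As a minimal sketch of one plausible reading of that statement (the paper's exact formulas are not given here, so the mean-shift in log-F0 and the ratio-based duration scaling below are assumptions, as are the function and parameter names):

```python
import math

def map_f0(f0_hz, src_mean_hz, tgt_mean_hz):
    """Shift an F0 value so that the source mean maps onto the target mean
    on a log-frequency scale (assumed interpretation of the mapping)."""
    return math.exp(math.log(f0_hz) + math.log(tgt_mean_hz) - math.log(src_mean_hz))

def map_duration(dur_s, src_mean_s, tgt_mean_s):
    """Scale a vowel duration by the ratio of target to source mean
    duration on a linear time scale (assumed interpretation)."""
    return dur_s * tgt_mean_s / src_mean_s

# Hypothetical example: Neutral mean F0 of 120 Hz mapped toward an
# expressive style with a mean of 180 Hz.
print(map_f0(130.0, 120.0, 180.0))      # F0 raised by the log-mean shift
print(map_duration(0.10, 0.10, 0.08))   # vowel shortened proportionally
```

On a log scale, shifting by the difference of mean log-frequencies is equivalent to multiplying F0 by the ratio of the means, which preserves relative pitch movements while matching the target average.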
Cite as: Takahashi, T., Fujii, T., Nishi, M., Banno, H., Irino, T., Kawahara, H. (2005) Voice and emotional expression transformation based on statistics of vowel parameters in an emotional speech database. Proc. Interspeech 2005, 1853-1856, doi: 10.21437/Interspeech.2005-585
@inproceedings{takahashi05b_interspeech,
  author={Toru Takahashi and Takeshi Fujii and Masashi Nishi and Hideki Banno and Toshio Irino and Hideki Kawahara},
  title={{Voice and emotional expression transformation based on statistics of vowel parameters in an emotional speech database}},
  year={2005},
  booktitle={Proc. Interspeech 2005},
  pages={1853--1856},
  doi={10.21437/Interspeech.2005-585}
}