8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Constructing Emotional Speech Synthesizers with Limited Speech Database

Heiga Zen (1), Tadashi Kitamura (1), Murtaza Bulut (2), Shrikanth Narayanan (2), Ryosuke Tsuzuki (1), Keiichi Tokuda (1)

(1) Nagoya Institute of Technology, Japan
(2) University of Southern California, USA

This paper describes an emotional speech synthesis system based on HMMs and related modeling techniques. Concatenative speech synthesis requires all of the concatenation units that will be used to be recorded beforehand and made available at synthesis time. Adopting that approach to synthesize the wide variety of emotions possible in human speech implies repeating this recording process for every targeted emotion, which makes the task challenging and time consuming. In this paper, we propose an emotional speech synthesis technique based on HMMs, aimed especially at the case where only a limited amount of training data is available, that directly incorporates the results of subjective evaluations performed on the training data. Listening tests performed on the synthesized speech suggest that the proposed technique helps to improve the emotional content of synthesized speech.
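The abstract does not spell out how the subjective evaluation results enter the training procedure. One plausible reading, sketched below purely as an illustration and not as the paper's actual method, is to weight each training utterance's contribution to the HMM state statistics by its listener-assigned emotion score, so that weakly emotional recordings influence the model less. The function name `weighted_gaussian` and the score convention are assumptions introduced here for the sketch.

```python
def weighted_gaussian(features, scores):
    """Estimate a single Gaussian state's mean and diagonal variance,
    weighting each utterance-level feature vector by its subjective
    emotion rating (illustrative sketch only, not the paper's method).

    features: list of feature vectors (lists of floats), one per utterance
    scores:   list of non-negative listener ratings, one per utterance
    """
    total = sum(scores)  # normalizer for the weighted statistics
    dim = len(features[0])
    # Weighted mean: strongly rated utterances pull the mean harder.
    mean = [sum(s * f[d] for f, s in zip(features, scores)) / total
            for d in range(dim)]
    # Weighted variance around the weighted mean.
    var = [sum(s * (f[d] - mean[d]) ** 2
               for f, s in zip(features, scores)) / total
           for d in range(dim)]
    return mean, var


# Toy usage: two 1-D "utterances"; the second, rated 3x more emotional,
# dominates the estimated mean.
m, v = weighted_gaussian([[0.0], [2.0]], [1.0, 3.0])
```

In a full HMM system the same per-utterance weights would scale the sufficient statistics accumulated during Baum-Welch re-estimation rather than a single Gaussian fit.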


Bibliographic reference: Zen, Heiga / Kitamura, Tadashi / Bulut, Murtaza / Narayanan, Shrikanth / Tsuzuki, Ryosuke / Tokuda, Keiichi (2004): "Constructing emotional speech synthesizers with limited speech database", in INTERSPEECH-2004, 1185-1188.