In this paper, we describe an approach to automatically synthesize emotional speech for a target speaker based on a hidden Markov model trained on his/her neutral speech. The basic idea is model interpolation between the neutral model of the target speaker and an emotional model selected from a candidate pool. Both the selection of the interpolation model and the computation of the interpolation weight are determined by a model-distance measure. Specifically, we propose a monophone-based Mahalanobis distance (MBMD). We evaluate our approach with several subjective tests on synthesized speech expressing anger, happiness, and sadness. Experimental results show that the implemented system is able to synthesize speech with the emotional expressiveness of the target speaker.
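To make the distance-based interpolation concrete, the following is a minimal Python sketch. It assumes diagonal-covariance Gaussian state emissions, averages per-monophone Mahalanobis distances using the neutral model's variances as the metric, and applies a hypothetical inverse-distance normalization for the interpolation weights; the abstract does not specify the exact weight formula, and all names here are illustrative rather than the authors' implementation.

    import numpy as np

    def mahalanobis_distance(mean_a, mean_b, var_b):
        # Mahalanobis distance between two Gaussian means, using the
        # diagonal covariance of the second model as the metric.
        diff = mean_a - mean_b
        return np.sqrt(np.sum(diff * diff / var_b))

    def monophone_mbmd(neutral, emotional):
        # Average Mahalanobis distance over monophones shared by the
        # two models; each model maps phone -> (mean, var) arrays.
        shared = neutral.keys() & emotional.keys()
        dists = [mahalanobis_distance(emotional[p][0], neutral[p][0],
                                      neutral[p][1]) for p in shared]
        return float(np.mean(dists))

    def interpolation_weights(distances):
        # Hypothetical inverse-distance weighting: closer candidate
        # emotional models receive larger weights (weights sum to 1).
        inv = 1.0 / np.asarray(distances, dtype=float)
        return inv / inv.sum()

    def interpolate_means(mean_neutral, mean_emotional, w):
        # Linear interpolation of Gaussian means, with weight w placed
        # on the emotional model.
        return (1.0 - w) * mean_neutral + w * mean_emotional

In this sketch the candidate pool would be scored with monophone_mbmd against the target speaker's neutral model, the best-matching emotional model selected, and its state means blended with the neutral means via interpolate_means.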
Index Terms: speech synthesis, HMM, emotional expressiveness, Mahalanobis distance, model interpolation
Cite as: Yang, C.-Y., Chen, C.-P. (2010) A hidden Markov model-based approach for emotional speech synthesis. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 126-129
@inproceedings{yang10_ssw,
  author={Chih-Yung Yang and Chia-Ping Chen},
  title={{A hidden Markov model-based approach for emotional speech synthesis}},
  year=2010,
  booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)},
  pages={126--129}
}