ISCA Archive SSW 2010

EM-HTS: real-time HMM-based Malay emotional speech synthesis

Mumtaz B. Mustafa, Raja N. Ainon, Roziati Zainuddin

This research aims to develop a real-time HMM-based Malay emotional speech synthesis system (EM-HTS) that can synthesize any form of text input in four different expressions: neutral, anger, sadness and happiness. The quality of the emotional speech synthesis was improved by using the Neutral to Angry, Sad, and Happy (NASH) duration generator, which uses a context-dependent duration generation method to improve the duration information in the label files of the target emotions for training purposes. We conducted three forms of evaluation to determine the accuracy, intelligibility and naturalness of the speech generated by EM-HTS. All three tests show that the adopted method (NASH) gives a better reproduction of prosody than the conventional method using the same training speech data.

Index Terms: HMM-based emotional speech synthesis, context-dependent duration conversion


Cite as: Mustafa, M.B., Ainon, R.N., Zainuddin, R. (2010) EM-HTS: real-time HMM-based Malay emotional speech synthesis. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 340-344

@inproceedings{mustafa10_ssw,
  author={Mumtaz B. Mustafa and Raja N. Ainon and Roziati Zainuddin},
  title={{EM-HTS: real-time HMM-based Malay emotional speech synthesis}},
  year=2010,
  booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)},
  pages={340--344}
}