Unit selection is the state-of-the-art technique for high-quality synthetic emotional speech. However, it requires a large, phonetically segmented corpus that is expensive to produce, so cost-effective automatic segmentation techniques are worth studying. The HMM experiments in this paper indicate that: (1) segmentation performance can depend heavily on whether the intended emotion is segmental or prosodic in nature, with segmental emotions being more difficult to segment than prosodic ones; (2) several emotions should be combined to obtain a larger training set, particularly when prosodic emotions are involved, and all the more so when the training sets are small; and (3) combining emphatic and non-emphatic emotional recordings (short sentences vs. long paragraphs) can degrade overall performance.
Bibliographic reference. Gallardo-Antolín, A. / Barra, R. / Schröder, M. / Krstulović, S. / Montero, J. M. (2007): "Automatic phonetic segmentation of Spanish emotional speech", in INTERSPEECH-2007, 2905-2908.