A continuous expression space assumes that each utterance contains its own individual expression. Thus, it can be used to model detailed expression information in speech data. However, since the continuous expression space can contain an infinite number of different expressions, labelling them manually is very difficult. As a result, these expressions are hard to identify and to extract for synthesising expressive speech, and a mechanism to control the continuous expression space is missing. In a discrete expression space, by contrast, only a few emotions are defined, so users can easily choose among them, but the range of expressivity is limited. This work proposes a method to automatically annotate expressions in the continuous expression space based on the cluster adaptive training (CAT) method. Using the proposed method, complex emotion information can be associated with the individual expressions in the continuous space. These emotion labels can be used as indexes of the expressions in the continuous space, enabling users to select desired expressions at synthesis time, i.e. making the continuous expression space controllable. Meanwhile, the rich expressive information in the continuous space is retained, so that more expressive speech can be generated than with the discrete space.
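The idea of using emotion labels as indexes into a continuous space can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each utterance is already represented by a small expression vector (for instance, CAT cluster weights) and already carries an automatically assigned label. The utterance IDs, vector values, label names, and the functions `build_index`, `centroid` and `select_expression` are all hypothetical, and averaging the vectors under a label is only one possible selection rule.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical data: (utterance id, expression vector, emotion label).
# The vectors stand in for points in a continuous expression space;
# all values and labels are made up for illustration.
utterances = [
    ("utt1", [1.0, 0.0, 0.0], "happy"),
    ("utt2", [0.5, 0.5, 0.0], "happy"),
    ("utt3", [0.0, 0.5, 0.5], "sad"),
    ("utt4", [0.0, 0.0, 1.0], "sad"),
]


def build_index(annotated):
    """Group expression vectors by their emotion label."""
    index = defaultdict(list)
    for utt_id, vector, label in annotated:
        index[label].append((utt_id, vector))
    return dict(index)


def centroid(vectors):
    """Component-wise mean of equally sized vectors."""
    return [fmean(component) for component in zip(*vectors)]


def select_expression(index, label):
    """Return one representative vector for a label (assumed rule: the mean)."""
    if label not in index:
        raise KeyError(f"no expressions annotated with label {label!r}")
    return centroid([vector for _, vector in index[label]])


index = build_index(utterances)
happy_vector = select_expression(index, "happy")
```

In this sketch the label only selects a region of the space. The individual vectors remain available in `index`, so a specific utterance's expression could be chosen instead of the mean.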
Bibliographic reference. Chen, Langzhou / Braunschweiler, Norbert (2014): "Enabling controllability for continuous expression space", In INTERSPEECH-2014, 2912-2916.