Generation of Emotion Control Vector Using MDS-Based Space Transformation for Expressive Speech Synthesis

Yan-You Chen, Chung-Hsien Wu, Yu-Fong Huang


In control vector-based expressive speech synthesis, the emotion/style control vector is defined in the categorical (CAT) emotion space, which makes it difficult for the user to specify the vector precisely enough to synthesize speech with the desired emotion/style. This paper applies the arousal-valence (AV) space to the multiple regression hidden semi-Markov model (MRHSMM)-based framework for expressive speech synthesis. In this study, the user designates a specific emotion by defining its AV values in the AV space. The multidimensional scaling (MDS) method is adopted to project the AV emotion space and the CAT emotion space onto their corresponding orthogonal coordinate systems. A transformation approach is then proposed to convert the AV values into the emotion control vector in the CAT emotion space for MRHSMM-based expressive speech synthesis. In the synthesis phase, given the input text and the desired emotion, speech with the desired emotion is generated from the MRHSMMs using the transformed emotion control vector. Experimental results show that the proposed method helps the user to easily and precisely determine the desired emotion for expressive speech synthesis.
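
The abstract does not spell out how the MDS projection is computed, so the following is only a minimal sketch of classical (Torgerson) MDS in NumPy, which may differ from the variant the authors used. The function name `classical_mds` and the four AV points are hypothetical values chosen for illustration; they are not data from the paper, and the AV-to-CAT transformation itself is not shown.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed points in n_components dimensions from a symmetric distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by floating-point round-off.
    top_vals = np.clip(eigvals[order], 0.0, None)
    # Coordinates along orthogonal axes: eigenvectors scaled by sqrt(eigenvalue).
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical (arousal, valence) pairs; illustrative only, not from the paper.
av_points = np.array([[0.8, 0.7], [-0.6, 0.6], [-0.7, -0.5], [0.1, 0.0]])

# Pairwise Euclidean distances between the points.
dist = np.linalg.norm(av_points[:, None, :] - av_points[None, :, :], axis=-1)

coords = classical_mds(dist, n_components=2)

# Distances recomputed from the embedding, for comparison with the input.
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

Because the illustrative points already lie in a two-dimensional Euclidean space, a two-component embedding preserves their pairwise distances up to rotation and reflection. In the setting the abstract describes, the same kind of projection would presumably be applied separately to the AV and CAT spaces before a mapping between the two is estimated.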


DOI: 10.21437/Interspeech.2016-815

Cite as

Chen, Y., Wu, C., Huang, Y. (2016) Generation of Emotion Control Vector Using MDS-Based Space Transformation for Expressive Speech Synthesis. Proc. Interspeech 2016, 3176-3180.

Bibtex
@inproceedings{Chen+2016,
author={Yan-You Chen and Chung-Hsien Wu and Yu-Fong Huang},
title={Generation of Emotion Control Vector Using MDS-Based Space Transformation for Expressive Speech Synthesis},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-815},
url={http://dx.doi.org/10.21437/Interspeech.2016-815},
pages={3176--3180}
}