Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach

Yibin Zheng, Ya Li, Zhengqi Wen, Xingguang Ding, Jianhua Tao


Hierarchical prosody structure generation is an important but challenging component of speech synthesis systems. In this paper, we investigate enhanced embedding features, obtained by joint learning of character and word embeddings (CWE), together with different model fusion approaches at both the character and word levels for Mandarin prosodic boundary prediction. In the CWE module, the word embedding accounts for both the internal structure of words and non-compositional words, while character ambiguity is addressed by multiple-prototype character embeddings. In the model fusion module, a linear function (LF) and a gradient boosting decision tree (GBDT) are each investigated for decision-level fusion, taking as input the important features selected by a feature ranking module. Experimental results show the effectiveness of the proposed enhanced embedding features and of both model fusion approaches at the character and word levels.
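To make the CWE composition step more concrete, here is a minimal pure-Python sketch. It assumes the composition rule of the original CWE model (Chen et al., 2015). Under that rule, a word's final vector is the average of its own word vector and the mean of its character vectors, x_j = 1/2 (w_j + (1/N_j) sum_k c_k), and non-compositional words (e.g. transliterations) keep only their word vector. Position-based prototypes (B/M/E) stand in here for the multiple-prototype character embedding. That choice, the toy two-dimensional vectors, and the names `compose_word_vector` and `position_tag` are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Set, Tuple

Vector = List[float]


def position_tag(index: int, length: int) -> str:
    """Hypothetical position-based prototype label: B(egin), M(iddle), E(nd).

    Simplification: a single-character word is tagged "B".
    """
    if index == 0:
        return "B"
    if index == length - 1:
        return "E"
    return "M"


def compose_word_vector(
    word: str,
    word_emb: Dict[str, Vector],
    char_emb: Dict[Tuple[str, str], Vector],
    non_compositional: Set[str],
) -> Vector:
    """Combine a word vector with its position-specific character vectors."""
    w = word_emb[word]
    # Non-compositional words (e.g. transliterations) keep only the word vector.
    if word in non_compositional:
        return list(w)
    n = len(word)
    dim = len(w)
    # Mean of the character vectors, each chosen by the character's position.
    char_mean = [0.0] * dim
    for i, ch in enumerate(word):
        c = char_emb[(ch, position_tag(i, n))]
        for d in range(dim):
            char_mean[d] += c[d] / n
    # x_j = 1/2 * (w_j + mean of the character vectors)
    return [0.5 * (w[d] + char_mean[d]) for d in range(dim)]


if __name__ == "__main__":
    # Toy two-dimensional vectors, invented purely for illustration.
    word_emb = {"语音": [1.0, 0.0], "沙发": [0.2, 0.4]}
    char_emb = {("语", "B"): [0.0, 2.0], ("音", "E"): [2.0, 0.0]}
    non_comp = {"沙发"}  # transliteration of "sofa"
    print(compose_word_vector("语音", word_emb, char_emb, non_comp))  # [1.0, 0.5]
    print(compose_word_vector("沙发", word_emb, char_emb, non_comp))  # [0.2, 0.4]
```

Keying character vectors by `(character, position)` is what lets the same character carry different vectors in different word positions. A cluster-based prototype scheme would change only how that key is chosen.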


DOI: 10.21437/Interspeech.2016-1060

Cite as

Zheng, Y., Li, Y., Wen, Z., Ding, X., Tao, J. (2016) Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach. Proc. Interspeech 2016, 3201-3205.

BibTeX
@inproceedings{Zheng+2016,
author={Yibin Zheng and Ya Li and Zhengqi Wen and Xingguang Ding and Jianhua Tao},
title={Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1060},
url={http://dx.doi.org/10.21437/Interspeech.2016-1060},
pages={3201--3205}
}