Hierarchical stress generation with Fujisaki model in expressive speech synthesis

Ya Li, Jianhua Tao, Keikichi Hirose, Wei Lai, Xiaoying Xu


This paper introduces a hierarchical stress generation method for expressive speech synthesis. In a previous study, we proposed a novel hierarchical stress modeling method for Mandarin, and text-based stress prediction experiments demonstrated that reliable stress assignment can be obtained from textual features. However, the stress model still needs to be verified as an effective and efficient prosody model in a Text-to-Speech system. In this work, the Fujisaki model, known as an ideal global representation of prosody, is adopted to construct the pitch contours. To illustrate the effect of the stress model, the Fujisaki model parameters are automatically predicted from textual features with and without stress information. The speech synthesized with stress information sounds more natural than that synthesized without stress modeling. The RMSE of the pitch contour and the feature importance analysis also show that stress information improves pitch modeling. This work offers a promising method for accurate pitch modeling in Mandarin expressive speech synthesis.
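For readers unfamiliar with the Fujisaki model, the following is a minimal sketch of how it builds a pitch contour and how an RMSE between two contours can be computed. It uses the standard textbook formulation, in which log F0 is a baseline value plus the summed responses to phrase commands and accent (tone) commands. It is not the authors' implementation: the function names, the constants `alpha`, `beta`, and `gamma` (commonly cited defaults), and every command value in the usage lines are assumptions chosen only for illustration.

```python
import math


def phrase_response(t, alpha=3.0):
    """Phrase control response Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control response Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(times, fb, phrase_cmds, accent_cmds,
                alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 values (Hz) at the given times (seconds).

    fb          : baseline frequency in Hz
    phrase_cmds : list of (onset_time, magnitude) impulses
    accent_cmds : list of (onset_time, offset_time, amplitude) steps
    """
    contour = []
    for t in times:
        ln_f0 = math.log(fb)
        for t0, ap in phrase_cmds:
            ln_f0 += ap * phrase_response(t - t0, alpha)
        for t1, t2, aa in accent_cmds:
            ln_f0 += aa * (accent_response(t - t1, beta, gamma)
                           - accent_response(t - t2, beta, gamma))
        contour.append(math.exp(ln_f0))
    return contour


def rmse(predicted, reference):
    """Root mean square error between two equal-length contours."""
    if len(predicted) != len(reference):
        raise ValueError("contours must have the same length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# Hypothetical usage: the same phrase command with two different accent
# amplitudes, standing in for a larger vs. smaller pitch excursion.
# All numbers are made up and do not come from the paper.
times = [i * 0.01 for i in range(200)]
larger_excursion = fujisaki_f0(times, 120.0, [(0.0, 0.5)], [(0.3, 0.6, 0.5)])
smaller_excursion = fujisaki_f0(times, 120.0, [(0.0, 0.5)], [(0.3, 0.6, 0.3)])
print(rmse(larger_excursion, smaller_excursion))
```

In a setup like the one the abstract describes, the command timings and amplitudes would be predicted from textual features rather than written by hand, and the RMSE would be taken against the pitch contour of natural speech.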


DOI: 10.21437/SpeechProsody.2014-195

Cite as: Li, Y., Tao, J., Hirose, K., Lai, W., Xu, X. (2014) Hierarchical stress generation with Fujisaki model in expressive speech synthesis. Proc. 7th International Conference on Speech Prosody 2014, 1032-1036, DOI: 10.21437/SpeechProsody.2014-195.


@inproceedings{Li2014,
  author={Ya Li and Jianhua Tao and Keikichi Hirose and Wei Lai and Xiaoying Xu},
  title={{Hierarchical stress generation with Fujisaki model in expressive speech synthesis}},
  year=2014,
  booktitle={Proc. 7th International Conference on Speech Prosody 2014},
  pages={1032--1036},
  doi={10.21437/SpeechProsody.2014-195},
  url={http://dx.doi.org/10.21437/SpeechProsody.2014-195}
}