12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Hierarchical Stress Modeling in Mandarin Text-to-Speech

Ya Li (1), Jianhua Tao (1), Xiaoying Xu (2)

(1) Chinese Academy of Sciences, China
(2) Beijing Normal University, China

Automatic stress prediction is helpful for both speech synthesis and natural speech understanding. This paper proposes a novel hierarchical method for modeling Mandarin stress. The top level models stressed syllables, while the bottom level, for the first time, focuses on unstressed syllables because of their importance to both the naturalness and the expressiveness of synthetic speech. A Maximum Entropy model is adopted to predict stress structure from textual features. Experiments show that the method successfully captures the macro- and micro-characteristics of stress. The F-scores of the two levels of stress prediction are 73.3% and 78.7%, respectively, which are satisfactory compared with other prosody prediction tasks.
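
For readers unfamiliar with the metric quoted above, the F-score is the harmonic mean of precision and recall for one label. The following is a minimal sketch of that standard computation, not the authors' evaluation code. The function name `f_score`, the label symbols (S, N, U) and the toy label sequences are all hypothetical and chosen only for illustration.

```python
def f_score(gold, predicted, positive):
    """Return (precision, recall, F1) for the label `positive`."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical per-syllable labels (assumption, not data from the paper):
# S = stressed, N = neutral, U = unstressed.
gold = ["S", "N", "U", "S", "N", "U", "S", "N"]
pred = ["S", "N", "U", "N", "N", "S", "S", "U"]

stressed = f_score(gold, pred, "S")    # score for the stressed label
unstressed = f_score(gold, pred, "U")  # score for the unstressed label
```

Scoring each label separately, as here, is one plausible way to report a figure per level; the abstract does not state how the two reported F-scores were aggregated.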

Bibliographic reference.  Li, Ya / Tao, Jianhua / Xu, Xiaoying (2011): "Hierarchical stress modeling in Mandarin text-to-speech", In INTERSPEECH-2011, 2013-2016.