In state-of-the-art speech synthesis systems, prosodic phrase prediction is the most serious problem, accounting for about 40% of text analysis errors. This paper proposes two targeted optimization strategies to address the two major types of prosodic phrase prediction errors. First, an unsupervised adaptation method is proposed to relieve the mismatch between training and testing. Second, syntactic features are extracted from a parser and integrated into the prediction model so that the predicted prosodic structure is, to some extent, consistent with the syntactic structure. We verify both strategies on a mature Mandarin speech synthesis system, and the experimental results show that each has a positive effect: the rate of unacceptable sentences drops significantly, from 15.9% to 8.75%.
Bibliographic reference. Chen, Zhigang / Hu, Guoping / Jiang, Wei (2010): "Improving prosodic phrase prediction by unsupervised adaptation and syntactic features extraction", In INTERSPEECH-2010, 1421-1424.