Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Segmentation of Prosodic Phrases for Improving the Naturalness of Synthesized Mandarin Chinese Speech

Zhengyu Niu, Peiqi Chai

Department of Computer Science, Tongji University, Shanghai, China

In natural speech, sentences are broken into breath groups. Some words are grouped more closely with their adjacent words; we call these groups prosodic phrases. To improve the naturalness of synthesized speech, prosodic processing is needed in both the text-processing component and the speech generation component. The text-processing component is the more important of the two, because the performance of the speech generation component depends on the quality of its output. This paper discusses how to break sentences into prosodic phrases.

To segment prosodic phrases, the text is first segmented into Chinese words. These words are then annotated by an automatic part-of-speech (POS) tagger. Adjacent words that have a close syntactic relation are grouped into prosodic phrases using the POS tags and syntactic phrase structure information. Other factors must also be taken into consideration when placing prosodic phrase breaks, such as speech rate, pragmatic knowledge, the context, and the speaker's emotional state.
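The grouping step can be illustrated with a minimal sketch. It assumes the words are already segmented and tagged, and it uses a hypothetical table of POS tag pairs (`CLOSE_PAIRS`) and a hypothetical length cap (`max_words`); neither the tag set nor the rules are taken from the paper, and syntactic phrase structure is not modeled here.

```python
# Hypothetical rule table: adjacent POS tag pairs assumed to be closely bound.
# These pairs are illustrative only and are not the rules used in the paper.
CLOSE_PAIRS = {
    ("ADJ", "NOUN"),   # modifier + head noun
    ("NUM", "CLF"),    # numeral + classifier
    ("ADV", "VERB"),   # adverb + verb
    ("VERB", "PART"),  # verb + aspect particle
}

def group_prosodic_phrases(tagged_words, max_words=4):
    """Group adjacent (word, tag) pairs into prosodic phrases.

    A word joins the current phrase when the previous tag and its own tag
    form a pair in CLOSE_PAIRS and the phrase has fewer than max_words words.
    Otherwise it starts a new phrase.
    """
    phrases = []
    for word, tag in tagged_words:
        if phrases:
            prev_tag = phrases[-1][-1][1]
            if (prev_tag, tag) in CLOSE_PAIRS and len(phrases[-1]) < max_words:
                phrases[-1].append((word, tag))
                continue
        phrases.append([(word, tag)])
    # Drop the tags and return only the words of each phrase.
    return [[w for w, _ in phrase] for phrase in phrases]

sentence = [("我", "PRON"), ("买", "VERB"), ("了", "PART"),
            ("三", "NUM"), ("本", "CLF"), ("新", "ADJ"), ("书", "NOUN")]
print(group_prosodic_phrases(sentence))
```

A real system would replace the pair table with rules over phrase structure and would also account for the non-syntactic factors listed above.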

The POS tagging algorithm integrates a statistical method with a rule-based method. The statistical method uses a 2-gram (bigram) Markov language model: the most likely POS sequence for a given sentence is found by searching through the language model and picking the most likely path. The rule-based method is then used to correct the errors made by the statistical method, which identifies a word's category using context information. In experiments, the tagger correctly tagged 94% of the words in an independent test set of 1,200 Chinese characters.
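Finding the most likely path through a bigram Markov model is commonly done with the Viterbi algorithm. The sketch below assumes that formulation; the abstract does not name the search procedure. The probability tables, the tag names, and the `floor` smoothing value are toy assumptions, and the rule-based correction step is not shown.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely tag sequence under a bigram (first-order) model.

    start_p[tag], trans_p[(prev_tag, tag)] and emit_p[(tag, word)] are
    probabilities; missing entries fall back to `floor`.
    """
    if not words:
        return []

    def lp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t] = (log-probability of the best path ending in t, previous tag)
    best = [{t: (lp(start_p, t) + lp(emit_p, (t, words[0])), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + lp(trans_p, (p, t)), p) for p in tags
            )
            column[t] = (score + lp(emit_p, (t, words[i])), prev)
        best.append(column)

    # Backtrace from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Toy model (hypothetical numbers, not estimated from any corpus).
TAGS = ["PRON", "VERB", "NOUN"]
START = {"PRON": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {("PRON", "VERB"): 0.8, ("VERB", "NOUN"): 0.7, ("NOUN", "VERB"): 0.5}
EMIT = {("PRON", "我"): 0.9, ("VERB", "爱"): 0.6,
        ("NOUN", "爱"): 0.4, ("NOUN", "书"): 0.8}

print(viterbi(["我", "爱", "书"], TAGS, START, TRANS, EMIT))
```

Working in log space avoids numeric underflow on long sentences, and the `floor` value is a crude stand-in for proper smoothing of unseen events.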

The lexical information and phrase structure information are then used by rules to form prosodic phrases. In experiments, we obtained a break-correct figure of 86% and a recall rate of 90%. After segmentation, the words grouped into each prosodic phrase are read continuously when the text is converted to speech, which improves the naturalness of the synthesized speech.
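The two evaluation figures can be read as precision and recall over break positions. The sketch below assumes that "break-correct" means the fraction of predicted breaks that match a reference break, and that recall is the fraction of reference breaks that were predicted; the abstract does not define either term, and the break positions used here are made-up examples.

```python
def break_scores(predicted, reference):
    """Compute (precision, recall) over prosodic break positions.

    Each argument is an iterable of word indices after which a break occurs.
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical break positions for one sentence.
predicted_breaks = [2, 4, 7, 9, 12]
reference_breaks = [2, 4, 9, 12]
print(break_scores(predicted_breaks, reference_breaks))
```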

Bibliographic reference.  Niu, Zhengyu / Chai, Peiqi (2000): "Segmentation of prosodic phrases for improving the naturalness of synthesized Mandarin Chinese speech", In ICSLP-2000, vol.3, 350-353.