5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Error Analysis and Confidence Measure of Chinese Word Segmentation

Chih-Chung Kuo, Kun-Yuan Ma

Industrial Technology Research Institute, Taiwan

Word segmentation of Chinese sentences is essential for many applications in language and speech processing, yet no existing method segments words without errors. We propose a confidence measure for the segmentation result to cope with the problems these errors cause. The effectiveness of the method depends mainly on an error analysis of the word segmentation. With the confidence measure, suspected errors can be identified, so that the manual inspection load is greatly reduced for non-real-time applications. For text-to-speech systems, we also design a soft-decision method and a composite-word approach for prosody generation that exploit the confidence measure to alleviate the wrong prosody caused by wrong word boundaries.
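The abstract does not specify how the confidence measure is computed or how composite words are formed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a per-boundary confidence score in [0, 1] is already available, flags boundaries below a threshold for manual inspection, and joins the words on either side of a flagged boundary into one unit. The function name `flag_and_merge`, the `threshold` parameter, and the merging rule are all hypothetical.

```python
from typing import List, Tuple


def flag_and_merge(
    words: List[str],
    boundary_conf: List[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[List[int], List[str]]:
    """Flag low-confidence boundaries and merge the words around them.

    boundary_conf[i] is the assumed confidence of the boundary between
    words[i] and words[i + 1].
    """
    if len(boundary_conf) != max(len(words) - 1, 0):
        raise ValueError("need exactly one confidence score per internal boundary")

    # Boundaries whose confidence falls below the threshold are suspected errors.
    suspects = [i for i, c in enumerate(boundary_conf) if c < threshold]

    # Join words across suspected boundaries so that downstream processing
    # treats the uncertain region as one unit (an assumed reading of
    # "composite word").
    merged: List[str] = []
    for i, word in enumerate(words):
        if i > 0 and boundary_conf[i - 1] < threshold:
            merged[-1] += word
        else:
            merged.append(word)
    return suspects, merged


if __name__ == "__main__":
    # Illustrative input only: the scores are made up for this example.
    suspects, merged = flag_and_merge(["研究", "生命", "起源"], [0.3, 0.9])
    print(suspects, merged)
```

In this sketch only the flagged boundaries would be shown to a human checker in an offline setting, while a text-to-speech front end could assign prosody to the merged units instead of committing to an uncertain boundary.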

Sound Examples:
#1 - Speech synthesized with poor prosody caused by wrong word segmentation.
#2 - Speech synthesized with the composite-word approach, which produces more correct and natural prosody.

Bibliographic reference.  Kuo, Chih-Chung / Ma, Kun-Yuan (1998): "Error analysis and confidence measure of Chinese word segmentation", In ICSLP-1998, paper 1078.