8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Using Multiple Linguistic Features for Mandarin Phrase Break Prediction in Maximum-entropy Classification Framework

Yu Zheng (1), Gary Geunbae Lee (1), Byeongchang Kim (2)

(1) Pohang University of Science & Technology (POSTECH), Korea
(2) Catholic University of Daegu, Korea

We model Mandarin phrase break prediction as a classification problem over a three-level prosodic structure and apply conditional maximum entropy (ME) classification to it. We acquire multiple levels of linguistic knowledge from an annotated corpus and integrate them as features in the maximum entropy framework. Five kinds of features represent the linguistic constraints: POS tag features, lexical features, phonetic features, length features, and distance features. Experimental results show that our method outperforms previous methods and that the conditional ME model copes effectively with the data sparseness problem in Mandarin phrase break prediction.
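
As a rough illustration of the approach summarized above, the following is a minimal sketch of a conditional maximum entropy classifier, p(y | x) proportional to exp(sum of weights of the active (feature, label) pairs), trained by plain gradient ascent on an L2-regularized log-likelihood. Everything specific here is an assumption made for illustration and is not taken from the paper: the label names, the feature templates (one per feature kind named in the abstract), the POS tags and words, the toy examples, and the training procedure.

```python
import math
from collections import defaultdict

# Placeholder names for the three prosodic levels (hypothetical, not from the paper).
LABELS = ["level1", "level2", "level3"]


def extract_features(ctx):
    """Map one word-boundary context to binary feature names.

    One hypothetical template per feature kind listed in the abstract:
    POS tag, lexical, phonetic, length, and distance.
    """
    return [
        "pos_left=" + ctx["pos_left"],
        "pos_right=" + ctx["pos_right"],
        "word_left=" + ctx["word_left"],
        "tone_left=" + str(ctx["tone_left"]),
        "len_left=" + str(ctx["len_left"]),
        "dist_start=" + str(ctx["dist_start"]),
    ]


def predict_proba(weights, feats):
    """Conditional ME distribution over LABELS for the active features."""
    scores = [sum(weights[(f, y)] for f in feats) for y in LABELS]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train(data, epochs=300, lr=0.2, l2=0.01):
    """Gradient ascent on the mean log-likelihood with an L2 penalty."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in data:
            probs = predict_proba(weights, feats)
            for y, p in zip(LABELS, probs):
                # Observed minus expected count of each (feature, label) pair.
                delta = (1.0 if y == gold else 0.0) - p
                for f in feats:
                    grad[(f, y)] += delta
        for key, g in grad.items():
            weights[key] += lr * (g / len(data) - l2 * weights[key])
    return weights


def classify(weights, feats):
    """Return the most probable label."""
    probs = predict_proba(weights, feats)
    return LABELS[probs.index(max(probs))]


# Invented toy contexts, for demonstration only (not corpus data).
TOY_CONTEXTS = [
    ({"pos_left": "n", "pos_right": "v", "word_left": "women",
      "tone_left": 5, "len_left": 2, "dist_start": 1}, "level1"),
    ({"pos_left": "v", "pos_right": "u", "word_left": "xuexi",
      "tone_left": 2, "len_left": 2, "dist_start": 2}, "level2"),
    ({"pos_left": "u", "pos_right": "n", "word_left": "de",
      "tone_left": 0, "len_left": 1, "dist_start": 3}, "level3"),
]
TOY_DATA = [(extract_features(ctx), gold) for ctx, gold in TOY_CONTEXTS]
```

Calling `train(TOY_DATA)` and then `classify(weights, feats)` on a feature list yields one of the three placeholder labels. Because every feature kind becomes just another weighted indicator in the same exponential model, heterogeneous knowledge sources can be combined without assuming they are independent, and the regularization term is one common way to limit overfitting to rarely seen features.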


Bibliographic reference. Zheng, Yu / Lee, Gary Geunbae / Kim, Byeongchang (2004): "Using multiple linguistic features for Mandarin phrase break prediction in maximum-entropy classification framework", In INTERSPEECH-2004, 737.