International Symposium on Chinese Spoken Language Processing (ISCSLP 2002)

Taipei, Taiwan
August 23-24, 2002

Investigation and Analysis on Designing Chinese Balance Corpus

Rile Hu (1), Chengqing Zong (1), Juha Iso-Sipila (2), Bo Xu (1)

(1) Chinese Academy of Sciences, Beijing, China
(2) Nokia China R&D Center, Beijing, China

Statistical methods have recently become the dominant approach in computational linguistics and natural language processing research. Because the corpus is the basis of any statistical method, keeping a corpus balanced during collection is an important issue. In this paper, we report the results of our investigation and analysis of several real corpora and propose a scheme for keeping the balance in corpus design. We also present suggestions for the composition of a corpus.


Bibliographic reference.  Hu, Rile / Zong, Chengqing / Iso-Sipila, Juha / Xu, Bo (2002): "Investigation and analysis on designing Chinese balance corpus", In ISCSLP 2002, paper 110.