8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Automatic Title Generation for Chinese Spoken Documents Considering the Special Structure of the Language

Lin-shan Lee, Shun-Chuan Chen

National Taiwan University, Taiwan

The purpose of automatic title generation is to understand a document and summarize it in only a few readable words or phrases. This is important for browsing and retrieving spoken documents: such documents may be automatically transcribed, but they are much more useful when accompanied by titles indicating their subject matter. The Chinese language, for its part, is not only spoken by the largest population in the world but also has a very special structure, different from that of Western languages. It is not alphabetic, and it has a large number of distinct characters, each pronounced as a monosyllable, while the total number of syllables is limited. In this paper, considering the special structure of the Chinese language, a set of "feature units" for Chinese spoken language processing is defined, and the effects of the choice of these "feature units" on automatic title generation are analyzed, using as the baseline a new adaptive K nearest-neighbor approach proposed in a companion paper also submitted to this conference.
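To make the role of a "feature unit" concrete, the following is a minimal sketch and not the method of the paper. It assumes, purely for illustration, that feature units are overlapping character n-grams, and it uses a plain fixed-k nearest-neighbor lookup with cosine similarity rather than the adaptive K nearest-neighbor approach of the companion paper. The function names (`char_ngrams`, `cosine`, `nearest_titles`) and the toy corpus are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text, n):
    """Split text into overlapping character n-grams, ignoring whitespace.

    Assumption: character n-grams stand in for the paper's feature units.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_titles(query, corpus, n, k):
    """Return the titles of the k training documents most similar to query.

    corpus is a list of (document_text, title) pairs; n is the n-gram size.
    This is plain fixed-k nearest neighbor, not an adaptive variant.
    """
    query_vec = Counter(char_ngrams(query, n))
    scored = [
        (cosine(query_vec, Counter(char_ngrams(doc, n))), title)
        for doc, title in corpus
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


# Hypothetical toy corpus of (transcription, title) pairs.
corpus = [
    ("台北今天下雨氣溫下降", "天氣"),
    ("股市今天上漲", "財經"),
    ("明天台北有雨", "天氣預報"),
]

if __name__ == "__main__":
    for n in (1, 2):
        print(n, nearest_titles("台北下雨", corpus, n=n, k=2))
```

Changing `n` changes which training documents count as neighbors of the same query, which is the kind of effect the choice of feature unit has on any similarity-based title generator.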

Bibliographic reference. Lee, Lin-shan / Chen, Shun-Chuan (2003): "Automatic title generation for Chinese spoken documents considering the special structure of the language", in EUROSPEECH-2003, 2325-2328.