8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Automatic Title Generation for Chinese Spoken Documents Using an Adaptive K Nearest-Neighbor Approach

Shun-Chuan Chen, Lin-shan Lee

National Taiwan University, Taiwan

The purpose of automatic title generation is to understand a document and summarize it in just a few readable words or phrases. This is important for browsing and retrieving spoken documents: although such documents can be automatically transcribed, they are much more useful when accompanied by titles indicating their subject matter. Title generation for Chinese raises additional problems, such as word segmentation and key phrase extraction, that also have to be solved. In this paper, we develop a new approach to title generation for Chinese spoken documents. It includes key phrase extraction, topic classification, and a new title generation model based on an adaptive K nearest-neighbor concept. The tests were performed with a training corpus of 151,537 news stories in text form with human-generated titles and a testing corpus of 210 broadcast news stories. The evaluation included both objective F1 measures and a 5-level subjective human evaluation. Very positive results were obtained.
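As a rough illustration of the objective measure mentioned above, the following is a minimal sketch of an F1 score between a generated title and a human-written reference title. It assumes that both titles have already been segmented into words and that F1 is computed from word overlap; the abstract does not specify the exact matching unit, so the function name `title_f1` and the overlap definition are assumptions made for illustration only.

```python
from collections import Counter


def title_f1(generated, reference):
    """Word-overlap F1 between two pre-segmented titles (lists of tokens).

    Assumption: matching is done on exact tokens with multiset overlap;
    the evaluation in the paper may define matches differently.
    """
    if not generated or not reference:
        return 0.0
    # Multiset intersection counts each shared word at most as often
    # as it appears in both titles.
    overlap = sum((Counter(generated) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(generated)  # share of generated words that match
    recall = overlap / len(reference)     # share of reference words recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical segmented titles, used only to exercise the function.
generated_title = ["a", "b", "c"]
reference_title = ["a", "b", "d", "e"]
score = title_f1(generated_title, reference_title)
```

Here two of the three generated words match, giving a precision of 2/3 and a recall of 2/4, so the harmonic mean is 4/7.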

Bibliographic reference.  Chen, Shun-Chuan / Lee, Lin-shan (2003): "Automatic title generation for Chinese spoken documents using an adaptive k nearest-neighbor approach", In EUROSPEECH-2003, 2813-2816.