8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Automatic Extraction of Key Sentences from Oral Presentations using Statistical Measure based on Discourse Markers

Tasuku Kitade (1), Tatsuya Kawahara (1), Hiroaki Nanjo (2)

(1) Kyoto University, Japan
(2) Ryukoku University, Japan

This paper addresses the automatic extraction of key sentences from academic presentation speech. The method exploits the characteristic expressions used in the initial utterances of sections. These expressions are defined as discourse markers and are derived in a fully unsupervised manner from word statistics. The statistics of the discourse markers are then used to define the importance of each sentence, and this measure is also combined with the conventional tf-idf measure of content words. We present a comprehensive evaluation using the Corpus of Spontaneous Japanese under a variety of experimental setups, with the evaluation scheme carefully designed so that the results can be compared with human performance. The proposed method using the discourse markers is consistently effective for key sentence extraction. Based on this indexing, we realize efficient browsing of lecture audio archives.
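The abstract does not give the exact statistics, so the following Python sketch only illustrates the general shape of the pipeline under stated assumptions. It assumes pre-tokenized sentences (Japanese text would first need morphological analysis). It also assumes a log frequency-ratio statistic for marker words, with words over-represented in section-initial sentences treated as discourse markers, and a standard tf-idf weight for content words. Finally, it assumes min-max normalization and a linear mix controlled by a hypothetical weight `alpha`. The function names, `min_count`, `alpha`, and `ratio` are all illustrative and are not taken from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Sentence = List[str]  # a pre-tokenized sentence (list of words)


def marker_scores(section_initial: Sequence[Sentence],
                  all_sentences: Sequence[Sentence],
                  min_count: int = 2) -> Dict[str, float]:
    """Hypothetical discourse-marker statistic: log ratio of a word's relative
    frequency in section-initial sentences to its relative frequency overall.
    Assumes section_initial is a subset of all_sentences."""
    init_counts = Counter(w for s in section_initial for w in s)
    all_counts = Counter(w for s in all_sentences for w in s)
    n_init = sum(init_counts.values())
    n_all = sum(all_counts.values())
    scores = {}
    for w, c in init_counts.items():
        if c < min_count:  # skip rare words whose ratio would be unreliable
            continue
        ratio = (c / n_init) / (all_counts[w] / n_all)
        scores[w] = math.log(ratio)  # > 0 means over-represented at section starts
    return scores


def tfidf_weights(lecture: Sequence[Sentence], doc_freq: Dict[str, int],
                  n_docs: int) -> Dict[str, float]:
    """Plain tf-idf for one lecture; doc_freq maps word -> number of lectures
    containing it (unseen words are treated as appearing in one lecture)."""
    tf = Counter(w for s in lecture for w in s)
    return {w: tf[w] * math.log(n_docs / max(1, doc_freq.get(w, 1))) for w in tf}


def _min_max(xs: List[float]) -> List[float]:
    """Rescale scores to [0, 1] so the two measures are comparable."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def extract_key_sentences(lecture: Sequence[Sentence], dm: Dict[str, float],
                          tfidf: Dict[str, float], alpha: float = 0.5,
                          ratio: float = 0.2) -> List[int]:
    """Return indices (in original order) of the top-scoring sentences.
    alpha weights the marker score against the tf-idf score (hypothetical linear mix)."""
    if not lecture:
        return []
    # Marker score: sum of positive marker statistics of the words in the sentence.
    dm_raw = [sum(max(0.0, dm.get(w, 0.0)) for w in s) for s in lecture]
    # Content score: length-normalized sum of tf-idf weights.
    tf_raw = [sum(tfidf.get(w, 0.0) for w in s) / max(1, len(s)) for s in lecture]
    combined = [alpha * a + (1 - alpha) * b
                for a, b in zip(_min_max(dm_raw), _min_max(tf_raw))]
    k = max(1, round(ratio * len(lecture)))
    top = sorted(range(len(lecture)), key=lambda i: combined[i], reverse=True)[:k]
    return sorted(top)


if __name__ == "__main__":
    initial = [["next", "we", "describe", "method"], ["next", "we", "show", "results"]]
    others = [["the", "model", "uses", "features"], ["results", "show", "gains"],
              ["we", "use", "data"]]
    dm = marker_scores(initial, initial + others)  # "next" and "we" become markers
    lecture = [["today", "we", "talk"], ["the", "model", "uses", "features"],
               ["next", "we", "describe", "model"]]
    tfidf = tfidf_weights(lecture, {"model": 2, "the": 10, "we": 10}, n_docs=10)
    print(extract_key_sentences(lecture, dm, tfidf, alpha=0.5, ratio=0.3))  # -> [2]
```

In this toy run, the sentence opening with "next we ..." is selected because both of its leading words were learned as markers from section-initial sentences. The linear mix and min-max scaling are just one simple way to combine two measures on different scales. Other combinations are equally plausible given only the abstract.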


Bibliographic reference. Kitade, Tasuku / Kawahara, Tatsuya / Nanjo, Hiroaki (2004): "Automatic extraction of key sentences from oral presentations using statistical measure based on discourse markers", In INTERSPEECH-2004, 2169-2172.