ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition

April 13-16, 2003
Tokyo Institute of Technology, Tokyo, Japan

Two-Stage Automatic Speech Summarization by Sentence Extraction and Compaction

Tomonori Kikuchi (1), Sadaoki Furui (1), Chiori Hori (2)

(1) Department of Computer Science, Tokyo Institute of Technology, Japan
(2) Intelligent Communication Laboratory, NTT Communication Science Laboratories, Japan

This paper proposes a new automatic speech summarization method with two stages: important sentence extraction and sentence compaction. Relatively important sentences are first extracted from the output of large-vocabulary continuous speech recognition (LVCSR) based on the amount of information and the confidence measures of their constituent words. The set of extracted sentences is then compressed by our sentence compaction method, which selects a word set that maximizes a summarization score comprising the amount of information and the confidence measure of each word, the linguistic likelihood of word strings, and the word concatenation probability. The selected words are concatenated to create a summary. The effectiveness of the proposed method was confirmed by summarizing spontaneous presentations. The optimal ratio of sentence extraction to sentence compaction varies with the target summarization ratio and the features of the presentations.
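The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Word`, `word_score`, `extract_sentences`, `compact`, and `link_score` are hypothetical, the score weights are placeholders, sentences are ranked by their mean word score as an assumption, and the linguistic likelihood and word concatenation probability are folded into a single caller-supplied `link_score` function. Compaction is written here as a simple dynamic program over ordered word subsets.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Word:
    text: str
    info: float        # amount of information carried by the word (assumed given)
    confidence: float  # recognizer confidence measure (assumed given)


def word_score(w: Word, w_info: float = 1.0, w_conf: float = 1.0) -> float:
    # Weighted sum of information and confidence; the weights are placeholders.
    return w_info * w.info + w_conf * w.confidence


def extract_sentences(sentences: Sequence[List[Word]], keep: int) -> List[List[Word]]:
    """Stage 1: keep the `keep` best sentences (by mean word score), in original order."""
    def sentence_score(s: List[Word]) -> float:
        return sum(word_score(w) for w in s) / len(s) if s else float("-inf")

    ranked = sorted(range(len(sentences)),
                    key=lambda i: sentence_score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]


def compact(words: Sequence[Word], target_len: int,
            link_score: Callable[[Word, Word], float]) -> List[Word]:
    """Stage 2: pick `target_len` words, preserving order, that maximize the
    sum of word scores plus link scores between consecutive picked words."""
    n = len(words)
    target_len = min(target_len, n)
    if target_len <= 0:
        return []
    neg = float("-inf")
    # best[k][i]: best score of a summary of k + 1 words that ends at word i.
    best = [[neg] * n for _ in range(target_len)]
    back = [[-1] * n for _ in range(target_len)]
    for i in range(n):
        best[0][i] = word_score(words[i])
    for k in range(1, target_len):
        for i in range(k, n):
            for j in range(k - 1, i):
                cand = best[k - 1][j] + link_score(words[j], words[i]) + word_score(words[i])
                if cand > best[k][i]:
                    best[k][i], back[k][i] = cand, j
    # Trace back from the best final word.
    k = target_len - 1
    i = max(range(k, n), key=lambda idx: best[k][idx])
    picked = []
    while i != -1:
        picked.append(words[i])
        i, k = back[k][i], k - 1
    return picked[::-1]
```

In this sketch the share of the reduction handled by each stage is controlled by `keep` (sentences retained in stage 1) and `target_len` (words retained in stage 2), which is one way to picture the extraction-to-compaction ratio discussed in the abstract.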

