8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Speech Summarization Using Weighted Finite-State Transducers

Takaaki Hori, Chiori Hori, Yasuhiro Minami

NTT Corporation, Japan

This paper proposes an integrated framework to summarize spontaneous speech into written-style compact sentences. Most current speech recognition systems attempt to transcribe every spoken word correctly. However, recognition results of spontaneous speech are usually difficult to understand, even if the recognition is perfect, because spontaneous speech includes redundant information and its style differs from that of written sentences. In particular, the style of spoken Japanese is very different from that of the written language. Therefore, techniques to summarize recognition results into readable and compact sentences are indispensable for generating captions or minutes from speech. Our speech summarization includes speech recognition, paraphrasing, and sentence compaction, which are integrated in a single Weighted Finite-State Transducer (WFST). This approach enables the decoder to employ all the knowledge sources in a one-pass search strategy and reduces search errors, since all the constraints of the models are used from the beginning of the search. We conducted experiments on a 20k-word Japanese lecture speech recognition and summarization task. Our approach yielded improvements in both recognition accuracy and summarization accuracy compared with other approaches that perform speech recognition and summarization separately.
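The idea of folding several knowledge sources into one transducer and searching it in a single pass can be illustrated with a toy sketch. The code below is not the authors' system: it assumes epsilon-free transducers, tropical-semiring costs (add along a path, take the minimum over paths), and two invented models, a hypothetical spoken-to-written word mapping (`paraphrase`) and a hypothetical written-style unigram model (`written_lm`). Speech recognition and sentence compaction, which would need acoustic models and word deletion, are left out, and the English tokens are placeholders.

```python
import heapq
from collections import defaultdict

# A transducer is a dict: {"start": state, "finals": {state: cost},
#                          "arcs": {state: [(in_label, out_label, cost, next_state)]}}
# Costs are non-negative and combine by addition (tropical semiring).

def compose(a, b):
    """Epsilon-free composition: outputs of `a` are matched to inputs of `b`."""
    start = (a["start"], b["start"])
    arcs, finals = defaultdict(list), {}
    stack, seen = [start], {start}
    while stack:
        state = stack.pop()
        qa, qb = state
        if qa in a["finals"] and qb in b["finals"]:
            finals[state] = a["finals"][qa] + b["finals"][qb]
        for i, mid, w1, na in a["arcs"].get(qa, []):
            for mid2, o, w2, nb in b["arcs"].get(qb, []):
                if mid == mid2:
                    nxt = (na, nb)
                    arcs[state].append((i, o, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return {"start": start, "finals": finals, "arcs": dict(arcs)}

def best_output(t, inputs):
    """One-pass Dijkstra search for the cheapest output sequence for `inputs`."""
    tie = 0  # unique tie-breaker so the heap never compares states
    heap = [(0.0, tie, t["start"], 0, ())]
    done = set()
    while heap:
        cost, _, state, pos, out = heapq.heappop(heap)
        if state is None:  # goal marker: final cost already included
            return cost, out
        if (state, pos) in done:
            continue
        done.add((state, pos))
        if pos == len(inputs):
            if state in t["finals"]:
                tie += 1
                heapq.heappush(heap, (cost + t["finals"][state], tie, None, pos, out))
            continue
        for i, o, w, nxt in t["arcs"].get(state, []):
            if i == inputs[pos]:
                tie += 1
                heapq.heappush(heap, (cost + w, tie, nxt, pos + 1, out + (o,)))
    return None

# Hypothetical spoken-to-written mapping (illustrative costs only).
paraphrase = {"start": 0, "finals": {0: 0.0}, "arcs": {0: [
    ("i", "I", 0.0, 0), ("gonna", "going_to", 0.5, 0),
    ("gonna", "gonna", 2.0, 0), ("leave", "leave", 0.0, 0)]}}

# Hypothetical written-style unigram model, expressed as an identity transducer.
written_lm = {"start": 0, "finals": {0: 0.0}, "arcs": {0: [
    ("I", "I", 1.0, 0), ("going_to", "going_to", 1.0, 0),
    ("gonna", "gonna", 5.0, 0), ("leave", "leave", 1.0, 0)]}}

integrated = compose(paraphrase, written_lm)
result = best_output(integrated, ["i", "gonna", "leave"])
print(result)
```

Because the two models are composed before the search starts, the colloquial form is penalized by both of them at the moment it is considered, rather than being chosen by one model and corrected by the other afterwards.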


Bibliographic reference.  Hori, Takaaki / Hori, Chiori / Minami, Yasuhiro (2003): "Speech summarization using weighted finite-state transducers", In EUROSPEECH-2003, 2817-2820.