EUROSPEECH 2003 - INTERSPEECH 2003
This paper proposes an integrated framework to summarize spontaneous speech into compact, written-style sentences. Most current speech recognition systems attempt to transcribe every spoken word correctly. However, recognition results for spontaneous speech are usually difficult to understand, even if the recognition is perfect, because spontaneous speech includes redundant information and its style differs from that of written sentences. In particular, the style of spoken Japanese is very different from that of the written language. Therefore, techniques to summarize recognition results into readable and compact sentences are indispensable for generating captions or minutes from speech. Our speech summarization includes speech recognition, paraphrasing, and sentence compaction, which are integrated in a single Weighted Finite-State Transducer (WFST). This approach enables the decoder to employ all the knowledge sources in a one-pass search strategy and reduces search errors, since all the constraints of the models are used from the beginning of the search. We conducted experiments on a 20k-word Japanese lecture speech recognition and summarization task. Our approach yielded improvements in both recognition accuracy and summarization accuracy compared with other approaches that perform speech recognition and summarization separately.
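The idea of merging paraphrasing and sentence compaction into one transducer can be illustrated with a toy sketch. This is not the authors' implementation: the `Wfst` class, the English vocabulary, and all weights are hypothetical, the recognizer is replaced by a plain word sequence, and composition is restricted to the case where the shared tape carries no empty labels. Weights are added along a path and the lowest total wins (tropical semiring).

```python
import heapq
from collections import defaultdict

EPS = "<eps>"  # empty output label, used here to mark a deleted word


class Wfst:
    """Toy weighted transducer; weights add along a path, lower is better."""

    def __init__(self, start, finals, arcs):
        self.start = start
        self.finals = dict(finals)     # state -> final weight
        self.arcs = defaultdict(list)  # state -> [(ilabel, olabel, weight, next)]
        for src, ilabel, olabel, weight, dst in arcs:
            self.arcs[src].append((ilabel, olabel, weight, dst))


def compose(a, b):
    """Compose a and b, matching a's output labels with b's input labels.
    Assumes neither of those two label sets contains EPS."""
    start = (a.start, b.start)
    arcs, finals = [], {}
    stack, seen = [start], {start}
    while stack:
        state = stack.pop()
        qa, qb = state
        if qa in a.finals and qb in b.finals:
            finals[state] = a.finals[qa] + b.finals[qb]
        for il, mid, wa, na in a.arcs.get(qa, []):
            for mid2, ol, wb, nb in b.arcs.get(qb, []):
                if mid == mid2:
                    nxt = (na, nb)
                    arcs.append((state, il, ol, wa + wb, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return Wfst(start, finals, arcs)


_DONE = object()  # sentinel for "accepted" in the search queue


def best_path(t):
    """Dijkstra search (non-negative weights assumed).
    Returns (cost, output words) or None if nothing is accepted."""
    heap, visited, tick = [(0.0, 0, t.start, ())], set(), 1
    while heap:
        cost, _, state, out = heapq.heappop(heap)
        if state is _DONE:
            return cost, [w for w in out if w != EPS]
        if state in visited:
            continue
        visited.add(state)
        if state in t.finals:
            heapq.heappush(heap, (cost + t.finals[state], tick, _DONE, out))
            tick += 1
        for _, ol, w, nxt in t.arcs.get(state, []):
            if nxt not in visited:
                heapq.heappush(heap, (cost + w, tick, nxt, out + (ol,)))
                tick += 1
    return None


def linear_acceptor(words):
    """Stand-in for recognizer output: one path that reads `words`."""
    arcs = [(i, w, w, 0.0, i + 1) for i, w in enumerate(words)]
    return Wfst(0, {len(words): 0.0}, arcs)


# Hypothetical single-state models.
KEEP = ["we", "will", "test", "it"]
# Paraphrasing: spoken form -> written form.
paraphrase = Wfst(0, {0: 0.0},
                  [(0, w, w, 0.0, 0) for w in KEEP + ["um"]]
                  + [(0, "gonna", "will", 0.2, 0)])
# Compaction: keep a word, or delete it by emitting EPS at some cost.
compaction = Wfst(0, {0: 0.0},
                  [(0, w, w, 0.0, 0) for w in KEEP]
                  + [(0, w, EPS, 1.5, 0) for w in KEEP]
                  + [(0, "um", EPS, 0.1, 0), (0, "um", "um", 2.0, 0)])

# Built once, ahead of time: both knowledge sources in one transducer.
integrated = compose(paraphrase, compaction)


def summarize(words):
    return best_path(compose(linear_acceptor(words), integrated))


cost, summary = summarize(["um", "we", "gonna", "test", "it"])
print(summary)
```

Because `integrated` is built before any input arrives, a single search sees the paraphrasing and compaction costs together, which is the property the abstract attributes to the one-pass strategy. A full system along the lines described would also fold the recognition models into the same transducer and would need composition that handles empty labels on the shared tape, neither of which this sketch attempts.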
Bibliographic reference. Hori, Takaaki / Hori, Chiori / Minami, Yasuhiro (2003): "Speech summarization using weighted finite-state transducers", In EUROSPEECH-2003, 2817-2820.