14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Supervised Spoken Document Summarization Based on Structured Support Vector Machine with Utterance Clusters as Hidden Variables

Sz-Rung Shiang (1), Hung-yi Lee (2), Lin-shan Lee (1)

(1) National Taiwan University, Taiwan
(2) Academia Sinica, Taiwan

This paper presents a supervised approach to extractive summarization of spoken documents that treats utterance clusters in the documents as hidden variables. Utterances in important clusters may be jointly included in the summary, while those in less important clusters may be excluded as a whole. Summaries are therefore selected not only on the conventional principle of including the most important utterances while minimizing redundancy, but also on the hidden cluster structure of the document. The cluster structure is not known in advance but can be inferred from the documents, and the cluster structure and the summary can be obtained jointly by a structured SVM learned from training examples. Encouraging results were obtained on a lecture corpus in preliminary experiments.
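
The selection principle described in the abstract can be illustrated with a small Python sketch. It is not the authors' method: it assumes that cluster assignments and cluster weights are already given, whereas the paper treats the clusters as hidden variables and learns the scoring with a structured SVM. The function names `select_summary` and `_gain`, their parameters, and the toy scores are all hypothetical, and the greedy search is a simple stand-in for the actual inference.

```python
from typing import Dict, List, Sequence


def _gain(i: int, selected: List[int], importance: Sequence[float],
          similarity: Sequence[Sequence[float]], cluster_of: Sequence[int],
          cluster_weight: Dict[int, float], redundancy_penalty: float) -> float:
    # Redundancy is the highest similarity to any utterance already selected.
    redundancy = max((similarity[i][j] for j in selected), default=0.0)
    return (importance[i]
            + cluster_weight[cluster_of[i]]      # shared by the whole cluster
            - redundancy_penalty * redundancy)


def select_summary(importance: Sequence[float],
                   similarity: Sequence[Sequence[float]],
                   cluster_of: Sequence[int],
                   cluster_weight: Dict[int, float],
                   budget: int,
                   redundancy_penalty: float = 1.0) -> List[int]:
    """Greedily pick at most `budget` utterance indices with positive gain.

    Assumption: clusters and their weights are known here; in the paper they
    are hidden and inferred jointly with the summary.
    """
    selected: List[int] = []
    candidates = set(range(len(importance)))
    while candidates and len(selected) < budget:
        # Iterate in sorted order so ties are broken by the lowest index.
        best = max(sorted(candidates),
                   key=lambda i: _gain(i, selected, importance, similarity,
                                       cluster_of, cluster_weight,
                                       redundancy_penalty))
        if _gain(best, selected, importance, similarity, cluster_of,
                 cluster_weight, redundancy_penalty) <= 0:
            break  # nothing left that improves the summary
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)


# Toy data (made up): utterances 0-1 form cluster 0, utterances 2-3 cluster 1.
importance = [0.9, 0.8, 0.5, 0.4]
cluster_of = [0, 0, 1, 1]
similarity = [
    [1.0, 0.2, 0.0, 0.0],
    [0.2, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.1],
    [0.0, 0.0, 0.1, 1.0],
]
with_clusters = select_summary(importance, similarity, cluster_of,
                               {0: 0.5, 1: -0.6}, budget=3)
without_clusters = select_summary(importance, similarity, cluster_of,
                                  {0: 0.0, 1: 0.0}, budget=3)
```

With the negative weight on cluster 1, both of its utterances are left out even though the budget allows a third sentence, which mirrors the idea of excluding a less important cluster as a whole. With all cluster weights set to zero, the same function falls back to the conventional importance-minus-redundancy selection.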

Bibliographic reference.  Shiang, Sz-Rung / Lee, Hung-yi / Lee, Lin-shan (2013): "Supervised spoken document summarization based on structured support vector machine with utterance clusters as hidden variables", In INTERSPEECH-2013, 2728-2732.