11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Semi-Supervised Extractive Speech Summarization via Co-Training Algorithm

Shasha Xie (1), Hui Lin (2), Yang Liu (1)

(1) University of Texas at Dallas, USA
(2) University of Washington, USA

Supervised methods for extractive speech summarization require a large training set, yet summary annotation is often expensive and time-consuming. In this paper, we exploit semi-supervised approaches to leverage unlabeled data. In particular, we investigate co-training for the task of extractive meeting summarization. Compared with text summarization, the speech summarization task has a unique characteristic: its features naturally split into two sets, textual features and prosodic/acoustic features. This characteristic makes co-training an appropriate approach for semi-supervised speech summarization. Our experiments on the ICSI meeting corpus show that, by utilizing the unlabeled data, co-training significantly improves summarization performance when only a small amount of labeled data is available.
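
As an illustration of the two-view idea in the abstract, the following is a minimal sketch of a generic co-training loop, not the authors' implementation. The abstract does not name the classifier, the confidence measure, or the selection policy, so everything here is an assumption: the toy `NearestCentroid` classifier, the margin-based confidence, and the `rounds` and `per_round` parameters are hypothetical stand-ins. Each sentence is represented by a textual feature vector, a prosodic/acoustic feature vector, and a label (1 for a summary sentence, 0 otherwise).

```python
def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Toy classifier for illustration only (an assumption, not the paper's model)."""

    def fit(self, X, y):
        # One centroid per label that actually occurs in the training data.
        self.centroids = {
            label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)
        }
        return self

    def predict_with_confidence(self, x):
        dists = {label: sq_dist(x, c) for label, c in self.centroids.items()}
        ranked = sorted(dists, key=dists.get)
        best = ranked[0]
        # Assumed confidence: gap between the nearest and second-nearest centroid.
        margin = dists[ranked[1]] - dists[best] if len(ranked) > 1 else 0.0
        return best, margin


def fit_views(labeled):
    """Train one classifier per feature view on the current labeled set."""
    labels = [y for _, _, y in labeled]
    clf_text = NearestCentroid().fit([t for t, _, _ in labeled], labels)
    clf_pros = NearestCentroid().fit([p for _, p, _ in labeled], labels)
    return clf_text, clf_pros


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: (text_view, prosodic_view, label); unlabeled: (text_view, prosodic_view)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picked = {}
        # View 0 is textual, view 1 is prosodic/acoustic.
        for view, clf in enumerate(fit_views(labeled)):
            scored = [(clf.predict_with_confidence(ex[view]), i)
                      for i, ex in enumerate(pool)]
            scored.sort(key=lambda s: s[0][1], reverse=True)
            # Each view contributes its most confident predictions as new labels.
            for (label, _), i in scored[:per_round]:
                picked.setdefault(i, label)
        labeled.extend((pool[i][0], pool[i][1], label) for i, label in picked.items())
        pool = [ex for i, ex in enumerate(pool) if i not in picked]
    clf_text, clf_pros = fit_views(labeled)
    return clf_text, clf_pros, labeled
```

Calling `co_train(labeled, unlabeled)` returns the two view-specific classifiers together with the enlarged labeled set. In this sketch, when both views select the same example in a round, the label from the textual view is kept; that tie-breaking rule is an arbitrary choice made for simplicity.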

Bibliographic reference.  Xie, Shasha / Lin, Hui / Liu, Yang (2010): "Semi-supervised extractive speech summarization via co-training algorithm", In INTERSPEECH-2010, 2522-2525.