Supervised methods for extractive speech summarization require a large training set, yet summary annotation is often expensive and time-consuming. In this paper, we use semi-supervised approaches to leverage unlabeled data; in particular, we investigate co-training for extractive meeting summarization. Compared with text summarization, speech summarization has a unique characteristic: its features naturally split into two sets, textual features and prosodic/acoustic features. This natural two-view split makes co-training an appropriate approach for semi-supervised speech summarization. Our experiments on the ICSI meeting corpus show that, by utilizing unlabeled data, co-training significantly improves summarization performance when only a small amount of labeled data is available.
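To illustrate how the two feature sets can serve as separate views, here is a minimal sketch of a generic two-view co-training loop. The abstract does not specify the classifiers, confidence measure, or selection schedule. Everything below is therefore an assumption for illustration: the nearest-centroid classifier, the distance-gap confidence score, and the names `CentroidClassifier`, `co_train`, `rounds`, and `per_round` are all hypothetical. Each sentence is represented as a pair (textual features, prosodic/acoustic features), and label 1 means "include in the summary".

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical representation: (textual feature vector, prosodic/acoustic feature vector)
Example = Tuple[Vector, Vector]


class CentroidClassifier:
    """Hypothetical stand-in classifier: binary nearest-centroid (1 = summary sentence)."""

    def fit(self, xs: List[Vector], ys: List[int]) -> "CentroidClassifier":
        self.centroids: Dict[int, List[float]] = {}
        for label in (0, 1):
            members = [x for x, y in zip(xs, ys) if y == label]
            if not members:
                raise ValueError(f"labeled set has no examples of class {label}")
            dim = len(members[0])
            self.centroids[label] = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        return self

    def predict_with_confidence(self, x: Vector) -> Tuple[int, float]:
        d0 = math.dist(x, self.centroids[0])
        d1 = math.dist(x, self.centroids[1])
        # Assumed confidence heuristic: the gap between the two centroid distances.
        return (1 if d1 < d0 else 0), abs(d0 - d1)


def _train_views(labeled: List[Tuple[Example, int]]):
    ys = [y for _, y in labeled]
    clf_text = CentroidClassifier().fit([ex[0] for ex, _ in labeled], ys)
    clf_pros = CentroidClassifier().fit([ex[1] for ex, _ in labeled], ys)
    return clf_text, clf_pros


def co_train(labeled: List[Tuple[Example, int]], unlabeled: List[Example],
             rounds: int = 5, per_round: int = 1):
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # Train one classifier per view on the current (growing) labeled set.
        clf_text, clf_pros = _train_views(labeled)
        chosen: Dict[int, int] = {}
        for clf, view in ((clf_text, 0), (clf_pros, 1)):
            scored = sorted(
                ((clf.predict_with_confidence(ex[view]), i) for i, ex in enumerate(pool)),
                key=lambda s: s[0][1],
                reverse=True,
            )
            # Each view pseudo-labels its most confident pool items; if both views
            # pick the same item, the textual view's label is kept.
            for (label, _conf), i in scored[:per_round]:
                chosen.setdefault(i, label)
        # Pseudo-labeled items join the shared training set used by both views.
        for i, label in chosen.items():
            labeled.append((pool[i], label))
        pool = [ex for i, ex in enumerate(pool) if i not in chosen]
    clf_text, clf_pros = _train_views(labeled)
    return clf_text, clf_pros, labeled
```

A toy usage example with two labeled sentences and four unlabeled ones (made-up feature values, not data from the paper):

```python
labeled = [(([0.0, 0.0], [0.0]), 0), (([1.0, 1.0], [1.0]), 1)]
unlabeled = [([0.1, 0.0], [0.2]), ([0.9, 1.0], [0.8]), ([0.2, 0.1], [0.1]), ([1.0, 0.8], [0.9])]
clf_text, clf_pros, grown = co_train(labeled, unlabeled, rounds=3, per_round=1)
```

The key design point is that each view's classifier sees only its own feature set, so an item that the prosodic view labels confidently can add information that the textual view would not have found on its own, and vice versa.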
Bibliographic reference. Xie, Shasha / Lin, Hui / Liu, Yang (2010): "Semi-supervised extractive speech summarization via co-training algorithm", In INTERSPEECH-2010, 2522-2525.