11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Extractive Summarization Using a Latent Variable Model

Asli Celikyilmaz (1), Dilek Hakkani-Tür (2)

(1) University of California at Berkeley, USA

Extractive multi-document summarization is the task of selecting sentences from a set of documents to compose a summary in response to a user query. We propose a generative approach that explicitly identifies summary and non-summary topic distributions in the sentences of document clusters. Using these approximate summary-topic probabilities as latent output variables, we build a discriminative classifier model. The trained model is then used to perform inference on the sentences of new document clusters. Our experiments show that the proposed summarization approach is effective compared with state-of-the-art methods.
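The abstract outlines a pipeline: a generative stage produces approximate summary-topic probabilities per sentence, a discriminative model is trained on those soft targets, and the trained model scores sentences in unseen clusters. The toy Python sketch below only illustrates that data flow and is not the authors' model. A hand-set word-level table (`summary_word_prob`, hypothetical) stands in for the generative topic model, and a plain logistic regression fit to fractional targets stands in for the discriminative classifier. All function names, features, and values are illustrative assumptions.

```python
import math
from collections import Counter


def features(sentence, vocab):
    # Bag-of-words counts over a fixed training vocabulary, plus a bias term at index 0.
    # Words outside the vocabulary are ignored.
    counts = Counter(sentence.lower().split())
    return [1.0] + [float(counts[w]) for w in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_labels(sentences, summary_word_prob):
    # HYPOTHETICAL stand-in for the generative stage: the mean probability that a
    # sentence's words come from a "summary" topic (unseen words count as 0.0).
    labels = []
    for s in sentences:
        words = s.lower().split()
        labels.append(sum(summary_word_prob.get(w, 0.0) for w in words) / max(len(words), 1))
    return labels


def train(xs, ys, lr=0.1, epochs=500):
    # Logistic regression trained by SGD on soft targets in [0, 1]; the cross-entropy
    # gradient (p - y) * x applies to fractional y as well as to 0/1 labels.
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w


def summarize(sentences, w, vocab, k=1):
    # Score sentences from a new cluster with the trained model, keep the top k,
    # and return them in their original order.
    scores = [sigmoid(sum(wi * xi for wi, xi in zip(w, features(s, vocab)))) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    train_sents = ["the storm killed ten people", "officials said more later", "the storm hit the coast"]
    summary_word_prob = {"storm": 0.9, "killed": 0.8, "coast": 0.6}  # hypothetical values
    vocab = sorted({w for s in train_sents for w in s.lower().split()})
    w = train([features(s, vocab) for s in train_sents], soft_labels(train_sents, summary_word_prob))
    print(summarize(["a storm killed many", "officials said nothing"], w, vocab, k=1))
    # prints ['a storm killed many']
```

Using fractional targets rather than hard 0/1 labels mirrors, in a very simplified way, the idea of treating the generative stage's probabilities as latent outputs instead of thresholding them into gold labels.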


Bibliographic reference.  Celikyilmaz, Asli / Hakkani-Tür, Dilek (2010): "Extractive summarization using a latent variable model", In INTERSPEECH-2010, 2526-2529.