8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

A Unified Probabilistic Generative Framework for Extractive Spoken Document Summarization

Yi-Ting Chen (1), Hsuan-Sheng Chiu (1), Hsin-Min Wang (2), Berlin Chen (1)

(1) National Taiwan Normal University, Taiwan
(2) Academia Sinica, Taiwan

In this paper, we consider extractive summarization of Chinese broadcast news speech. We propose a unified probabilistic generative framework that combines the sentence generative probability and the sentence prior probability for sentence ranking. Each sentence of a spoken document to be summarized is treated as a probabilistic generative model for predicting the document. Two matching strategies, literal term matching and concept matching, are extensively investigated: we explore the hidden Markov model (HMM) and the relevance model (RM) for literal term matching, and the word topical mixture model (WTMM) for concept matching. To estimate the sentence prior probability, confidence scores, structural features, and a set of prosodic features are combined using the whole sentence maximum entropy model (WSME). Experiments were performed on Chinese broadcast news collected in Taiwan, and the initial results are very promising.
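The ranking idea above follows from Bayes' rule: a sentence S is scored by P(S|D), which is proportional to P(D|S)P(S), so its generative probability and its prior probability are added in the log domain. The Python sketch below illustrates this under stated assumptions and is not the authors' implementation. It assumes P(w|S) is a two-state mixture of the sentence's own unigram model and a collection (background) unigram model, a common reading of "HMM" for literal term matching. The weight `lam`, the `priors` argument (standing in for a WSME-estimated prior), and the helper names `sentence_scores` and `summarize` are hypothetical. RM and WTMM are not shown.

```python
import math
from collections import Counter

def sentence_scores(document, sentences, collection_counts, lam=0.5, priors=None):
    """Score each sentence S by log P(D|S) + log P(S).

    Assumption: P(w|S) = lam * P_ML(w|S) + (1 - lam) * P(w|C), a two-state
    mixture of the sentence unigram model and a background model.
    `lam` and `priors` are hypothetical illustration parameters.
    """
    doc_counts = Counter(document)
    coll_total = sum(collection_counts.values())
    if priors is None:
        # Uniform prior: ranking then depends on P(D|S) alone.
        priors = [1.0 / len(sentences)] * len(sentences)
    scores = []
    for sent, prior in zip(sentences, priors):
        sent_counts = Counter(sent)
        sent_len = len(sent)
        log_lik = 0.0
        for w, c in doc_counts.items():
            p_sent = sent_counts[w] / sent_len if sent_len else 0.0
            p_coll = collection_counts.get(w, 0) / coll_total
            p = lam * p_sent + (1 - lam) * p_coll
            # Floor guards against log(0) for words unseen in S and in C.
            log_lik += c * math.log(max(p, 1e-12))
        scores.append(log_lik + math.log(prior))
    return scores

def summarize(document, sentences, collection_counts, k, **kwargs):
    """Return indices of the top-k sentences, restored to document order."""
    scores = sentence_scores(document, sentences, collection_counts, **kwargs)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

if __name__ == "__main__":
    sentences = [
        ["the", "cabinet", "approved", "the", "budget"],
        ["weather", "was", "sunny"],
        ["budget", "talks", "continue"],
    ]
    document = [w for s in sentences for w in s]
    print(summarize(document, sentences, Counter(document), k=2))
```

With a uniform prior, sentences that cover more of the document's probability mass rank highest. Passing a non-uniform `priors` list shows how prosodic or structural evidence, folded into P(S), can change the selection.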


Bibliographic reference.  Chen, Yi-Ting / Chiu, Hsuan-Sheng / Wang, Hsin-Min / Chen, Berlin (2007): "A unified probabilistic generative framework for extractive spoken document summarization", In INTERSPEECH-2007, 2805-2808.