Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Issues in Topic Identification on the Switchboard Corpus

John McDonough, Herbert Gish

BBN Systems and Technologies, Cambridge, MA, USA

Topic identification (TID) is the automatic classification of speech messages into one of a known set of possible topics. The TID task can be viewed as having three principal components: 1) event generation, 2) keyword event selection, and 3) topic modeling. Using data from the Switchboard corpus, we present experimental results for various approaches to the TID problem and compare their relative effectiveness. In particular, we examine issues in topic modeling and keyword selection.
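
The three components named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the method evaluated in the paper: events are taken to be whitespace tokens of a text transcript (standing in for the output of a speech recognizer or word spotter), keywords are chosen with a simple topic-concentration heuristic, and each topic is modeled as an add-one-smoothed multinomial over keywords. The function names `generate_events`, `select_keywords`, `train_topic_models`, and `classify`, and the toy training messages, are hypothetical.

```python
import math
from collections import Counter


def generate_events(message):
    """Event generation (assumed): lowercase whitespace tokens of a transcript."""
    return message.lower().split()


def select_keywords(train, k):
    """Keyword selection (assumed heuristic): keep the k words whose occurrences
    are most concentrated in a single topic, breaking ties by frequency."""
    per_topic = {}
    total = Counter()
    for topic, message in train:
        events = generate_events(message)
        per_topic.setdefault(topic, Counter()).update(events)
        total.update(events)

    def concentration(word):
        # Largest share of this word's occurrences that falls in one topic.
        return max(counts[word] for counts in per_topic.values()) / total[word]

    ranked = sorted(total, key=lambda w: (-concentration(w), -total[w], w))
    return ranked[:k]


def train_topic_models(train, keywords):
    """Topic modeling (assumed): one add-one-smoothed multinomial per topic."""
    vocab = set(keywords)
    counts = {}
    for topic, message in train:
        hits = [e for e in generate_events(message) if e in vocab]
        counts.setdefault(topic, Counter()).update(hits)
    models = {}
    for topic, c in counts.items():
        denom = sum(c.values()) + len(vocab)
        models[topic] = {w: math.log((c[w] + 1) / denom) for w in vocab}
    return models


def classify(message, models):
    """Return the topic with the highest keyword log-likelihood (uniform prior)."""
    events = generate_events(message)

    def loglik(topic):
        # Only events that are selected keywords contribute to the score.
        return sum(models[topic][e] for e in events if e in models[topic])

    return max(models, key=loglik)


# Toy usage with made-up messages (not Switchboard data).
train = [
    ("pets", "my dog chases the cat"),
    ("pets", "the cat and the dog sleep"),
    ("cars", "the car needs a new engine"),
    ("cars", "my car engine is loud"),
]
keywords = select_keywords(train, k=4)
models = train_topic_models(train, keywords)
print(classify("the dog barked at my cat", models))
```

Keeping the three stages as separate functions mirrors the decomposition in the abstract: any one stage (for example, the keyword selection criterion) can be swapped out and compared while the other two are held fixed.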

