8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Statistical Chinese Spoken Document Retrieval using Latent Topical Information

Jen-Wei Kuo (1), Yao-Min Huang (1), Berlin Chen (1), Hsin-min Wang (2)

(1) National Taiwan Normal University, Taiwan
(2) Academia Sinica, Taiwan

Information retrieval, which aims to give people easy access to all kinds of information, is receiving more and more emphasis. However, most approaches to information retrieval rely primarily on literal term matching and operate in a deterministic manner. Their performance is therefore often limited by vocabulary mismatch, and it cannot be steadily improved through use. To overcome these drawbacks and to enhance retrieval performance, this paper explores the use of a topical mixture model for statistical Chinese spoken document retrieval. Various model structures and learning approaches were extensively investigated. In addition, the retrieval capabilities were verified by comparison with the conventional vector space model and the latent semantic indexing model, as well as our previously presented HMM/N-gram retrieval model. The experiments were performed on the TDT-2 Chinese collection. Very encouraging retrieval performance was obtained.

Bibliographic reference. Kuo, Jen-Wei / Huang, Yao-Min / Chen, Berlin / Wang, Hsin-min (2004): "Statistical Chinese spoken document retrieval using latent topical information", in INTERSPEECH-2004, 1621-1624.