2003 ISCA Workshop on Multilingual Spoken Document Retrieval
(MSDR2003)

Hong Kong
April 4-5, 2003

Document Expansion using a Side Collection for Monolingual and Cross-Language Spoken Document Retrieval

Yuk-Chi Li, Helen M. Meng

Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong

This paper presents a document expansion method that uses a side collection to improve the retrieval of spoken documents with text queries. The method is applied to Chinese spoken document retrieval (SDR), with a series of experiments covering both monolingual and cross-language SDR systems. In the monolingual experiments, Cantonese broadcast news documents are retrieved using a multi-scale syllable-based approach, and document expansion achieves an improvement of 56% in average inverse rank (AIR). In the cross-language spoken document retrieval (CL-SDR) task, where Mandarin broadcast news is retrieved using English text queries, document expansion yields a 14% relative improvement in retrieval performance.
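The abstract reports results as average inverse rank (AIR) and as relative improvement. As a minimal sketch, assuming AIR is the mean over queries of the reciprocal of the rank at which the target document is retrieved (the abstract does not spell out the definition), the two quantities could be computed as below. The function names and the sample ranks are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def average_inverse_rank(ranks: Sequence[Optional[int]]) -> float:
    """Mean of 1/rank over all queries.

    Assumption: each query has one target document; `None` means the
    target was not retrieved and contributes 0 to the average.
    """
    if not ranks:
        raise ValueError("need at least one query")
    total = sum(1.0 / r if r is not None else 0.0 for r in ranks)
    return total / len(ranks)


def relative_improvement(baseline: float, improved: float) -> float:
    """Fractional gain of `improved` over `baseline` (0.14 means 14%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical ranks of the target document for three queries,
# before and after some change to the retrieval system.
air_before = average_inverse_rank([2, 4, None])
air_after = average_inverse_rank([1, 2, 4])
gain = relative_improvement(air_before, air_after)
```

Under this definition AIR lies between 0 and 1, and a value of 1 means every target document was ranked first.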



Bibliographic reference. Li, Yuk-Chi / Meng, Helen M. (2003): "Document expansion using a side collection for monolingual and cross-language spoken document retrieval", in MSDR-2003, 85-90.