This paper presents a multi-scale retrieval approach in MEI (Mandarin-English Information), an English-Chinese cross-lingual spoken document retrieval (CL-SDR) system. MEI accepts an entire English news story (from newspaper text) as the input query and automatically retrieves "relevant" Mandarin news stories (from broadcast audio). This allows the user to search for personally relevant content across language and media barriers, making it a cross-lingual and cross-media retrieval task. MEI advocates a multi-scale paradigm for this task, where multi-scale refers to the use of both words and subwords (Chinese characters and syllables) for retrieval. Words offer lexical knowledge to enhance precision, and subwords can potentially alleviate some prevailing problems in CL-SDR, e.g. open vocabularies in translation and recognition, out-of-vocabulary words in audio indexing, and ambiguities in Chinese homophones and word tokenization. We present techniques for word-subword fusion, which improved retrieval performance in our experiments with the Topic Detection and Tracking collection.
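To make the word-subword idea concrete, the following is a minimal illustrative sketch, not the method used in MEI. The abstract does not specify how the two scales are fused, so the sketch assumes overlapping character bigrams as the subword unit and a simple linear interpolation of cosine similarities. The function names (`char_ngrams`, `cosine`, `fused_score`) and the `weight` parameter are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Return overlapping character n-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def cosine(a, b):
    """Cosine similarity between two lists of terms, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca if t in cb)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def fused_score(query_words, doc_words, weight=0.5, n=2):
    """Assumed fusion: linear interpolation of word and subword scores."""
    word_score = cosine(query_words, doc_words)
    sub_score = cosine(char_ngrams("".join(query_words), n),
                       char_ngrams("".join(doc_words), n))
    return weight * word_score + (1 - weight) * sub_score


# The same character string, tokenized into words in two different ways.
query = ["香港", "特区", "政府"]
doc = ["香港特区", "政府"]
print(cosine(query, doc))       # word-level match suffers from the tokenization mismatch
print(fused_score(query, doc))  # subword evidence partially compensates
```

In this toy case the two tokenizations share only one word, while their character bigrams are identical, so the fused score is higher than the word-only score. This mirrors the abstract's point that subwords can offset word tokenization ambiguity while words contribute lexical precision.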
Cite as: Lo, W.-K., Schone, P., Meng, H.M. (2001) Multi-scale retrieval in MEI: an English-Chinese translingual speech retrieval system. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 1303-1306, doi: 10.21437/Eurospeech.2001-337
@inproceedings{lo01_eurospeech,
  author={Wai-Kit Lo and Patrick Schone and Helen M. Meng},
  title={{Multi-scale retrieval in MEI: an English-Chinese translingual speech retrieval system}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={1303--1306},
  doi={10.21437/Eurospeech.2001-337}
}