2003 ISCA Workshop on
Multilingual Spoken Document Retrieval
(MSDR2003)
Hong Kong
April 4-5, 2003
Multi-Scale Document Expansion for Mandarin Chinese
Gina-Anne Levow
Department of Computer Science, University of Chicago, IL, USA
In cross-language spoken document retrieval,
potentially errorful translations of a source language
query must be matched against potentially errorful
automatic speech recognition transcriptions of
spoken documents. Document expansion, using pseudo-relevance
feedback to enrich the original transcript
with related selective terms, can help to recover
matches lost through mistranscription or absent from
translation. In this paper we compare three multi-scale
strategies for unit selection in different phases
of the document expansion and retrieval process on
Mandarin Chinese documents, using character bigrams,
words, and a hybrid strategy combining bigrams and
words. We find that the hybrid bigram-word
strategy, which uses bigrams to enhance recall and
identifies highly selective words to enhance precision
for expansion, results in the greatest, highly
significant improvement over unexpanded documents and
additionally outperforms retrieval on perfect manual transcriptions.
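
As an illustration of the three unit types named in the abstract, the following minimal Python sketch extracts overlapping character bigrams from a Mandarin string and combines them with a word list. The function names `char_bigrams` and `hybrid_units` are hypothetical, and the word list is assumed to come from an external segmenter; the abstract does not describe the paper's actual pipeline at this level, so this is not a reproduction of it.

```python
def char_bigrams(text):
    """Return overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def hybrid_units(text, words):
    """Combine character bigrams with words.

    `words` is assumed to be produced by a separate word segmenter
    (not shown here); this function only concatenates the two unit lists.
    """
    return char_bigrams(text) + list(words)


if __name__ == "__main__":
    sample = "香港大学"
    print(char_bigrams(sample))
    print(hybrid_units(sample, ["香港", "大学"]))
```

Bigrams need no segmenter and match partial or mis-segmented terms, which is why the abstract associates them with recall; whole words are more selective, which is why it associates them with precision.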
Bibliographic reference.
Levow, Gina-Anne (2003):
"Multi-scale document expansion for Mandarin Chinese",
In MSDR-2003, 73-78.