9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Cross-Lingual Sentence Extraction for Information Distillation

Adish Kumar Singla, Dilek Hakkani-Tür


Information distillation aims to analyze and interpret large volumes of speech and text archives in multiple languages and to produce structured information of interest to the user. In this work, we investigate cross-lingual information distillation, where non-English (source language) documents are searched for user queries that are in English (target language). We propose to perform distillation both on the original source language data and on their English translations produced by machine translation, and then to combine the two outputs. We show experimentally that the combination approach yields an F-measure improvement of 8% to 16% absolute (13% to 31% relative) over previous work.
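The abstract reports gains both as absolute and as relative F-measure improvements. The minimal sketch below shows how the two are related: the relative gain is the absolute gain divided by the baseline F-measure. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall). The paper's exact F-measure variant is not stated here. The function names `f_measure` and `improvement` and all scores are hypothetical, chosen for illustration only, and are not results from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the paper's F-measure may weight precision and recall differently.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def improvement(baseline_f: float, new_f: float) -> tuple[float, float]:
    """Return the (absolute, relative) improvement of new_f over baseline_f."""
    absolute = new_f - baseline_f       # difference in F-measure points
    relative = absolute / baseline_f    # difference as a fraction of the baseline
    return absolute, relative


# Hypothetical scores for illustration only; NOT results reported in the paper.
baseline = f_measure(precision=0.50, recall=0.50)  # single-system baseline
combined = f_measure(precision=0.60, recall=0.60)  # combined system
abs_gain, rel_gain = improvement(baseline, combined)
print(f"absolute: {abs_gain:.2f}, relative: {rel_gain:.0%}")
# prints "absolute: 0.10, relative: 20%"
```

The same absolute gain corresponds to a larger relative gain when the baseline is lower. For this reason, an absolute range and a relative range need not scale in lockstep across test conditions.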


Bibliographic reference: Singla, Adish Kumar / Hakkani-Tür, Dilek (2008): "Cross-lingual sentence extraction for information distillation", In INTERSPEECH-2008, 2707-2710.