International Workshop on Spoken Language Translation (IWSLT) 2011

San Francisco, CA, USA
December 8-9, 2011

Using Wikipedia to Translate Domain-specific Terms in SMT

Jan Niehues, Alex Waibel

Institute for Anthropomatics, Karlsruhe Institute of Technology, Germany

When building a university lecture translation system, one important step is to adapt it to the target domain. A central problem in this adaptation task is acquiring translations for domain-specific terms. In this work, we obtain these translations from Wikipedia, which provides articles on very specific topics in many different languages. To extract translations for the domain-specific terms, we use Wikipedia's interlanguage links.
   We analyzed different methods of integrating this corpus into our system and explored methods of disambiguating between different translations by using the text of the articles. In addition, we developed methods to handle the different morphological forms of these terms in morphologically rich input languages such as German. The results show that the number of out-of-vocabulary (OOV) words could be reduced by 50% on computer science lectures and that translation quality could be improved by more than 1 BLEU point.
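
The idea of turning interlanguage links into a term lexicon and measuring its effect on OOV words can be illustrated with a minimal sketch. This is not the authors' implementation: the title pairs, the baseline vocabulary, the helper names (`strip_qualifier`, `build_lexicon`, `oov_words`), and the choice to drop parenthesised title qualifiers are all assumptions made for illustration.

```python
import re

# Hypothetical German -> English article title pairs, as one might obtain
# from interlanguage links. The data is illustrative, not from the paper.
TITLE_PAIRS = [
    ("Sortierverfahren", "Sorting algorithm"),
    ("Baum (Graphentheorie)", "Tree (graph theory)"),
    ("Warteschlange (Datenstruktur)", "Queue (abstract data type)"),
]

# Matches a trailing parenthesised qualifier such as " (Graphentheorie)".
_QUALIFIER = re.compile(r"\s*\([^)]*\)\s*$")


def strip_qualifier(title):
    """Drop a trailing qualifier (an assumed preprocessing step)."""
    return _QUALIFIER.sub("", title).strip()


def build_lexicon(pairs):
    """Map each lowercased source term to the set of its target terms."""
    lexicon = {}
    for source_title, target_title in pairs:
        source = strip_qualifier(source_title).lower()
        target = strip_qualifier(target_title).lower()
        lexicon.setdefault(source, set()).add(target)
    return lexicon


def oov_words(tokens, vocabulary):
    """Return the tokens whose lowercased form is not in the vocabulary."""
    return [token for token in tokens if token.lower() not in vocabulary]


lexicon = build_lexicon(TITLE_PAIRS)

# Hypothetical baseline vocabulary of the translation system.
baseline_vocab = {"ein", "ist", "das"}
sentence = "Ein Sortierverfahren ist das".split()

oov_before = oov_words(sentence, baseline_vocab)
oov_after = oov_words(sentence, baseline_vocab | set(lexicon))
```

In this toy setup, adding the lexicon entries to the vocabulary removes the only OOV word in the sentence. The sketch covers single-token title matches only; disambiguation between competing translations and the handling of inflected German forms, both addressed in the paper, are outside its scope.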

Bibliographic reference. Niehues, Jan / Waibel, Alex (2011): "Using Wikipedia to translate domain-specific terms in SMT", in IWSLT-2011, 230-237.