International Workshop on Spoken Language Translation (IWSLT) 2011
San Francisco, CA, USA
When building a university lecture translation system, one important step is adapting it to the target domain. A key problem in this adaptation task is acquiring translations for domain-specific terms. In our approach, we obtained these translations from Wikipedia, which provides articles on highly specific topics in many different languages. To extract translations of the domain-specific terms, we used Wikipedia's interlanguage links.
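The interlanguage-link idea can be illustrated with a minimal sketch. It assumes the links are available as hypothetical (source title, target language code, target title) records, roughly the shape of a langlinks-style table, and that parenthetical qualifiers such as "(Datenstruktur)" should be removed from titles before they are used as dictionary entries. The function names `strip_qualifier` and `build_dictionary` and the sample records are illustrative assumptions, not the authors' implementation.

```python
import re
from collections import defaultdict


def strip_qualifier(title):
    """Remove a trailing parenthetical qualifier, e.g. 'Baum (Datenstruktur)' -> 'Baum'.

    Assumption: qualifiers only add article disambiguation and are not part of the term.
    """
    return re.sub(r"\s*\([^)]*\)$", "", title)


def build_dictionary(langlinks, target_lang="en"):
    """Map each source-language term to the set of linked target-language terms."""
    dictionary = defaultdict(set)
    for source_title, lang, target_title in langlinks:
        if lang == target_lang:  # keep only links into the target language
            dictionary[strip_qualifier(source_title)].add(strip_qualifier(target_title))
    return dict(dictionary)


# Hypothetical German -> other-language interlanguage link records.
langlinks = [
    ("Binärbaum", "en", "Binary tree"),
    ("Binärbaum", "fr", "Arbre binaire"),
    ("Baum (Graphentheorie)", "en", "Tree (graph theory)"),
    ("Baum (Datenstruktur)", "en", "Tree (data structure)"),
    ("Hashtabelle", "en", "Hash table"),
]

terms = build_dictionary(langlinks)
print(terms["Baum"])  # {'Tree'}
```

In this sketch, a source term that collects several distinct target titles would simply keep all of them in its set. Choosing among such alternatives is the disambiguation problem the abstract says is addressed using the article text, and the sketch does not attempt that step.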
We analyzed different methods of integrating this corpus into our system and explored ways to disambiguate between alternative translations using the text of the articles. In addition, we developed methods to handle the different morphological forms of domain-specific terms in morphologically rich input languages such as German. The results show that the number of out-of-vocabulary (OOV) words on computer science lectures was reduced by 50% and that translation quality improved by more than 1 BLEU point.
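To make the OOV measure concrete, here is a toy sketch that counts out-of-vocabulary test tokens under three conditions: the baseline vocabulary alone, the vocabulary with added Wikipedia-derived terms, and the same vocabulary with a lookup fallback for inflected forms. The suffix list is a crude, hypothetical stand-in for real German morphological handling and is not the method described in the paper. The vocabularies and tokens are invented toy data and do not reproduce the reported 50% reduction.

```python
# Hypothetical inflectional endings; a crude stand-in for real German morphology.
GERMAN_SUFFIXES = ("en", "es", "er", "e", "n", "s")


def is_known(word, vocab, use_suffixes=False):
    """Return True if the word, or (optionally) the word minus a listed suffix, is in vocab."""
    if word in vocab:
        return True
    if use_suffixes:
        for suffix in GERMAN_SUFFIXES:
            if word.endswith(suffix) and word[: -len(suffix)] in vocab:
                return True
    return False


def oov_count(tokens, vocab, use_suffixes=False):
    """Count test tokens not covered by the vocabulary."""
    return sum(1 for token in tokens if not is_known(token, vocab, use_suffixes))


# Toy data, invented for illustration only.
baseline_vocab = {"der", "die", "ist", "ein", "schnell"}
wiki_terms = {"Binärbaum", "Hashtabelle"}
test_tokens = ["der", "Binärbaumes", "ist", "ein", "Hashtabelle", "Graph"]

print(oov_count(test_tokens, baseline_vocab))                     # 3
print(oov_count(test_tokens, baseline_vocab | wiki_terms))        # 2
print(oov_count(test_tokens, baseline_vocab | wiki_terms, True))  # 1
```

Here the genitive form "Binärbaumes" is matched to the dictionary entry "Binärbaum" only when the suffix fallback is enabled. This shows why exact-match lookup alone under-uses a term dictionary for an inflecting input language.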
Bibliographic reference. Niehues, Jan / Waibel, Alex (2011): "Using Wikipedia to translate domain-specific terms in SMT", In IWSLT-2011, 230-237.