This work describes a process to extract Named Entity (NE) translations from the text available in web links (anchor texts). It translates an NE by retrieving a list of web documents in the target language, extracting the anchor texts from the links to those documents, and finding the best translation among the anchor texts using a combination of features, some of which are specific to anchor texts. Experiments performed on a manually built corpus suggest that over 70% of the NEs, ranging from unpopular to popular entities, can be translated correctly using solely anchor texts. Tests on a Machine Translation task indicate that the system can be used to improve the quality of the translations of state-of-the-art statistical machine translation systems.
Cite as: Ling, W., Calado, P., Martins, B., Trancoso, I., Black, A., Coheur, L. (2011) Named entity translation using anchor texts. Proc. International Workshop on Spoken Language Translation (IWSLT 2011), 206-213
@inproceedings{ling11_iwslt,
  author={Wang Ling and Pável Calado and Bruno Martins and Isabel Trancoso and Alan Black and Luísa Coheur},
  title={{Named entity translation using anchor texts}},
  year=2011,
  booktitle={Proc. International Workshop on Spoken Language Translation (IWSLT 2011)},
  pages={206--213}
}