ISCA Archive SALTMIL 2008

Extracting bilingual word pairs from Wikipedia

Francis M. Tyers, Jacques A. Pienaar

A bilingual dictionary or word list is an important resource for many purposes, among them, machine translation. For many language pairs these are either non-existent, or very often unavailable owing to licensing restrictions. We describe a simple, fast and computationally inexpensive method for extracting bilingual dictionary entries from Wikipedia (using the interwiki link system) and assess the performance of this method with respect to four language pairs. Precision was found to be in the 69–92% region, but open to improvement.
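
To illustrate the general idea of using interwiki links, the following is a minimal sketch and not the authors' implementation. It assumes article source in which interlanguage links appear inline as `[[xx:Title]]`, and it pairs the title of an article with the title it links to in a chosen target language. The function name `extract_pairs`, the regular expression, and the sample text are all hypothetical and chosen only for illustration.

```python
import re

# Assumption: interlanguage links are written inline as [[xx:Title]],
# where xx is a lowercase language code such as "af" or "zh-min-nan".
INTERWIKI_RE = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\]|]+)\]\]")


def extract_pairs(title, wikitext, target_lang):
    """Return (source title, target title) pairs for one article.

    Only links whose language code equals target_lang are kept.
    """
    pairs = []
    for lang, target_title in INTERWIKI_RE.findall(wikitext):
        if lang == target_lang:
            pairs.append((title, target_title.strip()))
    return pairs


# Hypothetical article text, made up for this example.
sample = (
    "'''Dog''' is a domesticated animal.\n"
    "[[Category:Animals]]\n"
    "[[af:Hond]]\n"
    "[[es:Canis lupus familiaris]]\n"
)

pairs = extract_pairs("Dog", sample, "af")
print(pairs)
```

A sketch like this yields candidate pairs only. Article titles are often multi-word, disambiguated, or not dictionary headwords at all, which is one plausible reason precision would fall short of 100% and why some filtering of the candidates would be needed.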


Cite as: Tyers, F.M., Pienaar, J.A. (2008) Extracting bilingual word pairs from Wikipedia. Proc. SALTMIL Workshop on interoperability between people in the creation of language resources for less-resourced languages, 19-22

@inproceedings{tyers08_saltmil,
  author={Francis M. Tyers and Jacques A. Pienaar},
  title={{Extracting bilingual word pairs from Wikipedia}},
  year=2008,
  booktitle={Proc. SALTMIL Workshop on interoperability between people in the creation of language resources for less-resourced languages},
  pages={19--22}
}