1st Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages

Porto Salvo, Portugal
September 3-4, 2009

Terminology Extraction from English-Portuguese and English-Galician Parallel Corpora based on Probabilistic Translation Dictionaries and Bilingual Syntactic Patterns

Alberto Simões (1), Xavier Gómez Guinovart (2)

(1) Department of Computer Science, Universidade do Minho, Portugal
(2) Department of Translation and Linguistics, Universidade de Vigo, Spain

This paper presents research on bilingual terminology extraction from parallel corpora, based on the occurrence of bilingual morphosyntactic patterns in the probabilistic translation dictionaries generated by NATools. To evaluate the method, we carried out an experiment that took into account both the lexical cohesion of the term candidates and their specificity with respect to a non-terminological corpus of the target language. The evaluation results show a high degree of accuracy for terminology extraction based on probabilistic translation dictionaries complemented by bilingual syntactic patterns.

Index Terms: bilingual terminology extraction, probabilistic translation dictionaries

Bibliographic reference. Simões, Alberto / Gómez Guinovart, Xavier (2009): "Terminology extraction from English-Portuguese and English-Galician parallel corpora based on probabilistic translation dictionaries and bilingual syntactic patterns", In SLTECH-2009, 13-16.