1st Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages

Porto Salvo, Portugal
September 3-4, 2009

Machine Translation of the Penn Treebank to Spanish

Martha Alicia Rocha (1), Joan Andreu Sánchez (2)

(1) Departamento de Sistemas y Computación, Instituto Tecnológico de León, México
(2) Instituto Tecnológico de Informática, Universidad Politécnica de Valencia, Spain

In this work we explored the problem of translating the Penn Treebank corpus into Spanish. We approached this problem with Phrase-based Machine Translation techniques. Since no parallel training data exists for this corpus, we used a large out-of-domain training data set and a small “high-quality” in-domain training data set. We studied simple and effective Domain Adaptation techniques that have been used in other applications. We report experiments on a small test set of sentences manually translated from the Penn Treebank corpus.

Index Terms: Penn Treebank, Machine Translation, Domain Adaptation


Bibliographic reference. Rocha, Martha Alicia / Sánchez, Joan Andreu (2009): "Machine translation of the Penn Treebank to Spanish", in SLTECH-2009, 39-42.