1st Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages
Porto Salvo, Portugal
In this work we explored the problem of translating the Penn Treebank corpus into Spanish. For this problem, we considered Phrase-based Machine Translation techniques. Since no parallel training data exist for this corpus, we used a large out-of-domain training data set and a small “high-quality” in-domain training data set. We studied simple and effective Domain Adaptation techniques that have been used in other applications. We report experiments on a small test set of sentences manually translated from the Penn Treebank corpus.
Index Terms: Penn Treebank, Machine Translation, Domain Adaptation
Bibliographic reference. Rocha, Martha Alicia / Sánchez, Joan Andreu (2009): "Machine translation of the Penn treebank to Spanish", In SLTECH-2009, 39-42.