International Workshop on Spoken Language Translation (IWSLT) 2004

Keihanna Science City, Kyoto, Japan
September 30-October 1, 2004

Phrase-based Alignment Combining Corpus Cooccurrences and Linguistic Knowledge

Adrià de Gispert, José B. Mariño, Josep M. Crego

TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain

This paper introduces a phrase alignment strategy that seeks phrase and word links in two stages using cooccurrence measures and linguistic information. In the first stage, the algorithm finds high-precision links involving a linguistically derived set of phrases, leaving word alignment to a second stage. Experiments were carried out on an English-Spanish parallel corpus, and we show that phrase cooccurrence measures convey information complementary to word cooccurrences and provide stronger evidence of a good alignment. The reported Alignment Error Rate (AER) results are competitive with, and even outperform, state-of-the-art alignment algorithms.

Bibliographic reference. Gispert, Adrià de / Mariño, José B. / Crego, Josep M. (2004): "Phrase-based alignment combining corpus cooccurrences and linguistic knowledge", in IWSLT-2004, 107-114.