7th International Conference on Spoken Language Processing
September 16-20, 2002
When translation knowledge is acquired automatically from a bilingual corpus, translation variety causes redundant rules to be generated. To overcome this problem, we propose bilingual corpus cleaning based on translation literality. Word-level correspondence and phrase-level correspondence are used as criteria of literality. Using these criteria, a bilingual corpus was cleaned, and translation knowledge for a pattern-based MT system was acquired from the cleaned corpus. As a result, the translation quality of the MT system improved even though the corpus was reduced to about 81% and 87% of its original size when the word-level and phrase-level literality scores were used, respectively.
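The abstract does not define how the literality scores are computed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a hypothetical bilingual word dictionary (a set of source/target word pairs) and a hypothetical word-level score: the fraction of words in both sentences that have a dictionary counterpart in the other sentence. Sentence pairs scoring below a hypothetical `threshold` are dropped before rule acquisition. The phrase-level criterion is not sketched.

```python
from typing import List, Set, Tuple

# Hypothetical bilingual dictionary: (source_word, target_word) pairs.
Dictionary = Set[Tuple[str, str]]
SentencePair = Tuple[List[str], List[str]]


def word_literality(src: List[str], tgt: List[str], dictionary: Dictionary) -> float:
    """Assumed word-level literality score: the share of words (source plus
    target) that have a dictionary counterpart in the other sentence."""
    total = len(src) + len(tgt)
    if total == 0:
        return 0.0
    # Count source words that translate to some word in the target sentence.
    src_matched = sum(1 for s in src if any((s, t) in dictionary for t in tgt))
    # Count target words that translate from some word in the source sentence.
    tgt_matched = sum(1 for t in tgt if any((s, t) in dictionary for s in src))
    return (src_matched + tgt_matched) / total


def clean_corpus(pairs: List[SentencePair], dictionary: Dictionary,
                 threshold: float) -> List[SentencePair]:
    """Keep only sentence pairs whose assumed literality score reaches the threshold."""
    return [(s, t) for s, t in pairs if word_literality(s, t, dictionary) >= threshold]


if __name__ == "__main__":
    dictionary = {("room", "heya"), ("want", "hoshii")}
    corpus = [
        (["i", "want", "a", "room"], ["heya", "ga", "hoshii"]),  # literal
        (["please", "sit", "down"], ["dozo", "kochira", "e"]),   # free
    ]
    for src, tgt in corpus:
        print(" ".join(src), "->", round(word_literality(src, tgt, dictionary), 3))
    print(len(clean_corpus(corpus, dictionary, threshold=0.5)), "pair(s) kept")
```

With these toy inputs the first pair scores 4/7 (about 0.571) and is kept, while the second shares no dictionary entries, scores 0.0, and is removed. Filtering on a score threshold like this shrinks the corpus, which matches the trade-off the abstract reports: fewer training pairs, but less translation variety feeding the rule acquisition step.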
Bibliographic reference. Imamura, Kenji / Sumita, Eiichiro (2002): "Bilingual corpus cleaning focusing on translation literality", In ICSLP-2002, 1713-1716.