International Workshop on Spoken Language Translation (IWSLT) 2012
We describe several experiments to better understand the usefulness of statistical post-edition (SPE) for improving the raw outputs of phrase-based statistical MT (PBMT) systems. Whatever the size of the training corpus, we show that SPE systems trained on general-domain data offer no breakthrough over our baseline general-domain PBMT system. However, using manually post-edited system outputs to train the SPE led to a slight improvement in translation quality compared with the use of professional reference translations. We also show that SPE is far more effective for domain adaptation, mainly because it recovers many domain-specific terms unknown to our general PBMT system. Finally, we compare two domain adaptation approaches: post-editing a general-domain PBMT system versus building a new domain-adapted PBMT system (with two different techniques), and show that the latter outperforms the former. Yet, when the PBMT system is a black box, SPE trained with post-edited system outputs remains an interesting option for domain adaptation.
Bibliographic reference. Potet, Marion / Besacier, Laurent / Blanchon, Hervé / Azouzi, Marwen (2012): "Towards a better understanding of statistical post-edition usefulness", In IWSLT-2012, 284-291.