International Workshop on Spoken Language Translation (IWSLT) 2011

San Francisco, CA, USA
December 8-9, 2011

How Good Are Your Phrases? Assessing Phrase Quality with Single Class Classification

Nadi Tomeh (1), Marco Turchi (2), Guillaume Wisniewski (1), Alexandre Allauzen (1), François Yvon (1)

(1) LIMSI-CNRS and Université Paris-Sud, Orsay, France
(2) European Commission - Joint Research Centre, Ispra, Italy

We present a novel, translation-quality-informed procedure for both extracting and scoring phrase pairs in phrase-based statistical machine translation (PBSMT) systems.
   We reformulate the extraction problem in the supervised learning framework. Our goal is twofold: first, to take translation quality into account; and second, to incorporate arbitrary features in order to circumvent alignment errors. One-Class SVMs and the Mapping Convergence algorithm permit training a single-class classifier to discriminate between useful and useless phrase pairs. Such a classifier can be learned from a training corpus that comprises only useful instances. The confidence score produced by the classifier for each phrase pair is employed as a selection criterion. The smoothness of these scores allows fine control over the size of the resulting translation model. Finally, confidence scores provide a new accuracy-based feature to score phrase pairs.
   Experimental evaluation of the method shows accurate assessments of phrase pair quality, even for regions in the space of possible phrase pairs that are ignored by other approaches. This enhanced evaluation of phrase pairs leads to improvements in translation performance as measured by BLEU.
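
The selection step described in the abstract, where a classifier confidence score acts as the selection criterion and the cut-off controls the size of the translation model, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_phrase_pairs`, the toy phrase pairs, and the score values are all hypothetical, and the scores are simply assumed to come from some already trained single-class classifier.

```python
from typing import Dict, List, Tuple

# A phrase pair is represented here as (source phrase, target phrase).
PhrasePair = Tuple[str, str]


def select_phrase_pairs(
    scores: Dict[PhrasePair, float], threshold: float
) -> List[PhrasePair]:
    """Keep phrase pairs whose confidence score is at least `threshold`.

    The result is sorted from most to least confident.
    """
    kept = [pair for pair, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda pair: scores[pair], reverse=True)


# Hypothetical confidence scores, invented for illustration only.
toy_scores: Dict[PhrasePair, float] = {
    ("la maison", "the house"): 0.92,
    ("maison bleue", "blue house"): 0.71,
    ("de la", "the"): 0.10,
    ("la", "house"): -0.35,
}

# Raising the threshold shrinks the phrase table step by step.
for threshold in (-1.0, 0.0, 0.5):
    kept = select_phrase_pairs(toy_scores, threshold)
    print(f"threshold={threshold:+.1f} -> {len(kept)} phrase pairs kept")
```

Because the scores are real-valued rather than binary, moving the threshold trades table size against the expected quality of the retained pairs; the same scores could also be passed to the decoder as an additional feature, as the abstract suggests.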


Bibliographic reference. Tomeh, Nadi / Turchi, Marco / Wisniewski, Guillaume / Allauzen, Alexandre / Yvon, François (2011): "How good are your phrases? Assessing phrase quality with single class classification", In IWSLT-2011, 261-268.