International Workshop on Spoken Language Translation (IWSLT) 2012

Hong Kong
December 6-7, 2012

Sparse Lexicalised Features and Topic Adaptation for SMT

Eva Hasler, Barry Haddow, Philipp Koehn

University of Edinburgh, UK

We present a new approach to domain adaptation for SMT that enriches standard phrase-based models with lexicalised word and phrase pair features, helping the model select appropriate translations for the target domain (TED talks). In addition, we show how source-side sentence-level topics can be incorporated so that the features differentiate between more fine-grained topics within the target domain (topic adaptation). We compare tuning our sparse features on a development set with tuning them on the entire in-domain corpus, and we introduce a new method of porting them to larger mixed-domain models. Experimental results show that our features improve performance over a MIRA baseline and that topic features yield additional improvements in some cases. We evaluate our methods on two language pairs, English-French and German-English, with promising results.
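As a rough illustration of the kind of features the abstract describes, the sketch below counts one sparse indicator per word pair and per phrase pair, and optionally adds a copy of each feature conjoined with a sentence-level topic id. The function name `sparse_features`, the feature-name prefixes (`pp_`, `wp_`, `t<id>_`) and the input layout are all hypothetical, chosen for the example; they are not taken from the paper and do not reflect any particular decoder's API.

```python
from collections import Counter


def sparse_features(phrase_pairs, word_pairs, topic=None):
    """Return a sparse feature vector (feature name -> count).

    phrase_pairs: list of (source_phrase, target_phrase) strings
    word_pairs:   list of (source_word, target_word) aligned words
    topic:        optional sentence-level topic id (assumed to be given)
    All names and the naming scheme are illustrative assumptions.
    """
    feats = Counter()

    # One indicator feature per phrase pair used in the translation.
    for src, tgt in phrase_pairs:
        name = "pp_" + src.replace(" ", "_") + "~" + tgt.replace(" ", "_")
        feats[name] += 1

    # One indicator feature per aligned word pair.
    for src, tgt in word_pairs:
        feats["wp_" + src + "~" + tgt] += 1

    # Topic adaptation (assumed form): add a topic-specific copy of
    # every feature so each topic can learn its own weight.
    if topic is not None:
        for name, value in list(feats.items()):
            feats["t" + str(topic) + "_" + name] += value

    return feats


def score(feats, weights):
    """Dot product of a sparse feature vector with sparse weights."""
    return sum(value * weights.get(name, 0.0) for name, value in feats.items())
```

In a setup like this, the general and topic-specific copies of a feature receive separate weights during tuning, so a translation can be preferred under one topic without changing its score under the others.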


Bibliographic reference. Hasler, Eva / Haddow, Barry / Koehn, Philipp (2012): "Sparse lexicalised features and topic adaptation for SMT", in IWSLT-2012, 268-275.