ISCA Archive IWSLT 2012

Sparse lexicalised features and topic adaptation for SMT

Eva Hasler, Barry Haddow, Philipp Koehn

We present a new approach to domain adaptation for SMT that enriches standard phrase-based models with lexicalised word and phrase pair features to help the model select appropriate translations for the target domain (TED talks). In addition, we show how source-side sentence-level topics can be incorporated to make the features differentiate between more fine-grained topics within the target domain (topic adaptation). We compare tuning our sparse features on a development set versus on the entire in-domain corpus and introduce a new method of porting them to larger mixed-domain models. Experimental results show that our features improve performance over a MIRA baseline and that in some cases we can get additional improvements with topic features. We evaluate our methods on two language pairs, English-French and German-English, showing promising results.
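As a rough illustration of the idea described in the abstract, the following minimal sketch shows how sparse lexicalised word pair features, and topic-specific copies of them, might be extracted for one phrase pair and scored with a sparse weight vector. It is not the authors' implementation: the function names, the feature naming scheme (`wp_...`, `t<topic>_...`), the alignment format, and the example weights are all assumptions made for illustration only.

```python
from collections import Counter

def word_pair_features(src_words, tgt_words, alignment, topic=None):
    """Return sparse feature counts for one phrase pair.

    alignment is a list of (source_index, target_index) pairs.
    topic is an optional sentence-level topic id (hypothetical format).
    """
    feats = Counter()
    for i, j in alignment:
        # General feature: fires for this word pair in any context.
        name = f"wp_{src_words[i]}~{tgt_words[j]}"
        feats[name] += 1
        if topic is not None:
            # Topic-specific copy: fires only when the source sentence
            # has been assigned this topic.
            feats[f"t{topic}_{name}"] += 1
    return feats

def score(feats, weights):
    """Dot product of sparse features with a sparse weight vector."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

# Illustrative weights; in practice these would be learned during tuning.
weights = {"wp_bank~banque": 0.5, "t3_wp_bank~banque": 0.25}
feats = word_pair_features(["the", "bank"], ["la", "banque"],
                           [(0, 0), (1, 1)], topic=3)
total = score(feats, weights)
```

Keeping both the general and the topic-specific copy of each feature lets a tuned model fall back on the general weight when a topic-specific one has not been observed.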


Cite as: Hasler, E., Haddow, B., Koehn, P. (2012) Sparse lexicalised features and topic adaptation for SMT. Proc. International Workshop on Spoken Language Translation (IWSLT 2012), 268-275

@inproceedings{hasler12b_iwslt,
  author={Eva Hasler and Barry Haddow and Philipp Koehn},
  title={{Sparse lexicalised features and topic adaptation for SMT}},
  year=2012,
  booktitle={Proc. International Workshop on Spoken Language Translation (IWSLT 2012)},
  pages={268--275}
}