ISCA Archive IWSLT 2011

Fill-up versus interpolation methods for phrase-based SMT adaptation

Arianna Bisazza, Nick Ruiz, Marcello Federico

This paper compares techniques to combine diverse parallel corpora for domain-specific phrase-based SMT system training. We address a common scenario where little in-domain data is available for the task, but where large background models exist for the same language pair. In particular, we focus on phrase table fill-up: a method that effectively exploits background knowledge to improve model coverage, while preserving the more reliable information coming from the in-domain corpus. We present experiments on an emerging transcribed speech translation task: the TED talks. While performing similarly to the popular log-linear and linear interpolation techniques in terms of BLEU and NIST scores, filled-up translation models are more compact and easier to tune by minimum error rate training.
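
The contrast between fill-up and linear interpolation can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify these details. The sketch assumes a phrase table is a dictionary from (source, target) phrase pairs to a list of feature scores, that fill-up adds a background entry only when the pair is missing from the in-domain table, and that an extra provenance score (0.0 for in-domain, 1.0 for background) marks where each entry came from. The names `fill_up`, `linear_interpolation`, and `weight` are hypothetical.

```python
from typing import Dict, List, Tuple

PhrasePair = Tuple[str, str]          # (source phrase, target phrase)
Table = Dict[PhrasePair, List[float]]  # phrase pair -> feature scores


def fill_up(in_domain: Table, background: Table) -> Table:
    """Keep every in-domain entry; add background entries only for new pairs.

    Assumption: a trailing provenance score is appended to each entry
    (0.0 = in-domain, 1.0 = background).
    """
    merged: Table = {}
    for pair, scores in in_domain.items():
        merged[pair] = list(scores) + [0.0]
    for pair, scores in background.items():
        if pair not in merged:  # in-domain scores are never overwritten
            merged[pair] = list(scores) + [1.0]
    return merged


def linear_interpolation(in_domain: Table, background: Table,
                         weight: float) -> Table:
    """Mix the two tables score by score; a missing entry contributes zeros."""
    # Assumption: both tables are non-empty and share the same feature count.
    n_features = len(next(iter(in_domain.values())))
    zeros = [0.0] * n_features
    merged: Table = {}
    for pair in set(in_domain) | set(background):
        a = in_domain.get(pair, zeros)
        b = background.get(pair, zeros)
        merged[pair] = [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]
    return merged


if __name__ == "__main__":
    # Toy tables with made-up scores, for illustration only.
    in_dom = {("casa", "house"): [0.8, 0.7]}
    backgr = {("casa", "house"): [0.5, 0.4], ("casa", "home"): [0.3, 0.2]}
    print(fill_up(in_dom, backgr))
    print(linear_interpolation(in_dom, backgr, weight=0.5))
```

In this sketch, fill-up leaves the in-domain scores for a shared phrase pair untouched, whereas linear interpolation changes them according to the chosen weight.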


Cite as: Bisazza, A., Ruiz, N., Federico, M. (2011) Fill-up versus interpolation methods for phrase-based SMT adaptation. Proc. International Workshop on Spoken Language Translation (IWSLT 2011), 136-143

@inproceedings{bisazza11_iwslt,
  author={Arianna Bisazza and Nick Ruiz and Marcello Federico},
  title={{Fill-up versus interpolation methods for phrase-based SMT adaptation}},
  year=2011,
  booktitle={Proc. International Workshop on Spoken Language Translation (IWSLT 2011)},
  pages={136--143}
}