ISCA Archive Interspeech 2008

Data selection and smoothing in an open-source system for the 2008 NIST machine translation evaluation

Holger Schwenk, Yannick Esteve

This paper gives a detailed description of a statistical machine translation system developed for the 2008 NIST open MT evaluation. The system is based on the open-source toolkit Moses, with extensions for language model rescoring in a second pass. Significant improvements were obtained with data selection methods for the language and translation models. A continuous space language model, which performs the probability estimation with a neural network, yielded an improvement of more than 1 BLEU point on the test set. The described system achieved a very good ranking in the 2008 NIST open MT evaluation.


doi: 10.21437/Interspeech.2008-676

Cite as: Schwenk, H., Esteve, Y. (2008) Data selection and smoothing in an open-source system for the 2008 NIST machine translation evaluation. Proc. Interspeech 2008, 2727-2730, doi: 10.21437/Interspeech.2008-676

@inproceedings{schwenk08_interspeech,
  author={Holger Schwenk and Yannick Esteve},
  title={{Data selection and smoothing in an open-source system for the 2008 NIST machine translation evaluation}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={2727--2730},
  doi={10.21437/Interspeech.2008-676}
}