ISCA Archive Interspeech 2008

Strategies for building a Farsi-English SMT system from limited resources

Andreas Kathol, Jing Zheng

One recent task in machine translation research has been the development of translation capabilities in a time frame as short as 100 days. Such a task requires developers to consider what can be done with relatively small amounts of data in a short time, which inherently limits the type and complexity of the effort that can be devoted to it. In this paper we focus on the improvements to a Farsi-to-English translation system achieved by means of algorithmic changes, the addition of raw, domain-unspecific resources, and unsupervised morphological segmentation. The cumulative effect of these measures has been an improvement in BLEU scores of about 25% relative on an internal test set.


doi: 10.21437/Interspeech.2008-677

Cite as: Kathol, A., Zheng, J. (2008) Strategies for building a Farsi-English SMT system from limited resources. Proc. Interspeech 2008, 2731-2734, doi: 10.21437/Interspeech.2008-677

@inproceedings{kathol08_interspeech,
  author={Andreas Kathol and Jing Zheng},
  title={{Strategies for building a Farsi-English SMT system from limited resources}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={2731--2734},
  doi={10.21437/Interspeech.2008-677}
}