This paper presents an English-Iraqi Arabic speech-to-speech statistical machine translation system built with limited resources. We describe the constraints involved, explain how we mitigated problems such as a non-standard orthography and a highly inflected grammar, and discuss how the plentiful existing resources for Modern Standard Arabic can be leveraged to assist in this task. Combined, these techniques reduce unknown words at translation time by over 40% and increase the BLEU score by 3.65 points over a previous state-of-the-art system using the same parallel training corpus of spoken utterances.
Cite as: Riesa, J., Mohit, B., Knight, K., Marcu, D. (2006) Building an English-Iraqi Arabic machine translation system for spoken utterances with limited resources. Proc. Interspeech 2006, paper 2012-Tue1A1O.1, doi: 10.21437/Interspeech.2006-261
@inproceedings{riesa06_interspeech,
  author={Jason Riesa and Behrang Mohit and Kevin Knight and Daniel Marcu},
  title={{Building an English-Iraqi Arabic machine translation system for spoken utterances with limited resources}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 2012-Tue1A1O.1},
  doi={10.21437/Interspeech.2006-261}
}