Third Workshop on Spoken Language Technologies for Under-resourced Languages
Cape Town, South Africa
This paper discusses preliminary experiments on translating from English to Amharic using a statistical machine translation approach (EASMT). The EASMT system is trained on a corpus of expressions found in parallel documents in the two languages. A total of 632 parliamentary documents were collected, of which 115 have been used in the experiment. The corpus covers 15 years, from Aug 21, 1995 to July 16, 2010. Each document contains English and Amharic texts that are translations of each other. The accuracy of the translation system has been measured using 18,432 English-Amharic sentence pairs extracted from these documents. The baseline phrase-based system obtains a BLEU score of 35.32%. Applying morpheme segmentation to the tokens of the Amharic output and of the reference of the baseline system yields a 0.34% increase in BLEU. The increase is 0.92% when the baseline and the segmented system are compared against the same segmented reference.
Index Terms: Statistical Machine Translation, Parallel Corpus, Word Segmentation
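The effect of segmentation on BLEU can be illustrated with a minimal sketch. It is not the authors' evaluation setup: it assumes a single reference per sentence, unsmoothed corpus-level BLEU up to 4-grams, and a hypothetical "+" marker that some separate morpheme segmenter is assumed to have inserted at morpheme boundaries. The names corpus_bleu and split_morphemes are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis, no smoothing.

    Both arguments are lists of token lists. Returns a score in [0, 1].
    """
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # n-grams in the hypotheses, per n-gram order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Counter intersection clips each count to the reference count.
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


def split_morphemes(tokens, boundary="+"):
    """Split tokens on a hypothetical boundary marker (an assumption here),
    so that BLEU is computed over morphemes instead of whole words."""
    return [m for tok in tokens for m in tok.split(boundary) if m]
```

To compare a baseline and a segmented system against the same segmented reference, one would pass both outputs and the reference through split_morphemes before calling corpus_bleu, then multiply the result by 100 to express it as a percentage. Scores computed over words and scores computed over morphemes count different units, so they are not directly comparable.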
Bibliographic reference. Teshome, Mulu Gebreegziabher / Besacier, Laurent (2012): "Preliminary experiments on English-Amharic statistical machine translation", In SLTU-2012, 36-41.