15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Building Resources for Algerian Arabic Dialects

S. Harrat (1), K. Meftouh (2), M. Abbas (3), K. Smaili (4)

(1) ENS Bouzareah, Algeria
(2) UBMA, Algeria
(3) CRSTDLA, Algeria
(4) LORIA, France

The Algerian Arabic dialects are under-resourced languages: they lack both corpora and Natural Language Processing (NLP) tools, even though they are increasingly used in written form, especially on social media and forums. In this paper, we build, for the first time, parallel corpora for Algerian dialects, because our ultimate goal is Machine Translation (MT) between Modern Standard Arabic (MSA) and Algerian dialects (AD) in both directions. We also propose language tools to process these dialects. First, we develop a morphological analysis model for the dialects by adapting BAMA, a well-known MSA morphological analyzer. Then we propose a diacritization system, based on an MT process, that restores the vowels in dialect corpora. Finally, we report results on machine translation between MSA and Algerian dialects.
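The abstract states only that diacritization is "based on an MT process." One common way to frame this, which is assumed here and not taken from the paper, treats undiacritized text as the source language and fully vowelled text as the target language. The minimal sketch below builds such source/target pairs from vowelled sentences. The diacritic range (U+064B to U+0652, plus U+0670) and the names `strip_diacritics` and `make_parallel_pairs` are illustrative assumptions. It is not the authors' pipeline.

```python
import re

# Assumed diacritic set for this illustration: fathatan (U+064B) through
# sukun (U+0652), plus superscript alef (U+0670).
ARABIC_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Remove the assumed Arabic diacritics, keeping the bare letter skeleton."""
    return ARABIC_DIACRITICS.sub("", text)


def make_parallel_pairs(vowelled_sentences):
    """Hypothetical helper: build (source, target) pairs for a
    diacritization-as-translation setup. The source is the undiacritized
    sentence and the target is the original vowelled sentence."""
    return [(strip_diacritics(s), s) for s in vowelled_sentences]


if __name__ == "__main__":
    # "kataba" (he wrote), fully vowelled
    sample = ["\u0643\u064e\u062a\u064e\u0628\u064e"]
    for src, tgt in make_parallel_pairs(sample):
        print(src, "->", tgt)
```

Pairs of this kind could then be used to train any statistical MT toolkit, with the MT model learning to "translate" bare text into vowelled text. The specific toolkit and settings used in the paper are not given in the abstract.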


Bibliographic reference: Harrat, S. / Meftouh, K. / Abbas, M. / Smaili, K. (2014): "Building resources for Algerian Arabic dialects", In INTERSPEECH-2014, 2123-2127.