INTERSPEECH 2006 - ICSLP
Arabic has a large number of affixes that can modify a stem to form words. In automatic speech recognition (ASR) this leads to a high out-of-vocabulary (OOV) rate for typical lexicon sizes, and hence a potential increase in word error rate (WER). This is even more pronounced for dialects of Arabic, where additional affixes are often introduced and the available data is typically sparse. To address this problem we introduce a simple word decomposition algorithm which requires only a text corpus and a predefined list of affixes. Using this algorithm to create the lexicon for Iraqi Arabic ASR results in about 10% relative improvement in WER. Using the union of the segmented and unsegmented vocabularies and interpolating the corresponding language models results in a further WER reduction, for a net WER improvement of about 13% relative.
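The abstract names the inputs of the decomposition algorithm (a text corpus and a predefined affix list) but not its rules, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a word is split only when stripping a listed prefix and/or suffix leaves a stem that is itself attested in the corpus and meets a minimum length. The function name `decompose`, the parameter `min_stem_len`, the `+` marking on affixes, and the transliterated toy affixes and words are all hypothetical and chosen for illustration.

```python
def decompose(word, prefixes, suffixes, vocab, min_stem_len=2):
    """Split `word` into [prefix+, stem, +suffix] if the stem is attested in `vocab`.

    Assumed rule (not taken from the paper): accept a split only when the
    remaining stem occurs as a word in the corpus vocabulary.
    """
    # Try longer affixes first; the empty string stands for "no affix".
    prefix_options = sorted(prefixes, key=len, reverse=True) + [""]
    suffix_options = sorted(suffixes, key=len, reverse=True) + [""]

    for p in prefix_options:
        if not word.startswith(p):
            continue
        rest = word[len(p):]
        for s in suffix_options:
            if not p and not s:
                continue  # no affix stripped, so nothing to decompose
            if s and not rest.endswith(s):
                continue
            stem = rest[:len(rest) - len(s)] if s else rest
            if len(stem) >= min_stem_len and stem in vocab:
                parts = []
                if p:
                    parts.append(p + "+")
                parts.append(stem)
                if s:
                    parts.append("+" + s)
                return parts
    return [word]  # leave the word unsegmented


# Toy, transliterated data (hypothetical; not the paper's affix list or corpus).
corpus = "kitab walkitab kitabhum alkitab bikitabha qalam"
vocab = set(corpus.split())
prefixes = ["wa", "al", "bi"]
suffixes = ["hum", "ha"]

segmented = {w: decompose(w, prefixes, suffixes, vocab) for w in sorted(vocab)}
```

In this toy run, `alkitab` becomes `al+ kitab` and `bikitabha` becomes `bi+ kitab +ha`, while `walkitab` stays whole because the stem left after stripping `wa` is not attested. Requiring an attested stem is one conservative way to avoid splitting words that merely begin or end with an affix-like string; other acceptance criteria are equally plausible.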
Bibliographic reference. Afify, Mohamed / Sarikaya, Ruhi / Kuo, Hong-Kwang Jeff / Besacier, Laurent / Gao, Yuqing (2006): "On the use of morphological analysis for dialectal Arabic speech recognition", In INTERSPEECH-2006, paper 1444-Mon2A2O.2.