Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

On the Use of Morphological Analysis for Dialectal Arabic Speech Recognition

Mohamed Afify, Ruhi Sarikaya, Hong-Kwang Jeff Kuo, Laurent Besacier, Yuqing Gao

IBM T.J. Watson Research Center, USA

Arabic has a large number of affixes that can modify a stem to form words. In automatic speech recognition (ASR), this leads to a high out-of-vocabulary (OOV) rate for a typical lexicon size, and hence a potential increase in word error rate (WER). The problem is even more pronounced for dialects of Arabic, where additional affixes are often introduced and the available data is typically sparse. To address this problem, we introduce a simple word decomposition algorithm that requires only a text corpus and a predefined list of affixes. Using this algorithm to create the lexicon for Iraqi Arabic ASR yields about a 10% relative improvement in WER. In addition, using the union of the segmented and unsegmented vocabularies and interpolating the corresponding language models yields a further WER reduction. The net WER improvement is about 13% relative.
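
As an illustration only, affix-based word decomposition of the kind described above might be sketched as follows. The abstract specifies only the inputs (a text corpus and a predefined affix list). Everything else here is an assumption made for the sketch and not a detail taken from the paper: the rule that a split is accepted only if the remaining stem itself occurs in the corpus and has at least min_stem_len characters, the longest-match ordering, the "+" markers on affixes, and the transliterated toy words and affixes.

```python
def build_vocab(corpus_lines):
    """Collect the set of word types seen in a whitespace-tokenized corpus."""
    return {word for line in corpus_lines for word in line.split()}


def decompose(word, prefixes, suffixes, vocab, min_stem_len=2):
    """Split a word into [prefix+, stem, +suffix] using a fixed affix list.

    Assumption of this sketch: a split is accepted only when the remaining
    stem is itself a word in the corpus vocabulary and is long enough.
    Words with no acceptable split are returned unchanged.
    """
    # Try longer affixes first; the empty string means "no affix on this side".
    prefix_candidates = sorted(prefixes, key=len, reverse=True) + [""]
    suffix_candidates = sorted(suffixes, key=len, reverse=True) + [""]

    for prefix in prefix_candidates:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in suffix_candidates:
            if not prefix and not suffix:
                continue  # no affix on either side: not a decomposition
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)] if suffix else rest
            if len(stem) >= min_stem_len and stem in vocab:
                parts = []
                if prefix:
                    parts.append(prefix + "+")
                parts.append(stem)
                if suffix:
                    parts.append("+" + suffix)
                return parts
    return [word]


def segment_corpus(corpus_lines, prefixes, suffixes):
    """Rewrite every corpus line as a sequence of stems and marked affixes."""
    vocab = build_vocab(corpus_lines)
    return [
        " ".join(
            piece
            for word in line.split()
            for piece in decompose(word, prefixes, suffixes, vocab)
        )
        for line in corpus_lines
    ]


# Hypothetical toy data in a Latin transliteration (not from the paper).
TOY_CORPUS = ["ktab wktab", "ktabhm wktabhm byt"]
TOY_PREFIXES = ["w", "al"]
TOY_SUFFIXES = ["hm", "h"]

if __name__ == "__main__":
    for line in segment_corpus(TOY_CORPUS, TOY_PREFIXES, TOY_SUFFIXES):
        print(line)
```

In this sketch the segmented corpus would then serve as training text for a language model over stems and affixes, and its token set as the segmented lexicon. Requiring the stem to be an observed word is one simple way to limit spurious splits; other validity checks are equally plausible.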

