ISCA Archive Interspeech 2006

On the use of morphological analysis for dialectal Arabic speech recognition

Mohamed Afify, Ruhi Sarikaya, Hong-Kwang Jeff Kuo, Laurent Besacier, Yuqing Gao

Arabic has a large number of affixes that can modify a stem to form words. In automatic speech recognition (ASR) this leads to a high out-of-vocabulary (OOV) rate for typical lexicon sizes, and hence a potential increase in word error rate (WER). The problem is even more pronounced for dialects of Arabic, where additional affixes are often introduced and the available data is typically sparse. To address this problem we introduce a simple word decomposition algorithm that requires only a text corpus and a predefined list of affixes. Using this algorithm to create the lexicon for Iraqi Arabic ASR yields about a 10% relative improvement in WER. In addition, using the union of the segmented and unsegmented vocabularies and interpolating the corresponding language models reduces WER further. The net WER improvement is about 13% relative.
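
The abstract does not spell out the decomposition rules, so the following is only a minimal sketch of the general idea: strip affixes from a predefined list and keep a split only when the remaining stem is itself attested in the corpus vocabulary. The affix lists, the `MIN_STEM_LEN` threshold, the `+` boundary markers, and the function names `segment` and `oov_rate` are all assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical affix lists in a Buckwalter-like transliteration (illustrative only).
PREFIXES = ["wAl", "Al", "w", "b", "l"]
SUFFIXES = ["hm", "hA", "k", "h"]
MIN_STEM_LEN = 2  # assumed lower bound on stem length

# Try longer affixes first; the empty string means "no affix on this side".
_PREFIX_ORDER = sorted(PREFIXES, key=len, reverse=True) + [""]
_SUFFIX_ORDER = sorted(SUFFIXES, key=len, reverse=True) + [""]


def segment(word, vocab):
    """Return [prefix+, stem, +suffix] if the stem is attested in vocab, else [word]."""
    for prefix in _PREFIX_ORDER:
        if prefix and not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in _SUFFIX_ORDER:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)] if suffix else rest
            # Accept only real splits whose stem was seen as a word in the corpus.
            if (prefix or suffix) and len(stem) >= MIN_STEM_LEN and stem in vocab:
                parts = [prefix + "+"] if prefix else []
                parts.append(stem)
                if suffix:
                    parts.append("+" + suffix)
                return parts
    return [word]


def oov_rate(tokens, lexicon):
    """Fraction of tokens that are missing from the lexicon."""
    return sum(tok not in lexicon for tok in tokens) / len(tokens)


# Toy corpus: the test word "wAlbyt" is unseen, but its pieces are covered.
train = ["ktAb", "byt", "wAlktAb", "ktAbhm"]
test = ["wAlbyt", "byt"]

word_lexicon = set(train)
morph_lexicon = {tok for w in train for tok in segment(w, word_lexicon)}
test_morphs = [tok for w in test for tok in segment(w, word_lexicon)]

print("word-level OOV rate: ", oov_rate(test, word_lexicon))
print("morph-level OOV rate:", oov_rate(test_morphs, morph_lexicon))
```

On this toy data the unseen word is covered once it is split into a known prefix and a known stem, which is the mechanism by which a segmented lexicon lowers the OOV rate. The boundary markers make it possible to rejoin the units into words after recognition.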


doi: 10.21437/Interspeech.2006-87

Cite as: Afify, M., Sarikaya, R., Kuo, H.-K.J., Besacier, L., Gao, Y. (2006) On the use of morphological analysis for dialectal Arabic speech recognition. Proc. Interspeech 2006, paper 1444-Mon2A2O.2, doi: 10.21437/Interspeech.2006-87

@inproceedings{afify06_interspeech,
  author={Mohamed Afify and Ruhi Sarikaya and Hong-Kwang Jeff Kuo and Laurent Besacier and Yuqing Gao},
  title={{On the use of morphological analysis for dialectal Arabic speech recognition}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1444-Mon2A2O.2},
  doi={10.21437/Interspeech.2006-87}
}