8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

A Two Phase Arabic Language Model for Speech Recognition And Other Language Applications

Mohsen Rashwan

Cairo University, Egypt

A new Arabic language model for large-vocabulary automatic speech recognition (ASR) is introduced. The derivative feature of Arabic words makes it useful to divide the process into two phases. In phase 1, the fixed words and the prefix, suffix, and form of each derivative word are determined by a phase-1 M-gram, given the acoustic data. In phase 2, another M-gram is used to determine the roots of the derivative words. The idea was tested on 60 words (10 roots x 6 forms). The results are encouraging, and more work is to follow to realize a complete large-vocabulary ASR system for Arabic.
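The two-phase idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how acoustic scores are combined with the M-grams, so the sketch assumes bigrams (M = 2), assumes per-word acoustic log-scores, and omits fixed words, prefixes, and suffixes. The names `ROOTS`, `FORMS`, `viterbi`, and `two_phase_decode` are hypothetical, and the roots and forms are placeholders, not the data used in the experiment.

```python
import math
from itertools import product

# Hypothetical placeholders: 10 roots x 6 forms = 60 derivative words,
# matching the size of the experiment in the abstract (not its actual data).
ROOTS = [f"root{i}" for i in range(10)]
FORMS = [f"form{j}" for j in range(6)]
VOCAB = list(product(ROOTS, FORMS))

# Assumed back-off log-probability for bigrams missing from a table.
UNSEEN = math.log(1e-6)


def viterbi(candidates, emit, trans):
    """Return the best label sequence under a bigram model.

    candidates: list (one entry per position) of allowed labels.
    emit(t, label): log-score of label at position t.
    trans(prev, label): bigram log-probability of label following prev.
    """
    best = {lab: (emit(0, lab), [lab]) for lab in candidates[0]}
    for t in range(1, len(candidates)):
        new = {}
        for lab in candidates[t]:
            score, path = max(
                ((s + trans(p, lab), pth) for p, (s, pth) in best.items()),
                key=lambda x: x[0],
            )
            new[lab] = (score + emit(t, lab), path + [lab])
        best = new
    return max(best.values(), key=lambda x: x[0])[1]


def two_phase_decode(acoustic, form_bigram, root_bigram):
    """acoustic[t][(root, form)] is an assumed log-score per word and position."""
    n = len(acoustic)

    # Phase 1: score each form by its best-scoring word, then decode the forms.
    def form_emit(t, form):
        return max(acoustic[t][(r, form)] for r in ROOTS)

    forms = viterbi([FORMS] * n, form_emit,
                    lambda p, f: form_bigram.get((p, f), UNSEEN))

    # Phase 2: with the forms fixed, decode the roots with a second bigram.
    def root_emit(t, root):
        return acoustic[t][(root, forms[t])]

    roots = viterbi([ROOTS] * n, root_emit,
                    lambda p, r: root_bigram.get((p, r), UNSEEN))
    return list(zip(roots, forms))
```

Under these assumptions, each phase searches over 6 forms or 10 roots per position instead of all 60 words at once, which is the kind of saving the two-phase split is meant to provide.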

Bibliographic reference. Rashwan, Mohsen (2004): "A two phase Arabic language model for speech recognition and other language applications", In INTERSPEECH-2004, 1041-1044.