Most studies on Arabic diacritization have relied solely on features inferred from text. This paper proposes a novel approach in which a speech-based model is combined, through weighting, with a text-based model, so that linguistically insensitive acoustic information can correct errors in, and complement, the text model's diacritic predictions. The acoustic model is based on Hidden Markov Models and the textual model on Conditional Random Fields. The combination significantly reduces error rates across all metrics, especially for case endings, which are the most difficult to predict. The results in this paper are the most accurate reported to date, with diacritic and word error rates of 1.5 and 4.9, respectively, inclusive of case endings, and 1.0 and 2.7 exclusive of them.
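The idea of a weighted combination can be illustrated with a minimal sketch. The abstract does not state the combination formula, so the linear interpolation below, the function name `combine_predictions`, the `acoustic_weight` parameter, and the example probabilities are all assumptions made for illustration, not the authors' method. Each model is assumed to output a probability for every candidate diacritic at a given letter position.

```python
def combine_predictions(text_probs, acoustic_probs, acoustic_weight=0.5):
    """Pick a diacritic by linearly interpolating two models' probabilities.

    text_probs / acoustic_probs: dicts mapping a diacritic label to a
    probability (hypothetical outputs of a CRF and an HMM, respectively).
    acoustic_weight: assumed interpolation weight in [0, 1]; 0 means
    text-only, 1 means acoustic-only.
    """
    if not 0.0 <= acoustic_weight <= 1.0:
        raise ValueError("acoustic_weight must lie in [0, 1]")

    labels = set(text_probs) | set(acoustic_probs)
    scores = {
        label: acoustic_weight * acoustic_probs.get(label, 0.0)
        + (1.0 - acoustic_weight) * text_probs.get(label, 0.0)
        for label in labels
    }
    # Return the best-scoring label together with all combined scores.
    best = max(scores, key=scores.get)
    return best, scores


# Made-up probabilities for one letter position (illustrative only).
text_probs = {"fatha": 0.5, "kasra": 0.4, "damma": 0.1}
acoustic_probs = {"fatha": 0.2, "kasra": 0.7, "damma": 0.1}

text_only, _ = combine_predictions(text_probs, acoustic_probs, acoustic_weight=0.0)
combined, _ = combine_predictions(text_probs, acoustic_probs, acoustic_weight=0.5)
print(text_only, combined)
```

In this toy case the text-only choice is overridden once the acoustic evidence is given equal weight, which is the kind of correction the combination is meant to enable. In practice the weight would be tuned on held-out data.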
Index Terms: Arabic diacritization, case endings, multimodal systems
Bibliographic reference. Azim, Aisha S. / Wang, Xiaoxuan / Chai, Sim Khe (2012): "A weighted combination of speech with text-based models for Arabic diacritization", In INTERSPEECH-2012, 2334-2337.