QMDIS: QCRI-MIT Advanced Dialect Identification System

Sameer Khurana, Maryam Najafian, Ahmed Ali, Tuka Al Hanai, Yonatan Belinkov, James Glass


Continuing our work on spoken Dialect Identification (DID) for Arabic, we present the QCRI-MIT Advanced Dialect Identification System (QMDIS), an automatic spoken DID system for Dialectal Arabic (DA). In this paper, we report a comprehensive study of the three main components used in the spoken DID task: phonotactic, lexical and acoustic. We use Support Vector Machines (SVMs), Logistic Regression (LR) and Convolutional Neural Networks (CNNs) as backend classifiers throughout the study. We perform all our experiments on a publicly available dataset and present new state-of-the-art results. QMDIS discriminates between the five most widely used dialects of Arabic: Egyptian, Gulf, Levantine, North African, and Modern Standard Arabic (MSA). The combined system achieves ≈73% accuracy. All the data and the code used in our experiments are publicly available for research.


DOI: 10.21437/Interspeech.2017-1391

Cite as: Khurana, S., Najafian, M., Ali, A., Al Hanai, T., Belinkov, Y., Glass, J. (2017) QMDIS: QCRI-MIT Advanced Dialect Identification System. Proc. Interspeech 2017, 2591-2595, DOI: 10.21437/Interspeech.2017-1391.


@inproceedings{Khurana2017,
  author={Sameer Khurana and Maryam Najafian and Ahmed Ali and Tuka {Al Hanai} and Yonatan Belinkov and James Glass},
  title={QMDIS: QCRI-MIT Advanced Dialect Identification System},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2591--2595},
  doi={10.21437/Interspeech.2017-1391},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1391}
}