Conversational telephone speech (CTS) collections of Arabic dialects distributed through the Linguistic Data Consortium (LDC) provide an invaluable resource for the development of robust speech systems, including speaker and speech recognition, translation, spoken dialogue modeling, and information summarization. They are also frequently relied on in language identification (LID) and dialect identification (DID) evaluations. The first part of this study attempts to identify the source of the relatively high DID performance on LDC's Arabic CTS corpora seen in recent literature. It is found that recordings of each dialect exhibit unique channel and noise characteristics and that silence regions alone are sufficient for performing reasonably accurate DID. The second part focuses on phonotactic dialect modeling that utilizes phone recognizers and support vector machines (PRSVM). A new N-gram normalization of PRSVM input supervectors is introduced and shown to outperform the standard approach used in current LID and DID systems.
Index Terms: Arabic dialect identification, channel characteristics, LDC corpora, PRSVM
Bibliographic reference. Bořil, Hynek / Sangwan, Abhijeet / Hansen, John H. L. (2012): "Arabic dialect identification - 'is the secret in the silence?' and other observations", In INTERSPEECH-2012, 30-33.