Useful training data for automatic speech recognition systems of colloquial speech is usually limited to expensive in-domain transcription. Broadcast news is an appealing source of easily available data to bootstrap into a new dialect. However, some languages, like Arabic, have deep linguistic differences resulting in poor cross domain performance. If no in-domain transcripts are available, but a large amount of in-domain audio is, self-training may be a suitable technique to bootstrap into the domain. In this work, we attempt to adapt Modern Standard Arabic (MSA) models to Levantine Arabic without any in-domain manual transcription. We contrast with varying amounts of in-domain transcription and show that 1) Self-training is effective with only one hour of in-domain transcripts. 2) Self-training is not a suitable solution to improve strong MSA models on Levantine. 3) Two metrics that quantify model bias predict self-training success. 4) Model bias explains the failure of self-training to adapt across strong domain mismatch.
Bibliographic reference. Novotney, Scott / Schwartz, Rich / Khudanpur, Sanjeev (2011): "Unsupervised Arabic dialect adaptation with self-training", In INTERSPEECH-2011, 541-544.