12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Unsupervised Arabic Dialect Adaptation with Self-Training

Scott Novotney (1), Rich Schwartz (1), Sanjeev Khudanpur (2)

(1) Raytheon BBN Technologies, USA
(2) Johns Hopkins University, USA

Useful training data for automatic speech recognition of colloquial speech is usually limited to expensive in-domain transcription. Broadcast news is an appealing source of easily available data to bootstrap into a new dialect. However, some languages, such as Arabic, have deep linguistic differences that result in poor cross-domain performance. If no in-domain transcripts are available, but a large amount of in-domain audio is, self-training may be a suitable technique to bootstrap into the domain. In this work, we attempt to adapt Modern Standard Arabic (MSA) models to Levantine Arabic without any in-domain manual transcription. We contrast this with varying amounts of in-domain transcription and show that: 1) self-training is effective with only one hour of in-domain transcripts; 2) self-training is not a suitable solution for improving strong MSA models on Levantine; 3) two metrics that quantify model bias predict self-training success; and 4) model bias explains the failure of self-training to adapt across strong domain mismatch.


Bibliographic reference. Novotney, Scott / Schwartz, Rich / Khudanpur, Sanjeev (2011): "Unsupervised Arabic dialect adaptation with self-training", In INTERSPEECH-2011, 541-544.