This paper describes a detailed comparison of several state-of-the-art speech recognition techniques applied to a limited Arabic broadcast news dataset. All approaches were trained on 50 hours of transcribed audio from the Al-Jazeera news channel. The best results were obtained using i-vector-based speaker adaptation in a training scenario that combines the Minimum Phone Error (MPE) criterion with sequential Deep Neural Network (DNN) training. We report results on two types of test data: broadcast news reports, with a best word error rate (WER) of 17.86%, and broadcast conversations, with a best WER of 29.85%. The overall WER on this test set is 25.6%.
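The WER figures above follow the standard definition: the number of substitutions, deletions, and insertions in the best word-level alignment between reference and hypothesis, divided by the number of reference words. The sketch below is a minimal Python illustration of that definition, not the scoring tool used in this paper. The function names `edit_distance` and `corpus_wer` are hypothetical, and splitting on whitespace is an assumed tokenization. When WER is pooled over a test set, errors and reference words are summed across all utterances. Assuming the overall figure is computed this way, it is a word-weighted average of the subset WERs and therefore falls between them.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of substitutions, deletions and insertions
    needed to turn the reference word list into the hypothesis."""
    # prev[j] = distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of reference word r
                curr[j - 1] + 1,     # insertion of hypothesis word h
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Pooled WER: total errors over total reference words.
    Assumes whitespace tokenization (a simplification)."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += edit_distance(ref, hypothesis.split())
        words += len(ref)
    if words == 0:
        raise ValueError("references must contain at least one word")
    return errors / words


if __name__ == "__main__":
    # Toy data: one perfect utterance, one with a single substitution.
    pairs = [("a b c d", "a b c d"), ("a b", "a x")]
    print(corpus_wer(pairs))  # 1 error over 6 reference words
```

Because both errors and reference words are summed before dividing, a subset with more reference words pulls the pooled WER toward its own value.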
Bibliographic reference. Cardinal, Patrick / Ali, Ahmed / Dehak, Najim / Zhang, Yu / Hanai, Tuka Al / Zhang, Yifan / Glass, James R. / Vogel, Stephan (2014): "Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera", In INTERSPEECH-2014, 2088-2092.