15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Recent Advances in ASR Applied to an Arabic Transcription System for Al-Jazeera

Patrick Cardinal (1), Ahmed Ali (2), Najim Dehak (1), Yu Zhang (1), Tuka Al Hanai (1), Yifan Zhang (2), James R. Glass (1), Stephan Vogel (2)

(1) MIT, USA
(2) Qatar Computing Research Institute, Qatar

This paper presents a detailed comparison of several state-of-the-art speech recognition techniques applied to a limited Arabic broadcast news dataset. All approaches were trained on the same 50 hours of transcribed audio from the Al-Jazeera news channel. The best results were obtained using i-vector-based speaker adaptation in a training scenario that uses the Minimum Phone Error (MPE) criterion combined with sequential Deep Neural Network (DNN) training. We report results on two types of test data: broadcast news reports, with a best word error rate (WER) of 17.86%, and broadcast conversations, with a best WER of 29.85%. The overall WER on the combined test set is 25.6%.
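All results above are reported as word error rate. The sketch below assumes the conventional definition, WER = (S + D + I) / N: substitutions, deletions, and insertions from a word-level Levenshtein alignment, divided by the number of reference words. The abstract does not state the scoring tool or text normalization used, so the whitespace tokenization here is an assumption, and the function names `word_error_rate` and `pooled_wer` are hypothetical. The second function illustrates why an overall figure such as the 25.6% above is computed from errors and word counts pooled across subsets. As a result, it generally differs from the simple mean of the per-subset WERs.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumption: words are whitespace-separated tokens with no further normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first (i - 1) reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(hyp)] / len(ref)


def pooled_wer(subsets: list[tuple[int, int]]) -> float:
    """Combine (error_count, reference_word_count) pairs into one corpus-level WER.

    This is a word-weighted average of subset WERs, not their arithmetic mean.
    """
    total_errors = sum(errors for errors, _ in subsets)
    total_words = sum(words for _, words in subsets)
    return total_errors / total_words


if __name__ == "__main__":
    # One deleted word out of six reference words -> WER of 1/6.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Two hypothetical subsets: 10 errors / 100 words and 30 errors / 100 words.
    print(pooled_wer([(10, 100), (30, 100)]))
```

In practice, published figures are usually produced with a standard scoring tool after task-specific normalization. This sketch only shows the arithmetic behind the metric.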


Bibliographic reference.  Cardinal, Patrick / Ali, Ahmed / Dehak, Najim / Zhang, Yu / Hanai, Tuka Al / Zhang, Yifan / Glass, James R. / Vogel, Stephan (2014): "Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera", In INTERSPEECH-2014, 2088-2092.