15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Improving ASR Performance On Non-Native Speech Using Multilingual and Crosslingual Information

Ngoc Thang Vu, Yuanfan Wang, Marten Klose, Zlatka Mihaylova, Tanja Schultz

KIT, Germany

This paper presents our latest investigation of automatic speech recognition (ASR) on non-native speech. We first report on a non-native speech corpus — an extension of the GlobalPhone database — which contains English with Bulgarian, Chinese, German, and Indian accents, as well as German with a Chinese accent. In this setting, English is the spoken language (L2), and Bulgarian, Chinese, German, and Indian languages are the mother tongues (L1) of the speakers. We then investigate the effect of multilingual acoustic modeling on non-native speech. Our results reveal that a bilingual L1-L2 acoustic model significantly improves ASR performance on non-native speech. For the case where L1 is unknown or L1 data is unavailable, a multilingual ASR system trained without L1 speech data consistently outperforms the monolingual L2 ASR system. Finally, we propose a method called crosslingual accent adaptation, which allows using English with a Chinese accent to improve German ASR on German with a Chinese accent, and vice versa. Without using any intra-lingual adaptation data, we achieve a 15.8% relative improvement on average over the baseline system.


Bibliographic reference. Vu, Ngoc Thang / Wang, Yuanfan / Klose, Marten / Mihaylova, Zlatka / Schultz, Tanja (2014): "Improving ASR performance on non-native speech using multilingual and crosslingual information", in INTERSPEECH-2014, 11-15.