12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Rapid Building of an ASR System for Under-Resourced Languages Based on Multilingual Unsupervised Training

Ngoc Thang Vu, Franziska Kraus, Tanja Schultz

KIT, Germany

This paper presents our work on rapid language adaptation of acoustic models based on multilingual cross-language bootstrapping and unsupervised training. We used Automatic Speech Recognition (ASR) systems in six source languages (English, French, German, Spanish, Bulgarian, and Polish) to build an ASR system from scratch for Vietnamese, an under-resourced language. The system was built without any transcribed audio data by applying three consecutive steps: cross-language transfer, unsupervised training based on the "multilingual A-stabil" confidence score [1], and bootstrapping. We investigated the correlation between the performance of "multilingual A-stabil" and the number of source languages, and we improved its performance by applying it at the syllable level. Furthermore, we showed that increasing the number of source-language ASR systems in the multilingual framework improves the performance of the final ASR system in the target language, Vietnamese. The final Vietnamese recognition system achieves a Syllable Error Rate (SyllER) of 16.8% on the development set and 16.1% on the evaluation set.
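The abstract does not spell out how "multilingual A-stabil" is computed or how SyllER is scored, so the following is only a minimal sketch under stated assumptions. `syllable_error_rate` is the standard corpus-level edit distance over syllable tokens (Vietnamese orthography separates syllables with spaces). `agreement_confidence` is a hypothetical, simplified stand-in for a multilingual stability score: each syllable of one hypothesis is scored by the fraction of other source-language systems whose aligned hypothesis also contains it, with alignment done by Python's `difflib.SequenceMatcher`. This is not the authors' formulation, and the function names and the `threshold` value are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Sequence


def syllable_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level SyllER: total syllable edit distance / total reference syllables."""
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        # Vietnamese syllables are whitespace-separated, so split() gives syllable tokens.
        r, h = ref.split(), hyp.split()
        # Levenshtein DP over syllables; insertions, deletions, substitutions cost 1.
        prev = list(range(len(h) + 1))
        for i, rs in enumerate(r, start=1):
            cur = [i] + [0] * len(h)
            for j, hs in enumerate(h, start=1):
                cur[j] = min(prev[j] + 1,               # deletion
                             cur[j - 1] + 1,            # insertion
                             prev[j - 1] + (rs != hs))  # substitution or match
            prev = cur
        errors += prev[len(h)]
        total += len(r)
    return errors / total if total else 0.0


def agreement_confidence(primary: List[str], others: List[List[str]]) -> List[float]:
    """Hypothetical simplified multilingual agreement score per syllable of `primary`.

    Each score is the fraction of `others` in which that syllable is matched
    after a difflib alignment (an assumption, not the published A-stabil).
    """
    if not others:
        return [1.0] * len(primary)
    hits = [0] * len(primary)
    for other in others:
        matcher = SequenceMatcher(None, primary, other, autojunk=False)
        # Matching blocks mark aligned, identical runs; the final block has size 0.
        for block in matcher.get_matching_blocks():
            for k in range(block.a, block.a + block.size):
                hits[k] += 1
    return [n / len(others) for n in hits]


def select_for_training(primary: List[str], others: List[List[str]],
                        threshold: float = 0.5) -> List[str]:
    """Keep syllables whose agreement reaches a hypothetical confidence threshold."""
    conf = agreement_confidence(primary, others)
    return [s for s, c in zip(primary, conf) if c >= threshold]


if __name__ == "__main__":
    # One deleted syllable out of four reference syllables.
    print(syllable_error_rate(["xin chào các bạn"], ["xin chào bạn"]))
    primary = ["xin", "chào", "các", "bạn"]
    others = [["xin", "chào", "các", "bạn"],
              ["xin", "cháo", "các", "bạn"],
              ["sin", "chào", "bạn"]]
    print(agreement_confidence(primary, others))
    print(select_for_training(primary, others, threshold=0.9))
```

In this toy example, only "bạn" is matched by all three hypothetical source-language hypotheses, so a strict threshold keeps only that syllable for unsupervised training. Scoring at the syllable level rather than the word level mirrors the abstract's point that Vietnamese is naturally handled as a sequence of syllables.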


[1] N. T. Vu, F. Kraus, and T. Schultz, "Multilingual A-stabil: A new confidence score for multilingual unsupervised training," in Proc. IEEE Workshop on Spoken Language Technology (SLT 2010), Berkeley, California, USA, 2010.


Bibliographic reference: Vu, Ngoc Thang / Kraus, Franziska / Schultz, Tanja (2011): "Rapid building of an ASR system for under-resourced languages based on multilingual unsupervised training", In INTERSPEECH-2011, pp. 3145-3148.