This paper presents our work on rapid language adaptation of acoustic models based on multilingual cross-language bootstrapping and unsupervised training. We used Automatic Speech Recognition (ASR) systems in six source languages (English, French, German, Spanish, Bulgarian and Polish) to build an ASR system from scratch for Vietnamese, an under-resourced language. The system was built without any transcribed audio data by applying three consecutive steps: cross-language transfer, unsupervised training based on the "multilingual A-stabil" confidence score, and bootstrapping. We investigated the correlation between the performance of "multilingual A-stabil" and the number of source languages, and we improved the performance of "multilingual A-stabil" by applying it at the syllable level. Furthermore, we showed that increasing the number of source language ASR systems in the multilingual framework improves the performance of the final ASR system in the target language, Vietnamese. The final Vietnamese recognition system achieves a Syllable Error Rate (SyllER) of 16.8% on the development set and 16.1% on the evaluation set.
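Results are reported as Syllable Error Rate (SyllER), the syllable-level analogue of word error rate: the number of substitutions, deletions and insertions in a minimum-edit-distance alignment of hypothesis against reference, divided by the number of reference syllables. The sketch below illustrates that standard computation. It assumes that syllables are obtained by splitting on whitespace (Vietnamese orthography separates syllables with spaces) and that scoring is corpus-level (total errors over total reference syllables). The function names `edit_distance` and `syllable_error_rate` are hypothetical, and this is not the authors' scoring setup.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences.

    Substitutions, deletions and insertions each cost 1; matches cost 0.
    """
    # prev[j] = distance between the first i-1 reference tokens and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion of reference token r
                curr[j - 1] + 1,          # insertion of hypothesis token h
                prev[j - 1] + (r != h),   # substitution (1) or match (0)
            )
        prev = curr
    return prev[-1]


def syllable_error_rate(references, hypotheses):
    """Corpus-level SyllER: total syllable edit errors / total reference syllables.

    Assumption: syllables are whitespace-separated tokens, as in Vietnamese script.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be paired one-to-one")
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_syl = ref.split()
        hyp_syl = hyp.split()
        errors += edit_distance(ref_syl, hyp_syl)
        total += len(ref_syl)
    if total == 0:
        raise ValueError("reference set contains no syllables")
    return errors / total


if __name__ == "__main__":
    refs = ["xin chào các bạn", "tôi là sinh viên"]
    hyps = ["xin chào bạn", "tôi là sinh viện"]  # one deletion, one substitution
    print(f"SyllER = {syllable_error_rate(refs, hyps):.1%}")  # SyllER = 25.0%
```

Pooling errors over the whole test set, rather than averaging per-utterance rates, keeps long utterances from being under-weighted, which matches how error rates such as the 16.8% and 16.1% figures are conventionally reported.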
Bibliographic reference. Vu, Ngoc Thang / Kraus, Franziska / Schultz, Tanja (2011): "Rapid building of an ASR system for under-resourced languages based on multilingual unsupervised training", In INTERSPEECH-2011, 3145-3148.