15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Combination of Multilingual and Semi-Supervised Training for Under-Resourced Languages

František Grézl, Martin Karafiát

Brno University of Technology, Czech Republic

Multilingual training of neural networks for ASR is widely studied. It has been shown that languages with little training data can benefit substantially from multilingual training resources. Using unlabeled data to train the neural network in a semi-supervised manner has also improved ASR system performance. Here, a combination of both methods is presented. First, multilingual training is performed to obtain an ASR system that automatically transcribes the unlabeled data. Then, the automatically transcribed data are added to the training set. Two neural networks are trained, one from random initialization and one adapted from the multilingual network, to evaluate the effect of multilingual training when a larger amount of training data is available. Further, a CMLLR transform is applied in the middle of the stacked Bottle-Neck neural network structure. Because CMLLR rotates the features to better fit a given model, we evaluated whether it is better to adapt the existing NN on the CMLLR features or to train it from random initialization. The last step in our training procedure is fine-tuning on the original data.
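The CMLLR step can be illustrated with a minimal sketch. It relies only on the general fact that CMLLR acts on features as an affine transform x' = Ax + b, and it shows that transform applied to the outputs of the first network before they are passed to the second. The function name `apply_cmllr`, the variable names, and the toy values of A and b are assumptions made for illustration. Estimating the transform is not shown, and nothing here is taken from the paper's implementation.

```python
from typing import List, Sequence

Vector = List[float]


def apply_cmllr(frames: Sequence[Sequence[float]],
                A: Sequence[Sequence[float]],
                b: Sequence[float]) -> List[Vector]:
    """Apply the affine transform x' = A x + b to every feature frame."""
    dim = len(b)
    if len(A) != dim or any(len(row) != dim for row in A):
        raise ValueError("A must be square and match the length of b")
    transformed = []
    for x in frames:
        if len(x) != dim:
            raise ValueError("frame dimension does not match the transform")
        # One output coefficient per row of A, plus the matching bias term.
        transformed.append(
            [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
             for row, b_i in zip(A, b)]
        )
    return transformed


# Toy 2-dimensional example: hypothetical bottleneck outputs for one speaker.
bn_frames = [[1.0, 2.0], [0.0, -1.0]]
A = [[2.0, 0.0], [0.0, 0.5]]  # placeholder values, not estimated from data
b = [0.5, -1.0]               # placeholder values, not estimated from data

# The adapted frames are what the second network would then be trained on.
adapted = apply_cmllr(bn_frames, A, b)
```

In this sketch the second network sees only the transformed features, which is why the question of adapting an existing network versus retraining it from random initialization arises.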

Bibliographic reference. Grézl, František / Karafiát, Martin (2014): "Combination of multilingual and semi-supervised training for under-resourced languages", In INTERSPEECH-2014, 820-824.