This study presents transfer learning techniques for cross-lingual speech recognition that mitigate the limited availability of data in a target language by using data from richly resourced source languages. A maximum likelihood (ML) based regularization criterion is used to learn context-dependent Gaussian mixture model (GMM) based hidden Markov model (HMM) parameters for phones in the target language using data from both the target and source languages. Recognition results indicate improved HMM state alignments. The hidden layers of a deep neural network (DNN) are then initialized using unsupervised pre-training of a multilingual deep belief network (DBN). Two DNN fine-tuning techniques are explored. First, the DNN is fine-tuned using a modified cross-entropy criterion that jointly uses HMM state alignments from both the target and source languages. Second, the DNN is fine-tuned sequentially, on the source language followed by the target language. Experiments conducted with varying amounts of target data indicate that joint and sequential training of the DNN can improve performance over existing techniques. Turkish and English were chosen as the target and source languages, respectively.
Bibliographic reference. Das, Amit / Hasegawa-Johnson, Mark (2015): "Cross-lingual transfer learning during supervised training in low resource scenarios", In INTERSPEECH-2015, 3531-3535.