Large-vocabulary continuous speech recognition remains a difficult task, and it is particularly so for low-resource languages. The scenario we focus on here is having only one hour of acoustic training data in the "target" language. This paper presents work on a data borrowing strategy combined with the recently proposed Subspace Gaussian Mixture Model (SGMM). We developed data borrowing strategies based on two approaches: one that minimizes the Kullback-Leibler (K-L) divergence between acoustic states, and one that additionally takes state occupation counts into account. By borrowing data from the non-target language at the acoustic-state level, the SGMMs are more robustly estimated, and we demonstrate improvements over the baseline SGMM setup, which itself outperforms a conventional HMM-GMM system. Although we tested the approach for SGMMs, we expect the general idea of borrowing data from a non-target language to be applicable to conventional GMMs as well.
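The K-L-based borrowing criterion described above can be illustrated with a minimal sketch: for each target-language acoustic state, find the non-target states whose distributions are closest in K-L divergence, making them candidates for borrowing data. This sketch assumes each state is summarized by a single diagonal-covariance Gaussian; the function names (`kl_diag_gauss`, `nearest_states`) and the single-Gaussian simplification are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    # Closed-form KL(p || q) between two diagonal-covariance Gaussians.
    # All arguments are 1-D arrays over the feature dimensions.
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def nearest_states(target_means, target_vars, pool_means, pool_vars, k=3):
    # For each target acoustic state, return indices of the k non-target
    # "pool" states with the smallest KL divergence -- the candidates
    # whose data would be borrowed for that state.
    out = []
    for mu_p, var_p in zip(target_means, target_vars):
        dists = np.array([
            kl_diag_gauss(mu_p, var_p, mu_q, var_q)
            for mu_q, var_q in zip(pool_means, pool_vars)
        ])
        out.append(np.argsort(dists)[:k])
    return out
```

The occupation-count variant mentioned in the abstract could then, for example, weight or filter these candidates by how much data each pool state actually has, so that sparsely observed states are not borrowed from.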
Bibliographic reference: Qian, Yanmin / Povey, Daniel / Liu, Jia (2011): "State-level data borrowing for low-resource speech recognition based on subspace GMMs", in Proc. INTERSPEECH 2011, pp. 553-556.