12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

State-Level Data Borrowing for Low-Resource Speech Recognition Based on Subspace GMMs

Yanmin Qian (1), Daniel Povey (2), Jia Liu (1)

(1) Tsinghua University, China
(2) Microsoft Research, USA

Large-vocabulary continuous speech recognition is a difficult task, particularly for low-resource languages. The scenario we focus on here is having only 1 hour of acoustic training data in the "target" language. This paper presents work on a data borrowing strategy combined with the recently proposed Subspace Gaussian Mixture Model (SGMM). We developed two data borrowing strategies: one that minimizes Kullback-Leibler (K-L) divergence, and one that also takes state occupation counts into account. Borrowing data from the non-target language at the acoustic-state level makes the SGMMs more robustly estimated. We demonstrate improvements over the baseline SGMM setup, which is itself better than a conventional HMM-GMM system. Although we tested the approach only for SGMMs, we expect the general idea of borrowing data from a non-target language to be applicable to conventional GMMs as well.
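
As a rough illustration of state-level selection by K-L divergence, the sketch below maps each target-language state to its closest non-target state. It is not the paper's implementation. It assumes each state is summarized by a single diagonal-covariance Gaussian (real SGMM states are mixtures, for which the K-L divergence has no closed form), and the count threshold `min_count` is only one hypothetical way that state occupation counts might enter. All function and parameter names are invented for this example.

```python
import math


def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def pick_donor_states(target_states, donor_states,
                      target_counts=None, min_count=None):
    """Map each target state index to the donor state index with minimal KL.

    Each state is a (mean, variance) pair of equal-length lists.
    Assumption for illustration: if min_count is given, only target states
    whose occupation count is below min_count borrow data at all.
    """
    mapping = {}
    for t, (t_mean, t_var) in enumerate(target_states):
        if min_count is not None and target_counts[t] >= min_count:
            continue  # state is considered well trained; no borrowing
        mapping[t] = min(
            range(len(donor_states)),
            key=lambda d: kl_diag_gauss(t_mean, t_var, *donor_states[d]),
        )
    return mapping


# Toy data: two target states, three candidate donor states.
target = [([0.0, 0.0], [1.0, 1.0]), ([5.0, 5.0], [1.0, 1.0])]
donor = [([4.8, 5.1], [1.0, 1.0]),
         ([0.1, -0.1], [1.0, 1.0]),
         ([10.0, 10.0], [2.0, 2.0])]

kl_only = pick_donor_states(target, donor)
count_aware = pick_donor_states(target, donor,
                                target_counts=[10.0, 500.0], min_count=100.0)
```

In this toy setup, `kl_only` pairs every target state with its nearest donor state, while `count_aware` skips the second target state because its assumed occupation count is already above the threshold.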

Bibliographic reference. Qian, Yanmin / Povey, Daniel / Liu, Jia (2011): "State-level data borrowing for low-resource speech recognition based on subspace GMMs", in INTERSPEECH-2011, 553-556.