We investigate two strategies to improve the context-dependent deep neural network hidden Markov model (CD-DNN-HMM) for low-resource speech recognition. Although CD-DNN-HMMs outperform conventional Gaussian mixture model (GMM) HMMs on various tasks, training their acoustic models becomes challenging when transcribed speech is limited, e.g., to less than 10 hours. To address this issue, we first apply dropout, which prevents overfitting during DNN fine-tuning and improves model robustness under data sparsity. We then evaluate multilingual DNN training for the case where auxiliary languages are available: the hidden-layer parameters of the target-language DNN are shared and learned over multiple languages. Experiments show that both strategies significantly improve recognition performance. Combining them reduces word error rate further, yielding 11.6% and 6.2% relative improvements on two limited-data conditions.
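The abstract does not give architecture details, so the following is only a minimal pure-Python sketch of the two ideas it names, under stated assumptions: sigmoid hidden units, hidden layers shared by all languages with a separate softmax output layer per language, and inverted dropout applied to hidden-layer outputs during training only. The class `MultilingualDNN`, the helper functions, and all layer sizes are hypothetical illustrations, not the authors' implementation; pretraining and backpropagation are omitted, so this shows only the forward pass.

```python
import math
import random


def dropout(activations, p, training, rng):
    """Inverted dropout (assumed variant): during training, zero each unit with
    probability p and scale survivors by 1/(1-p); at test time, pass through."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def affine(weights, bias, x):
    # weights is a list of rows with shape (out_dim, in_dim)
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [z / s for z in e]


def init_layer(in_dim, out_dim, rng):
    w = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return (w, b)


class MultilingualDNN:
    """Hypothetical model: hidden layers shared across languages,
    one language-specific softmax output layer (over senones) per language."""

    def __init__(self, input_dim, hidden_dims, senones_per_lang, seed=0):
        self.rng = random.Random(seed)
        dims = [input_dim] + hidden_dims
        # Shared hidden layers: updated by frames from every language.
        self.shared = [init_layer(dims[i], dims[i + 1], self.rng)
                       for i in range(len(hidden_dims))]
        # Language-specific output layers.
        self.heads = {lang: init_layer(dims[-1], n, self.rng)
                      for lang, n in senones_per_lang.items()}

    def forward(self, x, lang, dropout_p=0.0, training=False):
        h = x
        for w, b in self.shared:
            h = sigmoid(affine(w, b, h))
            h = dropout(h, dropout_p, training, self.rng)
        w, b = self.heads[lang]
        return softmax(affine(w, b, h))


if __name__ == "__main__":
    model = MultilingualDNN(input_dim=4, hidden_dims=[8, 8],
                            senones_per_lang={"target": 5, "aux": 3})
    frame = [0.1, -0.2, 0.3, 0.0]
    # Training-time pass on a target-language frame, with dropout active.
    print(model.forward(frame, "target", dropout_p=0.2, training=True))
    # Test-time pass on an auxiliary-language frame, dropout disabled.
    print(model.forward(frame, "aux"))
```

Because inverted dropout rescales activations during training, the same weights can be used unchanged at decoding time. In the multilingual setting, only the shared list of hidden layers would be carried over to the target language, while each auxiliary language keeps its own output layer.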
Bibliographic reference. Miao, Yajie / Metze, Florian (2013): "Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training", In INTERSPEECH-2013, 2237-2241.