This paper focuses on speech recognition applications in which only a limited amount of manually labelled training data is available in the target language, but unlabelled data is plentiful. We investigate approaches based on unsupervised training. Following the traditional method, we propose a more effective and efficient data selection principle that considers phone frequency as well as confidence scores. In addition, we transfer HMM-based unsupervised training to the MLP feature level for the first time, and obtain much more robust MLP-based features. Because HMM-based and MLP-based unsupervised training address the model level and the feature level of a speech recognition system respectively, we finally combine the two approaches and propose an optimized strategy that further improves the unsupervised trained system in low-resource applications. In our experiments, we obtain significant improvements of about 12% relative over a conventional baseline in this low-resource scenario.
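As an illustration of what selecting automatically transcribed data by both confidence score and phone frequency could look like, the following is a minimal sketch. It is not the selection rule used in the paper, which the abstract does not specify. The function name `select_utterances`, the threshold `min_confidence`, the `budget` parameter, and the inverse-count rarity weighting are all assumptions made for this example.

```python
from collections import Counter


def select_utterances(utterances, min_confidence=0.7, budget=2):
    """Greedily pick confident utterances, favouring phones seen less often.

    `utterances` is a list of dicts with the hypothetical keys "id",
    "confidence" (a score in [0, 1]) and "phones" (a list of phone labels
    from the automatic transcription). This scoring is an illustrative
    assumption, not the criterion from the paper.
    """
    # Step 1: discard hypotheses whose confidence is below the threshold.
    candidates = [u for u in utterances if u["confidence"] >= min_confidence]

    phone_counts = Counter()  # phones already covered by the selected set
    selected = []

    # Step 2: repeatedly take the utterance with the best combined score.
    while candidates and len(selected) < budget:

        def score(u):
            phones = u["phones"]
            # Average inverse count: phones not yet selected contribute 1.0,
            # phones selected many times contribute close to 0.
            rarity = sum(1.0 / (1 + phone_counts[p]) for p in phones)
            rarity /= max(len(phones), 1)
            return u["confidence"] * rarity

        best = max(candidates, key=score)
        selected.append(best["id"])
        phone_counts.update(best["phones"])
        candidates.remove(best)

    return selected
```

In this sketch, an utterance that repeats phones already well represented in the selected set is penalized, so a slightly less confident utterance with new phones can be chosen ahead of it.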
Bibliographic reference. Qian, Yanmin / Liu, Jia (2013): "MLP-HMM two-stage unsupervised training for low-resource languages on conversational telephone speech recognition", In INTERSPEECH-2013, 1816-1820.