There has recently been growing interest in how to build large-vocabulary continuous speech recognition (LVCSR) systems for low-resource languages. The scenario we focus on here has only one hour of acoustic training data in the "target" language, but more plentiful data in other languages. This paper presents approaches based on multilayer perceptron (MLP) features: we construct a low-resource system that uses additional information from the non-target languages to train cross-lingual MLPs. A hierarchical architecture and a multi-stream strategy are applied at the cross-lingual phone level to make the neural networks more discriminative. In addition, we propose an ensemble system that combines various acoustic feature streams and context expansion lengths. After system combination with these two strategies, we obtain significant improvements of more than 8% absolute over a conventional baseline in this low-resource scenario with only one hour of target training data.
Index Terms: low-resource language; cross-lingual posterior features; hierarchical architectures; ensemble system
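The context expansion and stream combination mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the streams are combined, so the log-domain averaging below is an assumption, not the authors' method. The function names `splice_frames` and `combine_posteriors`, the edge padding, and the `eps` floor are likewise hypothetical choices made for illustration.

```python
import numpy as np

def splice_frames(feats, context):
    """Stack each frame with `context` neighbours on each side.

    feats: array of shape (num_frames, dim).
    Edges are padded by repeating the first/last frame (an assumption).
    Returns an array of shape (num_frames, dim * (2 * context + 1)).
    """
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    n = feats.shape[0]
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])

def combine_posteriors(streams, eps=1e-10):
    """Combine per-frame phone posteriors from several MLP streams.

    streams: list of arrays, each of shape (num_frames, num_phones).
    Averages log posteriors across streams (assumed combination rule)
    and renormalises each frame so its posteriors sum to 1.
    """
    log_mean = np.mean([np.log(p + eps) for p in streams], axis=0)
    # Subtract the per-frame maximum for numerical stability.
    log_mean -= log_mean.max(axis=1, keepdims=True)
    combined = np.exp(log_mean)
    return combined / combined.sum(axis=1, keepdims=True)
```

In an ensemble of this kind, each member MLP would see the same utterance spliced with a different `context` value or a different acoustic feature type, and the resulting posterior streams would be merged frame by frame before being passed to the recognizer.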
Bibliographic reference. Qian, Yanmin / Liu, Jia (2012): "Cross-lingual and ensemble MLPs strategies for low-resource speech recognition", In INTERSPEECH-2012, 2582-2585.