This paper presents a comparative study of bottleneck feature (BNF) and Deep Neural Network (DNN) multilingual training for low-resource language speech recognition. In addition, we compare system performance after fine-tuning. The evaluation was conducted on the IARPA Babel data. The source languages are Cantonese, Pashto, Tagalog, and Turkish, while the target languages are Vietnamese and Tamil. Compared with the monolingual baseline systems, the BNF and DNN methods achieved similar relative WER reductions of 3.0-5.1% on the Limited Language Pack (LLP) data and 6.1-8.6% on the Very LLP (VLLP) data. With fine-tuning, the BNF method gained further significant performance improvement, while the DNN method obtained only marginal gains. Overall, we observed that the DNN method performs worse on the smaller target-language data set (VLLP) in the cross-lingual transfer case, and on the noisier target-language data (Tamil) in the fine-tuning case.
Bibliographic reference. Xu, Haihua / Do, Van Hai / Xiao, Xiong / Chng, Eng Siong (2015): "A comparative study of BNF and DNN multilingual training on cross-lingual low-resource speech recognition", in Proceedings of INTERSPEECH 2015, pp. 2132-2136.