16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

A Comparative Study of BNF and DNN Multilingual Training on Cross-Lingual Low-Resource Speech Recognition

Haihua Xu, Van Hai Do, Xiong Xiao, Eng Siong Chng

TL@NTU, Singapore

This paper presents a comparative study of bottleneck feature (BNF) and Deep Neural Network (DNN) multilingual training for low-resource language speech recognition. In addition, we compared system performance after fine-tuning. The evaluation was conducted on the IARPA Babel data. The source languages are Cantonese, Pashto, Tagalog, and Turkish, while the target languages are Vietnamese and Tamil. Compared to the monolingual baseline systems, the BNF and DNN methods achieved similar relative WER reductions of 3.0-5.1% on the Limited Language Pack (LLP) data and 6.1-8.6% on the Very LLP (VLLP) data. With fine-tuning, the BNF method achieved further significant performance improvements, while the DNN method obtained only marginal gains. Overall, we observed that the DNN method performed worse on the smaller target-language data set (VLLP) in the cross-lingual transfer case, and on the noisier target-language data (Tamil) in the fine-tuning case.

Bibliographic reference. Xu, Haihua / Do, Van Hai / Xiao, Xiong / Chng, Eng Siong (2015): "A comparative study of BNF and DNN multilingual training on cross-lingual low-resource speech recognition", In INTERSPEECH-2015, 2132-2136.