Multilingual Recurrent Neural Networks with Residual Learning for Low-Resource Speech Recognition

Shiyu Zhou, Yuanyuan Zhao, Shuang Xu, Bo Xu


The shared-hidden-layer multilingual deep neural network (SHL-MDNN), in which the hidden layers of a feed-forward deep neural network (DNN) are shared across multiple languages while the softmax layers are language dependent, has been shown to be effective for acoustic modeling in multilingual low-resource speech recognition. In this paper, we propose a shared-hidden-layer architecture built with Long Short-Term Memory (LSTM) recurrent neural networks, which achieves further performance improvements given that LSTMs have outperformed DNNs as acoustic models for automatic speech recognition (ASR). Moreover, we show that the shared-hidden-layer multilingual LSTM (SHL-MLSTM) with residual learning yields an additional moderate but consistent gain on multilingual tasks, since residual learning alleviates the degradation problem of deep LSTMs. Experimental results on the CALLHOME datasets demonstrate that SHL-MLSTM reduces the word error rate (WER) by a relative 2.1–6.8% over an SHL-MDNN trained on six languages, and by a relative 2.6–7.3% over monolingual LSTMs trained on the language-specific data. A further WER reduction of roughly 2% relative over SHL-MLSTM is obtained through residual learning, demonstrating that residual learning is useful for SHL-MLSTM in multilingual low-resource ASR.
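To make the architecture described above concrete, the following is a minimal NumPy sketch of the idea: a stack of LSTM layers shared across all languages, with residual (skip) connections between the deeper layers and a separate softmax head per language. This is an illustrative sketch only, not the authors' actual implementation; the class names, dimensions, initialization, and per-language senone counts are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMLayer:
    """One LSTM layer processing a single utterance of shape (T, in_dim)."""
    def __init__(self, in_dim, hid_dim):
        # One stacked weight matrix for the input, forget, output, and cell gates.
        self.W = rng.normal(0.0, 0.1, (4 * hid_dim, in_dim + hid_dim))
        self.b = np.zeros(4 * hid_dim)
        self.hid_dim = hid_dim

    def forward(self, x):
        T, H = x.shape[0], self.hid_dim
        h, c = np.zeros(H), np.zeros(H)
        out = np.zeros((T, H))
        for t in range(T):
            z = self.W @ np.concatenate([x[t], h]) + self.b
            i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
            g = np.tanh(z[3*H:])
            c = f * c + i * g          # cell state update
            h = o * np.tanh(c)         # hidden state
            out[t] = h
        return out

class SHLResidualMLSTM:
    """Shared LSTM stack with residual skips and language-dependent softmax heads."""
    def __init__(self, feat_dim, hid_dim, n_layers, lang_senones):
        # Hidden layers are shared across every language (the SHL idea).
        self.layers = [LSTMLayer(feat_dim if l == 0 else hid_dim, hid_dim)
                       for l in range(n_layers)]
        # One output projection (softmax head) per language, e.g. {"EN": 10, "MA": 12}.
        self.heads = {lang: rng.normal(0.0, 0.1, (n, hid_dim))
                      for lang, n in lang_senones.items()}

    def forward(self, x, lang):
        h = self.layers[0].forward(x)
        for layer in self.layers[1:]:
            # Residual learning: the layer models a correction added to its input,
            # which mitigates degradation when stacking deep LSTMs.
            h = h + layer.forward(h)
        logits = h @ self.heads[lang].T
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)  # per-frame senone posteriors

# Example: 40-dim acoustic features, 32 hidden units, 3 shared layers,
# two hypothetical languages with different senone inventories.
model = SHLResidualMLSTM(40, 32, 3, {"EN": 10, "MA": 12})
posteriors = model.forward(rng.normal(size=(5, 40)), "EN")  # shape (5, 10)
```

In this sketch only the softmax heads differ per language, so gradients from every language's data would update the same shared LSTM stack during multilingual training, which is the mechanism by which low-resource languages benefit.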


DOI: 10.21437/Interspeech.2017-111

Cite as: Zhou, S., Zhao, Y., Xu, S., Xu, B. (2017) Multilingual Recurrent Neural Networks with Residual Learning for Low-Resource Speech Recognition. Proc. Interspeech 2017, 704-708, DOI: 10.21437/Interspeech.2017-111.


@inproceedings{Zhou2017,
  author={Shiyu Zhou and Yuanyuan Zhao and Shuang Xu and Bo Xu},
  title={Multilingual Recurrent Neural Networks with Residual Learning for Low-Resource Speech Recognition},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={704--708},
  doi={10.21437/Interspeech.2017-111},
  url={http://dx.doi.org/10.21437/Interspeech.2017-111}
}