Multidimensional Residual Learning Based on Recurrent Neural Networks for Acoustic Modeling

Yuanyuan Zhao, Shuang Xu, Bo Xu


Theoretical and empirical evidence indicates that the depth of neural networks is crucial to acoustic modeling in speech recognition tasks. In practice, however, as depth increases, accuracy saturates and then degrades rapidly. In this paper, a novel multidimensional residual learning architecture is proposed to address this degradation of deep recurrent neural networks (RNNs) in acoustic modeling by exploring both the spatial and temporal dimensions. In the spatial dimension, shortcut connections are introduced to RNNs, along which information can flow across several layers without attenuation. In the temporal dimension, we cope with the degradation problem by regulating temporal granularity, namely, splitting the input sequence into several parallel sub-sequences, which ensures that information flows across the time axis unimpeded. Finally, a row convolution layer is placed on top of all recurrent layers to aggregate appropriate information from the parallel sub-sequences and feed it to the classifier. Experiments on two quite different speech recognition tasks show roughly 10% relative performance improvements.
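To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation: a residual (shortcut) connection wrapped around a vanilla tanh RNN layer, assuming equal input and output dimensionality so the identity shortcut can be added directly, plus one possible reading of temporal-granularity regulation (interleaving a sequence into parallel sub-sequences). All function names and dimensions here are illustrative assumptions.

```python
import numpy as np

def rnn_layer(x_seq, W, U, b):
    """Run a vanilla tanh RNN over a sequence x_seq of shape (T, d)."""
    T, d = x_seq.shape
    h = np.zeros(d)
    out = np.zeros_like(x_seq)
    for t in range(T):
        # Standard recurrence: mix the current input with the previous state.
        h = np.tanh(x_seq[t] @ W + h @ U + b)
        out[t] = h
    return out

def residual_rnn_layer(x_seq, W, U, b):
    # Shortcut connection in the spatial dimension: the layer input is added
    # to the layer output, so a stacked layer learns a residual and
    # information can cross several layers without attenuation.
    return x_seq + rnn_layer(x_seq, W, U, b)

def split_subsequences(x_seq, k):
    # One possible interpretation of temporal-granularity regulation:
    # split a length-T sequence into k interleaved parallel sub-sequences,
    # each of which an RNN can process with a shorter effective time axis.
    return [x_seq[i::k] for i in range(k)]

# Illustrative usage with random weights.
rng = np.random.default_rng(0)
T, d = 6, 4
W = rng.normal(scale=0.1, size=(d, d))
U = rng.normal(scale=0.1, size=(d, d))
b = np.zeros(d)
x = rng.normal(size=(T, d))
y = residual_rnn_layer(x, W, U, b)   # same shape as x: (6, 4)
subs = split_subsequences(x, 2)      # two sub-sequences of shape (3, 4)
```

Note the design consequence of the identity shortcut: with all weights at zero, the recurrent branch outputs zeros (tanh(0) = 0), so the layer reduces exactly to the identity map, which is what lets very deep stacks avoid the degradation the abstract describes.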


DOI: 10.21437/Interspeech.2016-677

Cite as

Zhao, Y., Xu, S., Xu, B. (2016) Multidimensional Residual Learning Based on Recurrent Neural Networks for Acoustic Modeling. Proc. Interspeech 2016, 3419-3423.

BibTeX
@inproceedings{Zhao+2016,
  author={Yuanyuan Zhao and Shuang Xu and Bo Xu},
  title={Multidimensional Residual Learning Based on Recurrent Neural Networks for Acoustic Modeling},
  year={2016},
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-677},
  url={http://dx.doi.org/10.21437/Interspeech.2016-677},
  pages={3419--3423}
}