Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition

Wei-Ning Hsu, Yu Zhang, Ann Lee, James Glass


Deep neural network models have achieved considerable success in a wide range of fields. Several architectures have been proposed to alleviate the vanishing gradient problem, and hence enable training of very deep networks. In the speech recognition area, convolutional neural networks, recurrent neural networks, and fully connected deep neural networks have been shown to be complementary in their modeling capabilities. Combining all three components, in a model called the CLDNN, yields the best performance to date. In this paper, we extend the CLDNN model by introducing a highway connection between LSTM layers, which enables direct information flow from cells of lower layers to cells of upper layers. With this design, we are able to better exploit the advantages of a deeper structure. Experiments on the GALE Chinese Broadcast Conversation/News Speech dataset indicate that our model outperforms all previous models and achieves a new benchmark of 22.41% character error rate.
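The highway connection described in the abstract adds a learned "depth gate" that carries a lower LSTM layer's cell state directly into the cell state of the layer above, alongside the usual input/forget/output gates. The following is a minimal NumPy sketch of one such highway-LSTM step; the function and parameter names (`highway_lstm_step`, `Wd`, etc.) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_lstm_step(x, h_prev, c_prev, c_lower, params):
    """One step of a highway LSTM cell (illustrative sketch).

    Besides the standard input (i), forget (f), and output (o) gates,
    a depth gate (d) mixes the lower layer's cell state `c_lower`
    directly into this layer's cell state, giving a gradient path
    straight through the stack of layers.
    """
    Wi, Wf, Wo, Wg, Wd = (params[k] for k in ("Wi", "Wf", "Wo", "Wg", "Wd"))
    z = np.concatenate([x, h_prev])   # concatenated input and previous hidden state
    i = sigmoid(Wi @ z)               # input gate
    f = sigmoid(Wf @ z)               # forget gate
    o = sigmoid(Wo @ z)               # output gate
    g = np.tanh(Wg @ z)               # candidate cell update
    d = sigmoid(Wd @ z)               # depth gate (the highway connection)
    # Standard LSTM cell recurrence plus the gated highway term d * c_lower.
    c = f * c_prev + i * g + d * c_lower
    h = o * np.tanh(c)
    return h, c

# Tiny usage example with random weights (hidden size 4, input size 3).
rng = np.random.default_rng(0)
n, m = 4, 3
params = {k: 0.1 * rng.standard_normal((n, n + m))
          for k in ("Wi", "Wf", "Wo", "Wg", "Wd")}
h, c = highway_lstm_step(rng.standard_normal(m), np.zeros(n),
                         np.zeros(n), rng.standard_normal(n), params)
```

Setting the depth gate `d` to zero recovers a plain stacked LSTM, which is why this design can only add, not remove, modeling flexibility.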


DOI: 10.21437/Interspeech.2016-515

Cite as

Hsu, W., Zhang, Y., Lee, A., Glass, J. (2016) Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition. Proc. Interspeech 2016, 395-399.

BibTeX
@inproceedings{Hsu+2016,
author={Wei-Ning Hsu and Yu Zhang and Ann Lee and James Glass},
title={Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-515},
url={http://dx.doi.org/10.21437/Interspeech.2016-515},
pages={395--399}
}