Densely Connected Networks for Conversational Speech Recognition

Kyu Han, Akshay Chandrashekaran, Jungsuk Kim, Ian Lane


This paper describes how we achieved state-of-the-art performance on the industry-standard NIST 2000 Hub5 English evaluation set. We propose densely connected LSTMs (dense LSTMs), inspired by the densely connected convolutional neural networks recently introduced for image classification. We show that dense LSTMs perform more reliably than conventional residual LSTMs as more LSTM layers are stacked. With RNN-LM rescoring and lattice combination of 5 systems (including 2 dense LSTM based systems) trained across three different phone sets, Capio's conversational speech recognition system obtains word error rates of 5.0% and 9.1% on Switchboard and CallHome, respectively.
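
The abstract does not spell out the layer wiring, so the following is only a minimal sketch of the contrast between dense and residual stacking. It assumes the DenseNet-style pattern, in which each layer receives the concatenation of the original input and all earlier layer outputs. The names dense_forward, residual_forward, and toy_layer are hypothetical, and toy_layer is a trivial stand-in for an LSTM layer, not the authors' model.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def dense_forward(x: Vector, layers: Sequence[Layer]) -> Vector:
    """Dense connectivity (assumed DenseNet-style): each layer sees the
    input plus the outputs of all earlier layers, concatenated."""
    features: List[Vector] = [x]
    for layer in layers:
        # Concatenate the original input and every earlier layer's output.
        layer_input = [value for feature in features for value in feature]
        features.append(layer(layer_input))
    return features[-1]


def residual_forward(x: Vector, layers: Sequence[Layer]) -> Vector:
    """Residual connectivity: each layer's output is added to its own input."""
    hidden = x
    for layer in layers:
        output = layer(hidden)
        hidden = [h + o for h, o in zip(hidden, output)]
    return hidden


def toy_layer(width: int) -> Layer:
    """Hypothetical stand-in for an LSTM layer: emits `width` copies of
    the mean of its input."""
    def apply(inputs: Vector) -> Vector:
        mean = sum(inputs) / len(inputs)
        return [mean] * width
    return apply


if __name__ == "__main__":
    stack = [toy_layer(2), toy_layer(2)]
    print(dense_forward([1.0, 3.0], stack))
    print(residual_forward([1.0, 3.0], stack))
```

In the dense variant the input width of each layer grows with depth, because earlier outputs are kept and concatenated rather than summed. In the residual variant the width stays fixed, and each layer only sees the running sum.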


DOI: 10.21437/Interspeech.2018-1486

Cite as: Han, K., Chandrashekaran, A., Kim, J., Lane, I. (2018) Densely Connected Networks for Conversational Speech Recognition. Proc. Interspeech 2018, 796-800, DOI: 10.21437/Interspeech.2018-1486.


@inproceedings{Han2018,
  author={Kyu Han and Akshay Chandrashekaran and Jungsuk Kim and Ian Lane},
  title={Densely Connected Networks for Conversational Speech Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={796--800},
  doi={10.21437/Interspeech.2018-1486},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1486}
}