Empirical Exploration of Novel Architectures and Objectives for Language Models

Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, George Saon


While recurrent neural network language models based on Long Short-Term Memory (LSTM) have shown good gains in many automatic speech recognition tasks, Convolutional Neural Network (CNN) language models are relatively new and have not been studied in depth. In this paper, we present an empirical comparison of LSTM and CNN language models on English broadcast news and various conversational telephone speech transcription tasks. We also present a new type of CNN language model that leverages dilated causal convolution to efficiently exploit long-range history. We propose a novel criterion for training language models that combines word and class prediction in a multi-task learning framework. We apply this criterion to train word- and character-based LSTM language models and CNN language models and show that it improves performance. Our results also show that CNN and LSTM language models are complementary and can be combined to obtain further gains.
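To illustrate the idea of dilated causal convolution mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a scalar input sequence rather than word embeddings, and the function names, kernel size, and dilation schedule are hypothetical choices made for illustration; the abstract does not specify any of them. "Causal" means the output at position t depends only on inputs at positions up to t, and "dilated" means the filter taps are spaced `dilation` steps apart, so stacked layers can cover a long history with few parameters.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Positions before the start of the sequence are treated as zeros,
    so no output ever looks at a future input (causality).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation  # step back k * dilation positions
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input positions visible to one output of a layer stack.

    Each layer with dilation d extends the history by (kernel_size - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

Under these assumptions, four stacked layers with kernel size 2 and dilations 1, 2, 4, 8 give `receptive_field(2, [1, 2, 4, 8]) == 16`, which shows how doubling the dilation per layer grows the visible history exponentially with depth.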


DOI: 10.21437/Interspeech.2017-723

Cite as: Kurata, G., Sethy, A., Ramabhadran, B., Saon, G. (2017) Empirical Exploration of Novel Architectures and Objectives for Language Models. Proc. Interspeech 2017, 279-283, DOI: 10.21437/Interspeech.2017-723.


@inproceedings{Kurata2017,
  author={Gakuto Kurata and Abhinav Sethy and Bhuvana Ramabhadran and George Saon},
  title={Empirical Exploration of Novel Architectures and Objectives for Language Models},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={279--283},
  doi={10.21437/Interspeech.2017-723},
  url={http://dx.doi.org/10.21437/Interspeech.2017-723}
}