Neural networks have become increasingly popular for the task of language modeling. Whereas feed-forward networks only exploit a fixed context length to predict the next word of a sequence, standard recurrent neural networks can, conceptually, take into account all predecessor words. On the other hand, recurrent networks are well known to be difficult to train (gradients tend to vanish or explode when propagated back over many time steps), so they are unlikely to realize the full potential of recurrent models.
These problems are addressed by the Long Short-Term Memory (LSTM) neural network architecture. In this work, we apply this type of network to an English and a large French language modeling task. Experiments show relative perplexity improvements of about 8% over standard recurrent neural network LMs. In addition, we obtain considerable word error rate (WER) improvements on top of a state-of-the-art speech recognition system.
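To illustrate what an LSTM language model computes, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a single LSTM layer with input, forget and output gates (no peephole connections, which may differ from the variant used in the paper), a full softmax output layer, and random untrained weights. The class `TinyLSTMLM`, its sizes, and the helper `relative_improvement` are hypothetical names chosen for illustration. Unlike a feed-forward model with a fixed context window, the recurrent state `(h, c)` summarizes the entire word history. Perplexity is the exponential of the average negative log-probability of each next word given that history, and it is the quantity behind the reported relative improvement.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTMLM:
    """Hypothetical toy LSTM LM: embedding -> one LSTM layer -> softmax.

    Weights are random and untrained; this only shows the data flow.
    """

    def __init__(self, vocab_size, embed_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.H = hidden_dim
        self.E = rng.normal(0.0, 0.1, (vocab_size, embed_dim))
        # One stacked matrix for input, forget, output gates and candidate.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, embed_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.W_out = rng.normal(0.0, 0.1, (vocab_size, hidden_dim))
        self.b_out = np.zeros(vocab_size)

    def step(self, word_id, h, c):
        """Consume one word; return next-word distribution and new state."""
        x = self.E[word_id]
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.H
        i = sigmoid(z[0:H])          # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        o = sigmoid(z[2 * H:3 * H])  # output gate
        g = np.tanh(z[3 * H:4 * H])  # candidate cell update
        c = f * c + i * g            # additive, gated cell-state update
        h = o * np.tanh(c)
        logits = self.W_out @ h + self.b_out
        logits = logits - logits.max()  # numerically stable softmax
        probs = np.exp(logits) / np.exp(logits).sum()
        return probs, h, c

    def perplexity(self, word_ids):
        """exp of mean negative log-prob of each word given all predecessors."""
        h = np.zeros(self.H)
        c = np.zeros(self.H)
        log_prob = 0.0
        for prev, nxt in zip(word_ids[:-1], word_ids[1:]):
            probs, h, c = self.step(prev, h, c)
            log_prob += np.log(probs[nxt])
        n = len(word_ids) - 1
        return float(np.exp(-log_prob / n))


def relative_improvement(baseline_ppl, new_ppl):
    """Relative perplexity reduction, e.g. 0.08 means 8% relative."""
    return (baseline_ppl - new_ppl) / baseline_ppl


if __name__ == "__main__":
    model = TinyLSTMLM(vocab_size=10, embed_dim=6, hidden_dim=8)
    print(model.perplexity([1, 4, 2, 7, 3]))
    # Made-up perplexities, only to show the arithmetic.
    print(relative_improvement(150.0, 138.0))
```

With untrained weights the output distribution is close to uniform, so the perplexity lands near the vocabulary size. The values passed to `relative_improvement` are invented and not taken from the paper: a drop from 150 to 138 is a 12/150 = 8% relative reduction.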
Index Terms: language modeling, recurrent neural networks, LSTM neural networks
Bibliographic reference. Sundermeyer, Martin / Schlüter, Ralf / Ney, Hermann (2012): "LSTM neural networks for language modeling", In INTERSPEECH-2012, 194-197.