We describe a novel recurrent neural network based language model (RNNLM) that deals with multiple time-scales of context. The RNNLM has become a standard technique in language modeling because it can retain context over some span of preceding words. However, a conventional RNNLM handles only a single time-scale of context, regardless of the upcoming words and the topic of the spoken utterance, even though the optimal time-scale of context can vary with such conditions. In contrast, our multiscale RNNLM flexibly exploits multiple time-scales of context simultaneously, weighting them appropriately when predicting the next word. Experimental comparisons on large-vocabulary spontaneous speech recognition demonstrate that introducing multiple time-scales of context into the RNNLM yields improvements over existing RNNLMs in terms of perplexity and word error rate.
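Since the abstract only outlines the idea, the following is a minimal, hypothetical sketch of how multiple time-scales of context might be combined in an RNNLM; it is not the authors' formulation. It assumes a leaky-integration style of recurrence in which each hidden state is updated with a different leak rate (slower leak, longer time-scale) and the per-scale next-word distributions are mixed with weights. The vocabulary size, hidden size, leak values, and parameter initialization are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 1000                    # vocabulary size (illustrative)
H = 64                      # hidden units per time-scale
leaks = [1.0, 0.5, 0.1]     # fast, medium, slow context (assumed values)
S = len(leaks)

# Parameters are randomly initialized here; in practice they would be
# trained by backpropagation through time.
W_in  = [rng.normal(0, 0.1, (H, V)) for _ in range(S)]
W_rec = [rng.normal(0, 0.1, (H, H)) for _ in range(S)]
W_out = [rng.normal(0, 0.1, (V, H)) for _ in range(S)]
mix   = np.ones(S) / S      # per-scale mixture weights (would be learned)

def step(word_id, hidden):
    """One prediction step: update each time-scale's hidden state and
    mix the per-scale next-word distributions with the weights in `mix`."""
    x = np.zeros(V)
    x[word_id] = 1.0
    probs = np.zeros(V)
    for s in range(S):
        new = np.tanh(W_in[s] @ x + W_rec[s] @ hidden[s])
        # Leaky update: a small leak means the state changes slowly,
        # so that scale retains context over a longer span of words.
        hidden[s] = (1.0 - leaks[s]) * hidden[s] + leaks[s] * new
        logits = W_out[s] @ hidden[s]
        e = np.exp(logits - logits.max())
        probs += mix[s] * (e / e.sum())
    return probs, hidden

hidden = [np.zeros(H) for _ in range(S)]
p, hidden = step(42, hidden)
print(p.shape, round(p.sum(), 6))   # (1000,) 1.0
```

Because each per-scale softmax sums to one and the mixture weights sum to one, the combined output is itself a valid next-word distribution.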
Cite as: Morioka, T., Iwata, T., Hori, T., Kobayashi, T. (2015) Multiscale recurrent neural network based language model. Proc. Interspeech 2015, 2366-2370, doi: 10.21437/Interspeech.2015-512
@inproceedings{morioka15_interspeech,
  author={Tsuyoshi Morioka and Tomoharu Iwata and Takaaki Hori and Tetsunori Kobayashi},
  title={{Multiscale recurrent neural network based language model}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={2366--2370},
  doi={10.21437/Interspeech.2015-512}
}