We describe a novel recurrent neural network-based language model (RNNLM) that handles multiple time-scales of context. The RNNLM has become a standard in language modeling because it can retain context over some span of preceding words. However, a conventional RNNLM operates on a single time-scale of context regardless of the upcoming words or the topic of the spoken utterance, even though the optimal time-scale of context varies with these conditions. In contrast, our multiscale RNNLM flexibly exploits several time-scales of context simultaneously, weighting them appropriately when predicting the next word. Experimental comparisons on large-vocabulary spontaneous speech recognition demonstrate that introducing multiple time-scales of context into the RNNLM yields improvements over existing RNNLMs in both perplexity and word error rate.
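The core idea of combining several time-scales of context with mixture weights can be illustrated with a minimal sketch. The sketch below is an assumption about one way to realize multiple time-scales, not the paper's exact architecture: each scale is a leaky-integrator recurrent state whose decay constant sets how long it retains context, and the per-scale next-word distributions are mixed with softmax weights (all dimensions, decay values, and variable names are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not from the paper).
vocab_size, hidden_size = 10, 8
# Each scale is a leaky-integrator hidden state; a larger decay retains
# context longer, giving a slower time-scale.
decays = [0.0, 0.5, 0.9]          # fast, medium, slow context scales
n_scales = len(decays)

# One input->hidden, hidden->hidden, and hidden->output weight set per scale.
W_in = [rng.normal(0, 0.1, (hidden_size, vocab_size)) for _ in decays]
W_hh = [rng.normal(0, 0.1, (hidden_size, hidden_size)) for _ in decays]
W_out = [rng.normal(0, 0.1, (vocab_size, hidden_size)) for _ in decays]
mix_logits = np.zeros(n_scales)   # mixture weights; learned in training, fixed here

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(hs, word_id):
    """Advance every scale's hidden state on one input word and
    return the mixed next-word distribution."""
    x = np.zeros(vocab_size)
    x[word_id] = 1.0
    new_hs, dists = [], []
    for k, d in enumerate(decays):
        h = np.tanh(W_in[k] @ x + W_hh[k] @ hs[k])
        h = d * hs[k] + (1.0 - d) * h       # leaky update sets the time-scale
        new_hs.append(h)
        dists.append(softmax(W_out[k] @ h))
    w = softmax(mix_logits)                  # per-scale mixture weights
    p = sum(wk * pk for wk, pk in zip(w, dists))
    return new_hs, p

hs = [np.zeros(hidden_size) for _ in decays]
for word in [1, 4, 2]:               # a short toy word-id sequence
    hs, p = step(hs, word)
print(round(float(p.sum()), 6))      # a convex mixture of distributions sums to 1.0
```

Because the mixture weights form a convex combination of per-scale softmax outputs, the result is itself a valid distribution over the vocabulary; training such weights jointly with the recurrent parameters is what lets the model emphasize the appropriate time-scale for each prediction.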
Bibliographic reference. Morioka, Tsuyoshi / Iwata, Tomoharu / Hori, Takaaki / Kobayashi, Tetsunori (2015): "Multiscale recurrent neural network based language model", In INTERSPEECH-2015, 2366-2370.