16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Multiscale Recurrent Neural Network Based Language Model

Tsuyoshi Morioka (1), Tomoharu Iwata (2), Takaaki Hori (2), Tetsunori Kobayashi (1)

(1) Waseda University, Japan
(2) NTT Corporation, Japan

We describe a novel recurrent neural network-based language model (RNNLM) that handles multiple time-scales of context. The RNNLM has become a standard technique in language modeling because it can retain context of some length. However, a conventional RNNLM handles only a single time-scale of context, regardless of the subsequent words and the topic of the spoken utterance, even though the optimal time-scale can vary with these conditions. In contrast, our multiscale RNNLM flexibly incorporates several time-scales of context simultaneously, weighting each appropriately when predicting the next word. Experimental comparisons on large-vocabulary spontaneous speech recognition show that introducing multiple time-scales of context into the RNNLM improves on existing RNNLMs in both perplexity and word error rate.
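
The abstract does not specify how the time-scales are represented or weighted, so the following is only an illustrative sketch of the general idea, not the authors' architecture. It assumes each time-scale is an exponentially decayed running average of word vectors and that the scales are mixed with softmax weights; the function name `multiscale_context`, the decay rates, and the `scale_logits` parameter are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multiscale_context(word_vectors, decays, scale_logits):
    """Mix context summaries kept at several time-scales (illustrative only).

    word_vectors: list of equal-length float lists, one per past word.
    decays:       one decay rate in [0, 1) per time-scale (assumed form);
                  a larger decay keeps older words around for longer.
    scale_logits: one unnormalised weight per time-scale (assumed form).
    """
    dim = len(word_vectors[0])
    contexts = [[0.0] * dim for _ in decays]
    for vec in word_vectors:
        for k, decay in enumerate(decays):
            # Exponential moving average: each scale forgets at its own rate.
            contexts[k] = [decay * c + (1.0 - decay) * v
                           for c, v in zip(contexts[k], vec)]
    weights = softmax(scale_logits)
    # Weighted sum of the per-scale contexts.
    mixed = [sum(w * ctx[i] for w, ctx in zip(weights, contexts))
             for i in range(dim)]
    return mixed, weights


if __name__ == "__main__":
    history = [[1.0, 0.0], [0.0, 1.0]]
    mixed, weights = multiscale_context(history, decays=[0.0, 0.5],
                                        scale_logits=[0.0, 0.0])
    print(mixed, weights)
```

In a trained model the mixing weights would presumably depend on the input rather than being fixed constants as they are here; the sketch only shows how short and long contexts can coexist in a single prediction input.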

Bibliographic reference.  Morioka, Tsuyoshi / Iwata, Tomoharu / Hori, Takaaki / Kobayashi, Tetsunori (2015): "Multiscale recurrent neural network based language model", In INTERSPEECH-2015, 2366-2370.