Reducing the Computational Complexity of Two-Dimensional LSTMs

Bo Li, Tara N. Sainath

Long Short-Term Memory Recurrent Neural Networks (LSTMs) are good at modeling temporal variations in speech recognition tasks, and have become an integral component of many state-of-the-art ASR systems. More recently, LSTMs have been extended to model variations in the speech signal in two dimensions, namely time and frequency [1, 2]. However, one problem with two-dimensional LSTMs, such as Grid-LSTMs, is that processing in both time and frequency occurs sequentially, which increases computational complexity. In this work, we look at minimizing the Grid-LSTM's dependence on previous time and frequency points in the sequence, thus reducing computational complexity. Specifically, we compare reducing computation using a bidirectional Grid-LSTM (biGrid-LSTM) with non-overlapping frequency sub-band processing, a PyraMiD-LSTM [3] and a frequency-block Grid-LSTM (fbGrid-LSTM) for parallel time-frequency processing. We find that the fbGrid-LSTM can reduce computation costs by a factor of four with no loss in accuracy, on a 12,500 hour Voice Search task.
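The intuition behind the frequency-block scheme can be sketched numerically: a full 2-D scan visits every time-frequency cell one after another, while splitting the frequency axis into independent non-overlapping blocks shortens the critical path, since blocks can be processed in parallel. The function names, block count, and feature dimensions below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def split_frequency_blocks(feats, num_blocks):
    """Split a (T, F) time-frequency feature map into non-overlapping
    frequency sub-bands of equal width (F must divide evenly)."""
    T, F = feats.shape
    assert F % num_blocks == 0, "F must be divisible by num_blocks"
    width = F // num_blocks
    return [feats[:, b * width:(b + 1) * width] for b in range(num_blocks)]

def sequential_depth(T, F, num_blocks=1):
    """A fully sequential 2-D scan has T * F steps on the critical path;
    with num_blocks independent frequency blocks running in parallel,
    only T * (F // num_blocks) sequential steps remain per block."""
    return T * (F // num_blocks)

# Toy example: 100 frames of 128 log-mel bins, 4 frequency blocks.
feats = np.random.randn(100, 128)
blocks = split_frequency_blocks(feats, 4)
```

With 4 blocks, the sequential depth per block is a quarter of the full scan's, which mirrors the factor-of-four cost reduction reported for the fbGrid-LSTM (the actual savings also depend on the per-cell LSTM computation, which this sketch omits).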

DOI: 10.21437/Interspeech.2017-1164

Cite as: Li, B., Sainath, T.N. (2017) Reducing the Computational Complexity of Two-Dimensional LSTMs. Proc. Interspeech 2017, 964-968, DOI: 10.21437/Interspeech.2017-1164.

@inproceedings{Li2017,
  author={Bo Li and Tara N. Sainath},
  title={Reducing the Computational Complexity of Two-Dimensional LSTMs},
  booktitle={Proc. Interspeech 2017},
  pages={964--968},
  doi={10.21437/Interspeech.2017-1164},
  year={2017}
}