Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition

Shiliang Zhang, Hui Jiang, Shifu Xiong, Si Wei, Li-Rong Dai


In acoustic modeling for large vocabulary continuous speech recognition, it is essential to model long-term dependency within speech signals. Usually, recurrent neural network (RNN) architectures, especially long short-term memory (LSTM) models, are the most popular choice. Recently, a novel architecture, the feedforward sequential memory network (FSMN), has provided a non-recurrent way to model long-term dependency in sequential data and has achieved better performance than RNNs on acoustic modeling and language modeling tasks. In this work, we propose compact feedforward sequential memory networks (cFSMN) by combining FSMN with low-rank matrix factorization. We also make a slight modification to the encoding method used in FSMNs in order to further simplify the network architecture. On the Switchboard task, the proposed cFSMN structures reduce the model size by 60% and speed up learning by more than 7 times, while the models still significantly outperform the popular bidirectional LSTMs for both frame-level cross-entropy (CE) training and MMI-based sequence training.
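
The core idea can be sketched as follows: each cFSMN layer projects its hidden activations into a low-dimensional space and applies a learned, element-wise weighted sum over a window of past and future projected frames (the memory block), and only this memory output is passed on to the next layer. The NumPy sketch below is an illustration of that computation under stated assumptions, not the authors' implementation; the ReLU nonlinearity, the variable names (W, b, V, a, c, U, d), and the exact memory formulation are assumptions made for clarity.

```python
import numpy as np

def cfsmn_layer(x, W, b, V, a, c, U, d):
    """One compact FSMN (cFSMN) layer -- an illustrative sketch only.

    x : (T, D_in)                   input sequence of T frames
    W : (D_in, D_h), b : (D_h,)     hidden affine transform
    V : (D_h, D_p)                  low-rank projection, with D_p << D_h
    a : (N1, D_p)                   look-back memory coefficients
    c : (N2, D_p)                   look-ahead memory coefficients
    U : (D_p, D_out), d : (D_out,)  affine transform into the next layer
    """
    T = x.shape[0]
    n_past, n_future = a.shape[0], c.shape[0]
    h = np.maximum(0.0, x @ W + b)   # hidden layer (ReLU assumed here)
    p = h @ V                        # project to the low-dimensional space
    p_tilde = p.copy()               # memory output starts from the current projection
    for t in range(T):
        for i in range(1, n_past + 1):        # element-wise weighted past frames
            if t - i >= 0:
                p_tilde[t] += a[i - 1] * p[t - i]
        for j in range(1, n_future + 1):      # element-wise weighted future frames
            if t + j < T:
                p_tilde[t] += c[j - 1] * p[t + j]
    return p_tilde @ U + d           # only the memory block output feeds the next layer
```

In this sketch, the parameter saving from the low-rank factorization shows up directly: instead of a full D_h-by-D_out weight matrix between consecutive hidden layers, the layer stores only the D_h-by-D_p projection and the D_p-by-D_out transform, and the memory coefficients also live in the small D_p-dimensional space.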


DOI: 10.21437/Interspeech.2016-121

Cite as

Zhang, S., Jiang, H., Xiong, S., Wei, S., Dai, L. (2016) Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition. Proc. Interspeech 2016, 3389-3393.

BibTeX
@inproceedings{Zhang+2016,
author={Shiliang Zhang and Hui Jiang and Shifu Xiong and Si Wei and Li-Rong Dai},
title={Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-121},
url={http://dx.doi.org/10.21437/Interspeech.2016-121},
pages={3389--3393}
}