To Improve the Robustness of LSTM-RNN Acoustic Models Using Higher-Order Feedback from Multiple Histories

Hengguan Huang, Brian Mak


This paper investigates a novel multiple-history long short-term memory (MH-LSTM) RNN acoustic model to mitigate the robustness problem of noisy training targets in the form of mis-labeled data and/or mis-alignments. Conceptually, after an RNN is unfolded in time, the hidden units in each layer are re-arranged into ordered sub-layers, with a master sub-layer on top and a set of auxiliary sub-layers below it. Only the master sub-layer generates outputs for the next layer, whereas the auxiliary sub-layers run in parallel with the master sub-layer but with increasing time lags. Each sub-layer also receives higher-order feedback from a fixed number of sub-layers below it. As a result, each sub-layer maintains a different history of the input speech, and the ensemble of these different histories lends robustness to the model. The higher-order connections not only provide shorter feedback paths along which error signals can propagate to more distant preceding hidden states, better modeling long-term memory, but also provide more feedback paths to each model parameter, smoothing its updates during training. Phoneme recognition results on both real TIMIT data and synthetic TIMIT data with noisy labels or alignments show that the new model outperforms the conventional LSTM RNN model.
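The sub-layer arrangement described above can be sketched in simplified form. The following is a hypothetical NumPy illustration, not the authors' implementation: it uses plain tanh recurrent units instead of LSTM cells, and the function name, lag scheme (auxiliary sub-layer s lags by s frames), and feedback order are assumptions chosen to mirror the abstract's description.

```python
import numpy as np

def mh_rnn_layer(x, n_sub=3, n_feedback=2, hidden=8, seed=0):
    """Simplified multiple-history recurrent layer (illustrative sketch).

    Sub-layer 0 is the master; sub-layers 1..n_sub-1 are auxiliaries that
    process the input with increasing time lags. Each sub-layer also
    receives higher-order feedback from up to `n_feedback` sub-layers
    below it (i.e., with larger indices). Only the master sub-layer
    emits outputs. Uses tanh units rather than full LSTM cells.
    """
    rng = np.random.default_rng(seed)
    T, d = x.shape
    W_in = rng.standard_normal((n_sub, hidden, d)) * 0.1
    # W_rec[s, 0]: sub-layer s's own recurrence; W_rec[s, k>0]: feedback
    # from the k-th sub-layer below it.
    W_rec = rng.standard_normal((n_sub, n_feedback + 1, hidden, hidden)) * 0.1
    h = np.zeros((n_sub, T + 1, hidden))   # h[s, t+1] = state after frame t
    out = np.zeros((T, hidden))
    for t in range(T):
        for s in range(n_sub):
            lag = s                        # auxiliary sub-layer s lags by s frames
            if t - lag < 0:
                continue                   # not enough history yet; keep zero state
            pre = W_in[s] @ x[t - lag]
            pre += W_rec[s, 0] @ h[s, t]   # own previous state
            for k in range(1, n_feedback + 1):
                if s + k < n_sub:          # higher-order feedback from below
                    pre += W_rec[s, k] @ h[s + k, t]
            h[s, t + 1] = np.tanh(pre)
        out[t] = h[0, t + 1]               # only the master produces outputs
    return out
```

Because each sub-layer sees the input at a different lag and carries its own state, the layer maintains several distinct histories of the utterance, which is the ensemble effect the abstract credits for robustness to noisy labels.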


DOI: 10.21437/Interspeech.2017-1315

Cite as: Huang, H., Mak, B. (2017) To Improve the Robustness of LSTM-RNN Acoustic Models Using Higher-Order Feedback from Multiple Histories. Proc. Interspeech 2017, 3862-3866, DOI: 10.21437/Interspeech.2017-1315.


@inproceedings{Huang2017,
  author={Hengguan Huang and Brian Mak},
  title={To Improve the Robustness of LSTM-RNN Acoustic Models Using Higher-Order Feedback from Multiple Histories},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3862--3866},
  doi={10.21437/Interspeech.2017-1315},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1315}
}