Cumulative Adaptation for BLSTM Acoustic Models

Markus Kitza, Pavel Golik, Ralf Schlüter, Hermann Ney


This paper addresses robust speech recognition as an adaptation task. Specifically, we investigate the cumulative application of adaptation methods. A bidirectional Long Short-Term Memory (BLSTM) based neural network, capable of learning temporal relationships and translation-invariant representations, is used for robust acoustic modeling. Further, i-vectors are used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing an 8% relative improvement in word error rate on the NIST Hub5 2000 evaluation test set. By enhancing the first-pass i-vector-based adaptation with a second-pass adaptation using speaker- and environment-dependent transformations within the network, a further relative improvement of 5% in word error rate is achieved. We have also reevaluated the features used to estimate i-vectors, and their normalization, to achieve the best performance in a modern large-scale automatic speech recognition system.
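The abstract does not say how the i-vector enters the network. One common arrangement, assumed here purely for illustration, is to append the same utterance- or speaker-level i-vector to every acoustic feature frame before the first BLSTM layer. The function name `append_ivector` and the toy dimensions below are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def append_ivector(frames: Sequence[Sequence[float]],
                   ivector: Sequence[float]) -> List[List[float]]:
    """Append the same i-vector to every acoustic feature frame.

    Assumption: the i-vector is constant over the utterance, so each
    frame receives an identical copy of it.
    """
    return [list(frame) + list(ivector) for frame in frames]


# Hypothetical toy input: 3 frames of 4-dimensional acoustic features
# and a 2-dimensional i-vector (real systems use far larger dimensions).
frames = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
ivector = [-1.0, 2.0]

# Each augmented frame now has 4 + 2 = 6 dimensions.
augmented = append_ivector(frames, ivector)
```

Because the appended vector is identical for all frames, the network sees a fixed summary of the speaker and environment alongside the time-varying features, which fits the abstract's description of this adaptation as instantaneous.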


DOI: 10.21437/Interspeech.2019-2162

Cite as: Kitza, M., Golik, P., Schlüter, R., Ney, H. (2019) Cumulative Adaptation for BLSTM Acoustic Models. Proc. Interspeech 2019, 754-758, DOI: 10.21437/Interspeech.2019-2162.


@inproceedings{Kitza2019,
  author={Markus Kitza and Pavel Golik and Ralf Schlüter and Hermann Ney},
  title={{Cumulative Adaptation for BLSTM Acoustic Models}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={754--758},
  doi={10.21437/Interspeech.2019-2162},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2162}
}