Approximated and Domain-Adapted LSTM Language Models for First-Pass Decoding in Speech Recognition

Mittul Singh, Youssef Oualil, Dietrich Klakow


Traditionally, short-range language models (LMs) such as conventional n-gram models have been used for language model adaptation. Recent work has improved performance on such tasks using adapted long-span models such as recurrent neural network LMs (RNNLMs). With the first pass performed using a large background n-gram LM, the adapted RNNLMs are mostly used to rescore lattices or N-best lists as a second step in the decoding process. Ideally, these adapted RNNLMs should be applied in first-pass decoding. We therefore introduce two ways of applying adapted long short-term memory (LSTM) based RNNLMs for first-pass decoding. Using available techniques for converting LSTMs into approximated versions suitable for first-pass decoding, we compare approximated LSTMs adapted in a Fast Marginal Adaptation (FMA) framework with an approximated version of an architecture-based adaptation of the LSTM. On a conversational speech recognition task, these differently approximated and adapted LSTMs, combined with a trigram LM, outperform other adapted and unadapted LMs. The architecture-adapted LSTM combination obtains a word error rate (WER) of 35.9% and is outperformed by the FMA-based LSTM combination, which obtains the overall lowest WER of 34.4%.
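As an illustration of the two operations named in the abstract, the sketch below shows a common formulation of Fast Marginal Adaptation (rescaling a background conditional distribution by the ratio of in-domain to background unigram probabilities, raised to a power beta, then renormalizing) followed by linear interpolation with a second model. The abstract does not give the paper's exact formulation, so the function names, the toy probabilities, and the values of `beta` and `lam` are assumptions chosen only for illustration; here plain dictionaries stand in for the approximated LSTM and the trigram LM.

```python
def fma_adapt(background_cond, background_unigram, domain_unigram, beta=0.5):
    """Adapt P_bg(w | h) for one fixed history h (hypothetical helper).

    Each word's probability is scaled by (P_domain(w) / P_bg(w)) ** beta,
    and the result is renormalized so that it sums to one.
    """
    scaled = {
        w: p * (domain_unigram[w] / background_unigram[w]) ** beta
        for w, p in background_cond.items()
    }
    z = sum(scaled.values())  # normalization term for this history
    return {w: s / z for w, s in scaled.items()}


def interpolate(p_a, p_b, lam=0.5):
    """Linearly interpolate two distributions over the same vocabulary."""
    return {w: lam * p_a[w] + (1.0 - lam) * p_b[w] for w in p_a}


# Toy distributions over a three-word vocabulary (invented numbers).
background_cond = {"the": 0.5, "call": 0.3, "refund": 0.2}     # stand-in for P_bg(w | h)
background_unigram = {"the": 0.6, "call": 0.3, "refund": 0.1}  # background marginals
domain_unigram = {"the": 0.4, "call": 0.2, "refund": 0.4}      # in-domain marginals
trigram_cond = {"the": 0.4, "call": 0.4, "refund": 0.2}        # stand-in for a trigram LM

adapted = fma_adapt(background_cond, background_unigram, domain_unigram, beta=0.5)
combined = interpolate(adapted, trigram_cond, lam=0.5)
```

In this toy setting, words that are more frequent in the in-domain data than in the background data (here `refund`) gain probability mass after adaptation, and setting `beta=0` leaves the background distribution unchanged.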


DOI: 10.21437/Interspeech.2017-147

Cite as: Singh, M., Oualil, Y., Klakow, D. (2017) Approximated and Domain-Adapted LSTM Language Models for First-Pass Decoding in Speech Recognition. Proc. Interspeech 2017, 2720-2724, DOI: 10.21437/Interspeech.2017-147.


@inproceedings{Singh2017,
  author={Mittul Singh and Youssef Oualil and Dietrich Klakow},
  title={Approximated and Domain-Adapted LSTM Language Models for First-Pass Decoding in Speech Recognition},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2720--2724},
  doi={10.21437/Interspeech.2017-147},
  url={http://dx.doi.org/10.21437/Interspeech.2017-147}
}