Traditionally, short-range Language Models (LMs), such as conventional n-gram models, have been used for language model adaptation. Recent work has improved performance on such tasks using adapted long-span models such as Recurrent Neural Network LMs (RNNLMs). With the first pass performed using a large background n-gram LM, the adapted RNNLMs are mostly used to rescore lattices or N-best lists in a second decoding step. Ideally, these adapted RNNLMs should be applied directly in first-pass decoding. To this end, we introduce two ways of applying adapted long short-term memory (LSTM) RNNLMs for first-pass decoding. Using available techniques to convert LSTMs into approximated versions suitable for first-pass decoding, we compare approximated LSTMs adapted in the Fast Marginal Adaptation (FMA) framework with an approximated version of an architecture-adapted LSTM. On a conversational speech recognition task, these differently approximated and adapted LSTMs, each combined with a trigram LM, outperform other adapted and unadapted LMs. The architecture-adapted LSTM combination obtains a 35.9% word error rate (WER) and is outperformed by the FMA-based LSTM combination, which achieves the overall lowest WER of 34.4%.
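For context, Fast Marginal Adaptation is commonly formulated as rescaling a background LM by the ratio of in-domain to background unigram marginals. The sketch below shows one standard form of this adaptation; the scaling exponent \beta and the normalizer Z(h) are part of that general formulation and are not specified in this abstract, so treat the exact setup as an assumption rather than the paper's precise recipe.

\[
P_{\text{FMA}}(w \mid h) \;=\; \frac{1}{Z(h)} \left( \frac{P_{\text{in}}(w)}{P_{\text{bg}}(w)} \right)^{\beta} P_{\text{bg}}(w \mid h),
\qquad
Z(h) \;=\; \sum_{w'} \left( \frac{P_{\text{in}}(w')}{P_{\text{bg}}(w')} \right)^{\beta} P_{\text{bg}}(w' \mid h)
\]

Here P_{\text{bg}} is the background model (e.g., the approximated LSTM or the n-gram LM), P_{\text{in}} is a unigram model estimated on in-domain data, and \beta controls the strength of the adaptation.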
Cite as: Singh, M., Oualil, Y., Klakow, D. (2017) Approximated and Domain-Adapted LSTM Language Models for First-Pass Decoding in Speech Recognition. Proc. Interspeech 2017, 2720-2724, doi: 10.21437/Interspeech.2017-147
@inproceedings{singh17_interspeech,
  author={Mittul Singh and Youssef Oualil and Dietrich Klakow},
  title={{Approximated and Domain-Adapted LSTM Language Models for First-Pass Decoding in Speech Recognition}},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={2720--2724},
  doi={10.21437/Interspeech.2017-147}
}