Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition

Ke Li, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur


We propose two adaptation models for recurrent neural network language models (RNNLMs) to capture topic effects and long-distance triggers for conversational automatic speech recognition (ASR). We use a fast marginal adaptation (FMA) framework to adapt an RNNLM. Our first model is effectively a cache model: word frequencies are estimated by counting words in a conversation (with utterance-level leave-one-out) from 1st-pass decoded word lattices and then interpolated with a background unigram distribution. In the second model, we train a deep neural network (DNN) on conversational transcriptions to predict word frequencies given word frequencies from 1st-pass decoded word lattices. The second model can in principle capture trigger and topic effects but is harder to train. Experiments on three conversational corpora show modest WER and perplexity reductions with both adaptation models.
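The first adaptation model described above can be sketched in a few lines: build a cache unigram from 1st-pass decoded words, interpolate it with the background unigram, then rescale the RNNLM's probabilities in the FMA style (scaling by the ratio of adapted to background unigram, raised to a tuning exponent, and renormalizing). This is a minimal illustration, not the paper's implementation; the interpolation weight, the exponent `beta`, and all names are assumptions.

```python
from collections import Counter

def fma_cache_adapt(rnnlm_probs, decoded_words, background_unigram,
                    cache_weight=0.5, beta=0.5):
    """Sketch of FMA-style RNNLM adaptation with a cache unigram.

    rnnlm_probs: dict word -> p(w | history) from the background RNNLM.
    decoded_words: words from 1st-pass decoding of the conversation
        (in the paper, the current utterance would be held out).
    background_unigram: dict word -> background unigram probability.
    cache_weight, beta: illustrative hyperparameters, not from the paper.
    """
    # Cache unigram: relative word frequencies in the decoded context.
    counts = Counter(decoded_words)
    total = sum(counts.values())
    # Interpolate cache frequencies with the background unigram.
    adapted_unigram = {
        w: cache_weight * counts.get(w, 0) / total
           + (1 - cache_weight) * background_unigram[w]
        for w in background_unigram
    }
    # FMA: scale each RNNLM probability by (adapted / background)^beta,
    # then renormalize over the vocabulary.
    scaled = {
        w: (adapted_unigram[w] / background_unigram[w]) ** beta * p
        for w, p in rnnlm_probs.items()
    }
    z = sum(scaled.values())
    return {w: p / z for w, p in scaled.items()}
```

Words that recur in the conversation get their adapted unigram (and hence their RNNLM probability) boosted relative to the background model, which is the intended cache effect.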


DOI: 10.21437/Interspeech.2018-1413

Cite as: Li, K., Xu, H., Wang, Y., Povey, D., Khudanpur, S. (2018) Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition. Proc. Interspeech 2018, 3373-3377, DOI: 10.21437/Interspeech.2018-1413.


@inproceedings{Li2018,
  author={Ke Li and Hainan Xu and Yiming Wang and Daniel Povey and Sanjeev Khudanpur},
  title={Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3373--3377},
  doi={10.21437/Interspeech.2018-1413},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1413}
}