Neural Error Corrective Language Models for Automatic Speech Recognition

Tomohiro Tanaka, Ryo Masumura, Hirokazu Masataki, Yushi Aono

We present novel neural network-based language models that can correct automatic speech recognition (ASR) errors by using speech recognizer output as context. These models, called neural error corrective language models (NECLMs), utilize ASR hypotheses of a target utterance as context for estimating the generative probability of words. NECLMs are expressed as conditional generative models composed of an encoder network and a decoder network. The encoder network constructs context vectors from N-best lists and ASR confidence scores generated by a speech recognizer. The decoder network rescores recognition hypotheses by computing the generative probability of words using the context vectors so as to correct ASR errors. We evaluate the proposed models on Japanese lecture ASR tasks. Experimental results show that NECLMs achieve better ASR performance than a state-of-the-art ASR system that incorporates a convolutional neural network acoustic model and a long short-term memory recurrent neural network language model.
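The rescoring setting the abstract describes can be illustrated with a minimal sketch. All names here are hypothetical: a fixed unigram table stands in for the NECLM decoder's generative word probabilities, and the encoder's context vectors (built from the N-best list and confidence scores) are omitted; only the score-interpolation and re-ranking step is shown.

```python
# Toy sketch of N-best rescoring, the setting in which NECLMs operate.
# In the paper's model, `toy_lm_logprob` would be a neural decoder
# conditioned on context vectors encoded from ASR hypotheses and
# confidence scores; here a smoothed unigram table stands in for it.
import math

def toy_lm_logprob(words):
    """Stand-in for the decoder's generative probability of a word
    sequence (fixed unigram table, floor probability for unknowns)."""
    unigram = {"speech": 0.3, "recognition": 0.3, "wreck": 0.05,
               "a": 0.1, "nice": 0.05, "beach": 0.05}
    return sum(math.log(unigram.get(w, 0.01)) for w in words)

def rescore_nbest(nbest, lm_weight=0.5):
    """Interpolate each hypothesis's ASR log-score with the LM
    log-probability and re-rank. `nbest` holds (words, asr_logscore)."""
    rescored = [(words, asr + lm_weight * toy_lm_logprob(words))
                for words, asr in nbest]
    return sorted(rescored, key=lambda x: x[1], reverse=True)

nbest = [
    (["wreck", "a", "nice", "beach"], -4.0),  # acoustically favored
    (["speech", "recognition"], -4.5),        # lower ASR score
]
best_words, best_score = rescore_nbest(nbest)[0]
print(" ".join(best_words))  # → speech recognition
```

After interpolation, the linguistically plausible hypothesis overtakes the acoustically favored one, which is the error-corrective effect the paper pursues with a learned, context-conditioned model rather than a fixed table.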

 DOI: 10.21437/Interspeech.2018-1430

Cite as: Tanaka, T., Masumura, R., Masataki, H., Aono, Y. (2018) Neural Error Corrective Language Models for Automatic Speech Recognition. Proc. Interspeech 2018, 401-405, DOI: 10.21437/Interspeech.2018-1430.

@inproceedings{tanaka18_interspeech,
  author={Tomohiro Tanaka and Ryo Masumura and Hirokazu Masataki and Yushi Aono},
  title={Neural Error Corrective Language Models for Automatic Speech Recognition},
  booktitle={Proc. Interspeech 2018},
  year={2018},
  pages={401--405},
  doi={10.21437/Interspeech.2018-1430}
}