Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition

Yike Zhang, Pengyuan Zhang, Yonghong Yan


Recurrent neural network language models (RNN LMs) trained via the maximum likelihood principle suffer from the exposure bias problem in the inference stage. Therefore, potential recognition errors limit their performance on re-scoring N-best lists of the speech recognition outputs. Inspired by the generative adversarial net (GAN), this paper proposes a novel approach to alleviate this problem. We regard the RNN LM as a generative model in the training stage, and an auxiliary neural critic is used to encourage the RNN LM to learn long-term dependencies from corrupted contexts by forcing it to generate valid sentences. Since the vanilla GAN has limitations when generating discrete sequences, the proposed framework is optimized through the policy gradient algorithm. Experiments were conducted on two Mandarin speech recognition tasks. Results show that the proposed method achieved lower character error rates on both datasets compared with the maximum likelihood method, although it increased perplexities slightly. Finally, we visualized the sentences generated from the RNN LM. The results demonstrate that the proposed method helps the RNN LM to learn long-term dependencies and alleviates the exposure bias problem.
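The abstract notes that a vanilla GAN is hard to apply to discrete sequences and that policy gradient is used instead. As a rough illustration of that general idea only, and not the authors' actual model or training procedure, the toy sketch below applies a REINFORCE-style update to a single categorical distribution over a four-token vocabulary. The names `toy_critic` and `reinforce`, the reward rule, and all hyperparameters are hypothetical and chosen purely for illustration; in the paper the generator is an RNN LM and the critic is a learned neural network.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_critic(token_id):
    """Hypothetical stand-in for a learned critic: only token 2 is 'valid'."""
    return 1.0 if token_id == 2 else 0.0


def reinforce(num_steps=2000, vocab_size=4, lr=0.1, seed=0):
    """REINFORCE on a one-step categorical policy parameterised by logits."""
    rng = random.Random(seed)
    logits = [0.0] * vocab_size  # uniform policy at the start
    for _ in range(num_steps):
        probs = softmax(logits)
        # Sampling a discrete token is not differentiable, so the critic's
        # score is used as a reward rather than back-propagated through.
        token = rng.choices(range(vocab_size), weights=probs)[0]
        reward = toy_critic(token)
        for k in range(vocab_size):
            # d log p(token) / d logit_k = 1[k == token] - p_k
            grad_log_prob = (1.0 if k == token else 0.0) - probs[k]
            logits[k] += lr * reward * grad_log_prob
    return softmax(logits)


final_probs = reinforce()
```

After training, `final_probs` concentrates on the token the toy critic rewards, which shows how a reward signal can shape a discrete generator without differentiating through the sampling step.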


DOI: 10.21437/Interspeech.2018-1111

Cite as: Zhang, Y., Zhang, P., Yan, Y. (2018) Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition. Proc. Interspeech 2018, 3348-3352, DOI: 10.21437/Interspeech.2018-1111.


@inproceedings{Zhang2018,
  author={Yike Zhang and Pengyuan Zhang and Yonghong Yan},
  title={Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3348--3352},
  doi={10.21437/Interspeech.2018-1111},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1111}
}