Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals

Fadi Biadsy, Mohammadreza Ghodsi, Diamantino Caseiro


Maximum Entropy (MaxEnt) language models are powerful models that can incorporate linguistic and non-linguistic contextual signals in a unified framework with a convex loss. MaxEnt models also have the advantage of scaling to large model and training data sizes. We present the following two contributions to MaxEnt training: (1) by leveraging smaller amounts of transcribed data, we demonstrate that a MaxEnt LM trained on various types of corpora can be easily adapted to better match the test distribution of Automatic Speech Recognition (ASR); (2) a novel adaptive-training approach that efficiently models multiple types of non-linguistic features in a universal model. We evaluate the impact of these approaches on Google's state-of-the-art ASR for the tasks of voice-search transcription and dictation. Training 10B-parameter models on corpora of up to 1T words, we show large reductions in word error rate from adaptation across multiple languages. Human evaluations also show significant improvements on a wide range of domains from using non-linguistic features. For example, adapting to geographical domains (e.g., US states and cities) affects about 4% of test utterances, with a 2:1 win-to-loss ratio.
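To make the framework concrete, here is a minimal illustrative sketch (not the paper's implementation) of how a MaxEnt LM scores a word given both linguistic context and a non-linguistic signal such as a geographic domain: the probability is the exponentiated sum of weights for the active features, normalized over the vocabulary. The feature templates, weights, and vocabulary below are hypothetical.

```python
import math

def maxent_prob(word, history, vocab, weights, features):
    """P(word | history) under a MaxEnt model:
    exp(sum of weights of active features) / Z(history)."""
    def score(w):
        # Sum the weight of every feature that fires for (w, history);
        # features absent from the weight table contribute 0.
        return sum(weights.get(f, 0.0) for f in features(w, history))
    z = sum(math.exp(score(w)) for w in vocab)  # partition function Z(history)
    return math.exp(score(word)) / z

# Hypothetical feature templates: standard n-gram features plus one
# non-linguistic (geographic) feature, all in the same unified model.
def features(w, history):
    prev_word, geo = history
    return [("unigram", w), ("bigram", prev_word, w), ("geo", geo, w)]

vocab = ["boston", "austin", "pizza"]
weights = {
    ("bigram", "visit", "boston"): 1.2,  # linguistic signal
    ("geo", "MA", "boston"): 0.8,        # non-linguistic signal
}
p = maxent_prob("boston", ("visit", "MA"), vocab, weights, features)
```

In this toy example the bigram and geographic features jointly boost "boston", illustrating how both signal types share one convex objective over a single weight vector.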


DOI: 10.21437/Interspeech.2017-1203

Cite as: Biadsy, F., Ghodsi, M., Caseiro, D. (2017) Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals. Proc. Interspeech 2017, 2710-2714, DOI: 10.21437/Interspeech.2017-1203.


@inproceedings{Biadsy2017,
  author={Fadi Biadsy and Mohammadreza Ghodsi and Diamantino Caseiro},
  title={Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={2710--2714},
  doi={10.21437/Interspeech.2017-1203},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1203}
}