Maximum Entropy (MaxEnt) language models are powerful models that can incorporate linguistic and non-linguistic contextual signals in a unified framework with a convex loss. MaxEnt models also have the advantage of scaling to large model and training data sizes. We present the following two contributions to MaxEnt training: (1) by leveraging smaller amounts of transcribed data, we demonstrate that a MaxEnt LM trained on various types of corpora can be easily adapted to better match the test distribution of Automatic Speech Recognition (ASR); (2) we present a novel adaptive-training approach that efficiently models multiple types of non-linguistic features in a single universal model. We evaluate the impact of these approaches on Google's state-of-the-art ASR for the tasks of voice-search transcription and dictation. Training 10B-parameter models on corpora of up to 1T words, we show large reductions in word error rate from adaptation across multiple languages. Human evaluations also show significant improvements on a wide range of domains from using non-linguistic features. For example, adapting to geographical domains (e.g., US states and cities) affects about 4% of test utterances, with a 2:1 win-to-loss ratio.
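For readers unfamiliar with the model class, the following is the standard conditional log-linear formulation of a MaxEnt LM. The abstract does not spell this out; the symbols below (weights \lambda_i, feature functions f_i over word w, linguistic history h, and non-linguistic context c) are our notation, not the paper's:

    P(w \mid h, c) = \frac{\exp\left( \sum_i \lambda_i \, f_i(w, h, c) \right)}
                          {\sum_{w'} \exp\left( \sum_i \lambda_i \, f_i(w', h, c) \right)}

The negative log-likelihood of this model is convex in \lambda, which is the convex loss the abstract refers to. Non-linguistic signals, such as the geographical domains mentioned above, can enter simply as additional feature functions over the context c.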
Cite as: Biadsy, F., Ghodsi, M., Caseiro, D. (2017) Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals. Proc. Interspeech 2017, 2710-2714, doi: 10.21437/Interspeech.2017-1203
@inproceedings{biadsy17_interspeech,
  author={Fadi Biadsy and Mohammadreza Ghodsi and Diamantino Caseiro},
  title={{Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals}},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={2710--2714},
  doi={10.21437/Interspeech.2017-1203}
}