We present a new method for estimating the sparse non-negative matrix (SNM) language model using a small amount of held-out data and the multinomial loss that is natural for language modeling; we validate it experimentally against the previous estimation method, which uses leave-one-out on training data and a binary loss function, and show that it performs equally well. Being able to train on held-out data is very important in practical situations where the training data is mismatched with the held-out/test data. We find that fairly small amounts of held-out data (on the order of 30–70 thousand words) are sufficient for training the adjustment model, which is the only model component estimated using gradient descent; the bulk of the model parameters are relative frequencies counted on training data.
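The following is a minimal, hypothetical sketch of this estimation setup, not the paper's implementation: relative frequencies are counted on training data, and a small set of adjustment parameters is then fit by gradient descent under the multinomial (cross-entropy) loss on held-out data. For simplicity it uses a single bigram feature per prediction and one adjustment weight per (feature, word) pair rather than a meta-feature parameterization; all function and variable names are illustrative assumptions.

    # Hypothetical sketch: counts on training data + adjustment model fit by
    # gradient descent under the multinomial loss on held-out data.
    import math
    from collections import defaultdict

    def relative_freqs(train_sents):
        """Relative frequencies P(word | previous word) counted on training data."""
        counts = defaultdict(lambda: defaultdict(float))
        for sent in train_sents:
            for prev, word in zip(["<s>"] + sent, sent + ["</s>"]):
                counts[prev][word] += 1.0
        return {f: {w: c / sum(row.values()) for w, c in row.items()}
                for f, row in counts.items()}

    def predict(rel, adjust, feat, vocab, floor=1e-6):
        """Multinomial prediction: relative frequency times exp(adjustment),
        renormalized over the vocabulary."""
        scores = {w: rel.get(feat, {}).get(w, floor) * math.exp(adjust[(feat, w)])
                  for w in vocab}
        z = sum(scores.values())
        return {w: s / z for w, s in scores.items()}

    def fit_adjustments(rel, heldout_sents, vocab, lr=0.5, epochs=3):
        """SGD on the multinomial log-loss of held-out data; only the small
        adjustment model is trained, the training counts stay fixed."""
        adjust = defaultdict(float)
        for _ in range(epochs):
            for sent in heldout_sents:
                for prev, word in zip(["<s>"] + sent, sent + ["</s>"]):
                    probs = predict(rel, adjust, prev, vocab)
                    # Softmax gradient: d(-log p[word]) / d adjust[(prev, w)]
                    # equals p[w] - 1[w == word]
                    for w, p in probs.items():
                        adjust[(prev, w)] -= lr * (p - (1.0 if w == word else 0.0))
        return adjust

    # Toy usage: counts from one corpus, adjustments fit on (mismatched) held-out text.
    train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    heldout = [["the", "dog", "ran"]]
    vocab = sorted({w for s in train + heldout for w in s} | {"</s>"})
    rel = relative_freqs(train)
    adj = fit_adjustments(rel, heldout, vocab)
    print(predict(rel, adj, "the", vocab))

In this simplified parameterization each (feature, word) pair gets its own adjustment weight, which would not scale; the appeal of tying adjustments to a small set of shared parameters is precisely that little held-out data is needed to estimate them.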
A second contribution is a comparison between SNM and the related class of Maximum Entropy language models. We show that SNM, while much cheaper computationally, achieves slightly better perplexity for the same feature set, and the same speech recognition accuracy on voice search and short message dictation.
Cite as: Chelba, C., Caseiro, D., Biadsy, F. (2017) Sparse Non-Negative Matrix Language Modeling: Maximum Entropy Flexibility on the Cheap. Proc. Interspeech 2017, 2725-2729, doi: 10.21437/Interspeech.2017-493
@inproceedings{chelba17_interspeech,
  author    = {Ciprian Chelba and Diamantino Caseiro and Fadi Biadsy},
  title     = {{Sparse Non-Negative Matrix Language Modeling: Maximum Entropy Flexibility on the Cheap}},
  booktitle = {Proc. Interspeech 2017},
  year      = {2017},
  pages     = {2725--2729},
  doi       = {10.21437/Interspeech.2017-493}
}