We introduce a technique for dynamically applying contextually derived language models to a state-of-the-art speech recognition system. These generally small-footprint models can be seen as a generalization of cache-based models [1], whereby contextually salient n-grams are derived from relevant sources (not just user-generated language) to produce a model intended for combination with the baseline language model. The derived models are applied during first-pass decoding as a form of on-the-fly composition between the decoder search graph and the set of weighted contextual n-grams. We present a construction algorithm that takes a trie representing the contextual n-grams and produces a weighted finite-state automaton that is more compact than a standard n-gram machine. Finally, we present empirical results on the recognition of spoken search queries, where a contextual model encoding recent trending queries is applied using the proposed technique.
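To make the trie-based construction concrete, the following minimal Python sketch builds a trie over weighted contextual n-grams and simulates composition-style scoring of a single hypothesis. This is an illustration only, not the paper's algorithm: the actual construction operates on weighted finite-state automata (e.g., with failure transitions in an OpenFst-style framework) and is composed with the decoder graph on the fly. The function names (build_trie, score_hypothesis), the dictionary-based trie representation, and the example weights are all assumptions introduced here.

def build_trie(weighted_ngrams):
    """Insert each (n-gram, weight) pair into a trie.

    `weighted_ngrams` maps word tuples to biasing weights,
    interpreted here as negative log costs, so a negative value
    rewards a matching n-gram (an illustrative convention, not
    the paper's notation).
    """
    trie = {"children": {}, "weight": None}
    for ngram, weight in weighted_ngrams.items():
        node = trie
        for word in ngram:
            node = node["children"].setdefault(
                word, {"children": {}, "weight": None})
        node["weight"] = weight  # full n-gram match carries the bias
    return trie

def score_hypothesis(trie, words):
    """Accumulate biasing weight along a word sequence.

    Emulates composing the hypothesis with the biasing automaton:
    we track the set of active trie states, with the root always
    active (playing the role of a failure/back-off arc), and add
    the weight of every salient n-gram that completes.
    """
    total = 0.0
    active = [trie]
    for word in words:
        next_active = [trie]  # root stays active at every position
        for state in active:
            child = state["children"].get(word)
            if child is not None:
                if child["weight"] is not None:
                    total += child["weight"]
                next_active.append(child)
        active = next_active
    return total

# Hypothetical usage: bias two trending queries, then score a hypothesis.
bias = {("pizza", "near", "me"): -2.0, ("taylor", "swift"): -1.5}
trie = build_trie(bias)
print(score_hypothesis(trie, "play taylor swift songs".split()))  # -1.5

In a real decoder this per-n-gram score would be added to the baseline language model score during first-pass search rather than computed over a finished hypothesis; the sketch only shows why the trie form is compact, since states exist solely for prefixes of the salient n-grams rather than for a full n-gram model's history space.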
Cite as: Hall, K., Cho, E., Allauzen, C., Beaufays, F., Coccaro, N., Nakajima, K., Riley, M., Roark, B., Rybach, D., Zhang, L. (2015) Composition-based on-the-fly rescoring for salient n-gram biasing. Proc. Interspeech 2015, 1418-1422, doi: 10.21437/Interspeech.2015-340
@inproceedings{hall15_interspeech,
  author={Keith Hall and Eunjoon Cho and Cyril Allauzen and Françoise Beaufays and Noah Coccaro and Kaisuke Nakajima and Michael Riley and Brian Roark and David Rybach and Linda Zhang},
  title={{Composition-based on-the-fly rescoring for salient n-gram biasing}},
  year={2015},
  booktitle={Proc. Interspeech 2015},
  pages={1418--1422},
  doi={10.21437/Interspeech.2015-340}
}