In automatic speech recognition on mobile devices, what a user says
often depends strongly on the particular context the user is in. The
n-grams relevant to that context are often not known in advance. The
context can depend on, for example, the current dialog state, options
presented to the user, the conversation topic, or the user's location.
Recognizing sentences that include these n-grams can be challenging, as
they are often poorly represented in the language model (LM) or even
contain out-of-vocabulary (OOV) words.
In this paper, we propose a solution that uses contextual information to improve speech recognition accuracy. We employ an on-the-fly rescoring mechanism that adjusts, during decoding, the LM weights of a small set of n-grams relevant to the current context.
Our solution handles out-of-vocabulary words, efficiently combines multiple sources of context, and can also bias class-based language models. We show significant recognition accuracy improvements on several datasets, using various types of context, without negatively impacting the overall system. The improvements are obtained in both offline and live experiments.
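The core idea of on-the-fly rescoring can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the paper's actual implementation: the function name, the `boost` parameter, and the set-based n-gram lookup are all assumptions made for clarity. It shows how a decoder could boost the LM log-probability of an n-gram that appears in the contextual biasing set, lowering its decoding cost relative to unbiased n-grams.

```python
import math

def rescore(base_logprob, ngram, context_ngrams, boost=3.0):
    """Return the (possibly boosted) LM log-probability for an n-gram.

    base_logprob   -- log P(word | history) from the main LM
    ngram          -- tuple of words, e.g. ("play", "jazz")
    context_ngrams -- set of n-grams relevant to the current context
    boost          -- multiplicative boost applied in probability space
                      (an illustrative value, not taken from the paper)
    """
    if tuple(ngram) in context_ngrams:
        # Raising the probability of a contextual n-gram lowers its
        # decoding cost, steering the beam search toward it.
        return base_logprob + math.log(boost)
    return base_logprob

context = {("play", "jazz"), ("call", "mom")}
print(rescore(-6.0, ("play", "jazz"), context))  # boosted
print(rescore(-6.0, ("play", "rock"), context))  # unchanged
```

In a real decoder this lookup would be applied lazily during search (e.g. via on-the-fly composition with a small biasing model) rather than as a standalone function, so that only the small set of contextual n-grams incurs extra cost.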
Bibliographic reference. Aleksic, Petar / Ghodsi, Mohammadreza / Michaely, Assaf / Allauzen, Cyril / Hall, Keith / Roark, Brian / Rybach, David / Moreno, Pedro (2015): "Bringing contextual information to Google speech recognition", in INTERSPEECH-2015, 468-472.