16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Bringing Contextual Information to Google Speech Recognition

Petar Aleksic, Mohammadreza Ghodsi, Assaf Michaely, Cyril Allauzen, Keith Hall, Brian Roark, David Rybach, Pedro Moreno

Google, USA

In automatic speech recognition on mobile devices, what a user says often depends strongly on the context he or she is in. The n-grams relevant to the context are often not known in advance. The context can depend on, for example, the particular dialog state, the options presented to the user, the conversation topic, or the location. Recognizing sentences that include these n-grams can be challenging, as the n-grams are often not well represented in a language model (LM) or even include out-of-vocabulary (OOV) words.
    In this paper, we propose a solution for using contextual information to improve speech recognition accuracy. We utilize an on-the-fly rescoring mechanism to adjust the LM weights of a small set of n-grams relevant to the particular context during speech decoding.
    Our solution handles out-of-vocabulary words. It also addresses the efficient combination of multiple sources of context, and it even allows biasing of class-based language models. We show significant speech recognition accuracy improvements on several datasets, using various types of contexts, without negatively impacting the overall system. The improvements are obtained in both offline and live experiments.
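
The idea of adjusting LM weights for a small set of context-relevant n-grams can be illustrated with a toy sketch. This is not the authors' implementation: the additive log-domain boost, the bigram order, the function names (`ngrams`, `score`, `rescore`), the floor value `UNSEEN_LOG_PROB`, and all scores below are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Ngram = Tuple[str, ...]

# Hypothetical floor score for n-grams missing from the toy base LM.
UNSEEN_LOG_PROB = -10.0


def ngrams(words: List[str], order: int) -> List[Ngram]:
    """Return all n-grams of the given order in a word sequence."""
    return [tuple(words[i:i + order]) for i in range(len(words) - order + 1)]


def score(words: List[str], base_lm: Dict[Ngram, float],
          boosts: Dict[Ngram, float], order: int = 2) -> float:
    """Sum base LM log-scores, adding a boost for context-relevant n-grams."""
    total = 0.0
    for ng in ngrams(words, order):
        total += base_lm.get(ng, UNSEEN_LOG_PROB)
        total += boosts.get(ng, 0.0)  # assumed additive bias in the log domain
    return total


def rescore(hypotheses: List[List[str]], base_lm: Dict[Ngram, float],
            boosts: Dict[Ngram, float], order: int = 2) -> List[str]:
    """Pick the hypothesis with the highest biased score."""
    return max(hypotheses, key=lambda h: score(h, base_lm, boosts, order))


# Made-up example: two competing hypotheses for the same audio.
HYPOTHESES = [["call", "john", "doe"], ["call", "jon", "dough"]]
BASE_LM = {
    ("call", "john"): -2.5,
    ("john", "doe"): -4.0,
    ("call", "jon"): -2.0,
    ("jon", "dough"): -3.0,
}
# Context (for example, a contact list) makes "john doe" more likely.
CONTEXT_BOOSTS = {("john", "doe"): 3.0}

best_without_context = rescore(HYPOTHESES, BASE_LM, {})
best_with_context = rescore(HYPOTHESES, BASE_LM, CONTEXT_BOOSTS)
```

With these made-up scores, the unbiased LM prefers "call jon dough" (-5.0 versus -6.5), while the boosted bigram raises "call john doe" to -3.5 and makes it the winner. Only n-grams listed in `CONTEXT_BOOSTS` change score, which mirrors the abstract's point that a small set of n-grams is adjusted without touching the rest of the model.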


Bibliographic reference. Aleksic, Petar / Ghodsi, Mohammadreza / Michaely, Assaf / Allauzen, Cyril / Hall, Keith / Roark, Brian / Rybach, David / Moreno, Pedro (2015): "Bringing contextual information to Google speech recognition", In INTERSPEECH-2015, 468-472.