Contextual Recovery of Out-of-Lattice Named Entities in Automatic Speech Recognition

Jack Serrino, Leonid Velikovich, Petar Aleksic, Cyril Allauzen


As voice-driven intelligent assistants become commonplace, adaptation to user context becomes critical for Automatic Speech Recognition (ASR) systems. For example, ASR systems may be expected to recognize a user’s contact names containing improbable or out-of-vocabulary (OOV) words.

We introduce a method to identify contextual cues in a first-pass ASR system’s output and to recover out-of-lattice hypotheses that are contextually relevant. Our proposed module is agnostic to the architecture of the underlying recognizer, provided it generates a word lattice of hypotheses; it is sufficiently compact for use on device. The module identifies subgraphs in the lattice likely to contain named entities (NEs), recovers phoneme hypotheses over corresponding time spans, and inserts NEs that are phonetically close to those hypotheses. We measure a decrease in the mean word error rate (WER) of word lattices from 11.5% to 4.9% on a test set of NEs.
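
The last step described above, inserting NEs that are phonetically close to the recovered phoneme hypotheses, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain unweighted Levenshtein distance over phoneme symbols as the closeness measure, and the names `phoneme_edit_distance`, `closest_entities`, `max_distance`, and the toy contact lexicon are all hypothetical. It also ignores lattice structure, time spans, and acoustic scores.

```python
from typing import Dict, List, Sequence, Tuple


def phoneme_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unweighted Levenshtein distance between two phoneme sequences.

    Assumption: every substitution, insertion, and deletion costs 1.
    """
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_entities(
    hypothesis: Sequence[str],
    lexicon: Dict[str, List[str]],
    max_distance: int = 1,
) -> List[Tuple[str, int]]:
    """Return (entity, distance) pairs within max_distance, nearest first.

    `lexicon` is a hypothetical map from entity name to its pronunciation.
    """
    scored = [
        (name, phoneme_edit_distance(hypothesis, pron))
        for name, pron in lexicon.items()
    ]
    kept = [(name, dist) for name, dist in scored if dist <= max_distance]
    return sorted(kept, key=lambda item: (item[1], item[0]))


# Toy contact list with ARPAbet-style pronunciations (illustrative only).
contacts = {
    "Jack": ["JH", "AE", "K"],
    "Zach": ["Z", "AE", "K"],
    "Petar": ["P", "EH", "T", "AA", "R"],
}
candidates = closest_entities(["JH", "AE", "K"], contacts)
```

In a full system, candidates selected this way would be inserted into the word lattice over the corresponding time span; a weighted distance that reflects phoneme confusability would likely be a better fit than the uniform costs assumed here.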


DOI: 10.21437/Interspeech.2019-2962

Cite as: Serrino, J., Velikovich, L., Aleksic, P., Allauzen, C. (2019) Contextual Recovery of Out-of-Lattice Named Entities in Automatic Speech Recognition. Proc. Interspeech 2019, 3830-3834, DOI: 10.21437/Interspeech.2019-2962.


@inproceedings{Serrino2019,
  author={Jack Serrino and Leonid Velikovich and Petar Aleksic and Cyril Allauzen},
  title={{Contextual Recovery of Out-of-Lattice Named Entities in Automatic Speech Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3830--3834},
  doi={10.21437/Interspeech.2019-2962},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2962}
}