ISCA Archive Interspeech 2015

Sequence-based class tagging for robust transcription in ASR

Lucy Vasserman, Vlad Schogol, Keith Hall

We present a method of modeling non-lexical vocabulary items such as numbers, times, dates, monetary amounts and address components that avoids the data sparsity and out-of-vocabulary problems of written-domain language models. Like previous approaches, we use a class-based language model and efficient finite-state class grammars during run-time decoding. We mitigate the problem of context-independent replacement of class items by employing a contextual sequence labeling model to identify which class instances should be replaced, leaving the others to appear in their original form. Applied to the task of general voice-search audio transcription, our method achieves a 10% relative error reduction (on the numeric error rate metric) compared to the previous system, which is based on a verbalizer transducer. On a numeric entity recognition task, our method achieves a 23% relative error reduction on the same metric. In both cases, word error rate remains the same or is reduced.
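The selective replacement step described above can be illustrated with a minimal sketch. It assumes that a sequence labeler has already assigned each token either "O" (keep the word as written) or a class name. The function name `replace_tagged_spans`, the label names, and the `$CLASS` symbol format are hypothetical and chosen for illustration; they are not taken from the paper, and the tagging model itself is not shown.

```python
from typing import List


def replace_tagged_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Collapse each maximal run of identically tagged tokens into one class symbol.

    tags[i] is "O" to keep tokens[i] unchanged, or a class name such as
    "NUMBER" or "TIME" (hypothetical labels, assumed to come from a tagger).
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    out: List[str] = []
    prev = "O"
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Untagged words appear in their original form.
            out.append(token)
        elif tag != prev:
            # First token of a tagged run: emit the class symbol once.
            out.append(f"${tag}")
        # Later tokens of the same run are absorbed into that symbol.
        prev = tag
    return out


# The same kind of token can be replaced in one context and kept in another.
print(replace_tagged_spans(["set", "alarm", "for", "7:30"], ["O", "O", "O", "TIME"]))
print(replace_tagged_spans(["motel", "6", "near", "me"], ["O", "O", "O", "O"]))
```

In this simplified scheme, two adjacent instances of the same class merge into one symbol; a labeling scheme with explicit span boundaries would be needed to keep them apart.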

doi: 10.21437/Interspeech.2015-178

Cite as: Vasserman, L., Schogol, V., Hall, K. (2015) Sequence-based class tagging for robust transcription in ASR. Proc. Interspeech 2015, 473-477, doi: 10.21437/Interspeech.2015-178

  author={Lucy Vasserman and Vlad Schogol and Keith Hall},
  title={{Sequence-based class tagging for robust transcription in ASR}},
  booktitle={Proc. Interspeech 2015},