Hybrid Arbitration Using Raw ASR String and NLU Information — Taking the Best of Both Embedded World and Cloud World

Min Tang


Hybrid arbitration is the process of selecting the best Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) result from the outputs of embedded (client) and cloud-based systems. Many real-world applications use it to combine knowledge sources that are not available to the client and the cloud at the same time. In the past, arbitration relied primarily on ASR confidence features and application-specific heuristics. However, confidence features cannot capture subtle context-specific differences. In this paper, in addition to confidence, we use raw ASR strings and NLU results in the hybrid arbitration process. We model arbitration as two steps: first, decide whether to wait for a slower system, and second, pick the best result. We compared multiple machine learning approaches and found that a Deep Neural Network (DNN) based classifier, using word embeddings to process ASR strings and NLU embeddings to process NLU information, delivers the best performance. We conducted experiments on two production system setups, using field data from real users. Compared with the traditional confidence-score-based approach, we obtain about 30% relative word error reduction and 30% relative sentence error rate reduction.
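
The two-step decision described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Hypothesis` fields, the `arbitrate` function, the 0.5 thresholds, and both toy scoring functions are hypothetical stand-ins, and the scorers use simple hand-written rules where the paper uses trained classifiers over word and NLU embeddings.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hypothesis:
    source: str        # "embedded" or "cloud" (illustrative labels)
    text: str          # raw ASR string
    confidence: float  # ASR confidence in [0, 1]
    intent: str        # top NLU intent label (hypothetical field)


def arbitrate(
    fast: Hypothesis,
    fetch_slow: Callable[[], Optional[Hypothesis]],
    wait_score: Callable[[Hypothesis], float],
    pick_score: Callable[[Hypothesis, Hypothesis], float],
    wait_threshold: float = 0.5,
) -> Hypothesis:
    """Two-step arbitration: decide whether to wait, then pick a result."""
    # Step 1: decide whether the slower system is worth waiting for.
    if wait_score(fast) < wait_threshold:
        return fast
    slow = fetch_slow()
    if slow is None:  # the slower system returned nothing usable
        return fast
    # Step 2: pick the better of the two results.
    return slow if pick_score(fast, slow) >= 0.5 else fast


# Toy scorers (assumptions): a real system would replace these with
# trained models that consume the ASR string and NLU result.
CLOUD_FRIENDLY_INTENTS = {"web_search", "dictation"}  # hypothetical set


def toy_wait_score(h: Hypothesis) -> float:
    score = 1.0 - h.confidence
    if h.intent in CLOUD_FRIENDLY_INTENTS:
        score += 0.5
    return min(score, 1.0)


def toy_pick_score(fast: Hypothesis, slow: Hypothesis) -> float:
    return 1.0 if slow.confidence > fast.confidence else 0.0


cloud = Hypothesis("cloud", "search for pizza places near me", 0.85, "web_search")

confident = Hypothesis("embedded", "call mom", 0.92, "phone_call")
uncertain = Hypothesis("embedded", "search for pizza near me", 0.40, "web_search")

first = arbitrate(confident, lambda: cloud, toy_wait_score, toy_pick_score)
second = arbitrate(uncertain, lambda: cloud, toy_wait_score, toy_pick_score)
print(first.source)   # embedded
print(second.source)  # cloud
```

Separating the wait decision from the pick decision lets a confident embedded result be returned immediately, without paying the latency of the slower system.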


DOI: 10.21437/Interspeech.2019-2586

Cite as: Tang, M. (2019) Hybrid Arbitration Using Raw ASR String and NLU Information — Taking the Best of Both Embedded World and Cloud World. Proc. Interspeech 2019, 2983-2987, DOI: 10.21437/Interspeech.2019-2586.


@inproceedings{Tang2019,
  author={Min Tang},
  title={{Hybrid Arbitration Using Raw ASR String and NLU Information — Taking the Best of Both Embedded World and Cloud World}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2983--2987},
  doi={10.21437/Interspeech.2019-2586},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2586}
}