ISCA Archive Interspeech 2017

Using Knowledge Graph and Search Query Click Logs in Statistical Language Model for Speech Recognition

Weiwu Zhu

This paper demonstrates how a Knowledge Graph (KG) and Search Query Click Logs (SQCL) can be leveraged in statistical language models to improve named entity recognition in online speech recognition systems. Named entities that are missing from the training data may be misrecognized as common words with similar pronunciations. KG and SQCL cover comprehensive and fresh named entities and queries, which can be used to mitigate such misrecognition. First, all entities located in the same area in the KG are clustered together, and the queries that contain the entity names are selected from SQCL as training data for a geographical statistical language model for each entity cluster. These geographical language models make it less likely that named entities are unseen during model training, and they can be switched dynamically according to the user's location in the recognition phase. Second, if any named entities are identified in previous utterances within a conversational dialog, the probabilities of the n-best word sequence paths that contain their related entities are increased for the current utterance, using the entity relationships from KG and SQCL. This approach leverages the long-term context within the dialog. Experiments with the proposed approach on voice queries from a spoken dialog system yielded a 12.5% relative perplexity reduction for the language model and a 1.1% absolute word error rate reduction for speech recognition.
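
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy entity table, the relation table, the function names, the substring matching, and the fixed log-domain boost are all assumptions made for illustration, and the abstract does not specify how the probability increase is computed.

```python
import math
from collections import defaultdict

# Hypothetical stand-in for KG location facts: entity name -> area.
ENTITY_AREA = {
    "pike place market": "seattle",
    "space needle": "seattle",
    "fenway park": "boston",
}

# Hypothetical stand-in for entity relationships drawn from KG and SQCL.
RELATED = {
    "space needle": {"pike place market"},
}


def cluster_entities_by_area(entity_area):
    """Group entity names by the area they are located in."""
    clusters = defaultdict(set)
    for entity, area in entity_area.items():
        clusters[area].add(entity)
    return dict(clusters)


def select_training_queries(queries, clusters):
    """For each area, keep the logged queries that mention one of its entities.

    The selected queries would serve as training text for that area's
    language model; model training itself is out of scope here.
    """
    selected = {area: [] for area in clusters}
    for query in queries:
        text = query.lower()
        for area, entities in clusters.items():
            if any(entity in text for entity in entities):
                selected[area].append(query)
    return selected


def rescore_nbest(nbest, previous_entities, related, boost=math.log(2.0)):
    """Raise the log score of hypotheses that mention a related entity.

    nbest is a list of (hypothesis, log_score) pairs. The additive boost
    is an assumed scheme, not one taken from the paper.
    """
    favored = set()
    for entity in previous_entities:
        favored |= related.get(entity, set())
    rescored = []
    for hypothesis, score in nbest:
        if any(entity in hypothesis.lower() for entity in favored):
            score += boost
        rescored.append((hypothesis, score))
    # Best hypothesis first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


clusters = cluster_entities_by_area(ENTITY_AREA)
queries = ["directions to Space Needle", "Fenway Park tickets", "weather today"]
selected = select_training_queries(queries, clusters)

# An earlier utterance mentioned "space needle", so its related entity is favored.
nbest = [
    ("how far is bike place market", -10.0),
    ("how far is pike place market", -10.5),
]
reranked = rescore_nbest(nbest, ["space needle"], RELATED)
```

In this toy run the hypothesis containing "pike place market" moves ahead of the acoustically similar "bike place market" because a related entity appeared earlier in the dialog. At recognition time, the user's location would select which area's language model to load.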


doi: 10.21437/Interspeech.2017-1790

Cite as: Zhu, W. (2017) Using Knowledge Graph and Search Query Click Logs in Statistical Language Model for Speech Recognition. Proc. Interspeech 2017, 2735-2738, doi: 10.21437/Interspeech.2017-1790

@inproceedings{zhu17_interspeech,
  author={Weiwu Zhu},
  title={{Using Knowledge Graph and Search Query Click Logs in Statistical Language Model for Speech Recognition}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2735--2738},
  doi={10.21437/Interspeech.2017-1790}
}