ISCA Archive Interspeech 2009

Improved language modelling using bag of word pairs

Langzhou Chen, K. K. Chin, Kate Knill

The bag-of-words (BoW) method is widely used in language modelling and information retrieval. It represents a document as a collection of words, disregarding grammar and word order. A typical BoW method is latent semantic analysis (LSA), which maps words and documents onto vectors in an LSA space. In this paper, the concept of BoW is extended to bag-of-word pairs (BoWP), which represents a document as a collection of word pairs. Using word pairs as the unit allows the system to capture more complex semantic information than BoW. Under the LSA framework, the BoWP system is shown to improve both perplexity and word error rate (WER) compared to a BoW system.
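
To make the contrast between BoW and BoWP concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how word pairs are selected, so the windowed, unordered pair definition, the raw-count matrix, and the names `bag_of_word_pairs`, `lsa_document_vectors`, `window`, and `rank` are all assumptions made for illustration. The LSA step is approximated here by a plain truncated SVD of a unit-by-document count matrix.

```python
from collections import Counter

import numpy as np


def bag_of_words(tokens):
    """Count each word in a document, ignoring word order."""
    return Counter(tokens)


def bag_of_word_pairs(tokens, window=3):
    """Count unordered pairs of distinct words whose positions differ by less than `window`.

    This pair definition is an illustrative assumption, not taken from the paper.
    """
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


def lsa_document_vectors(bags, rank=2):
    """Build a unit-by-document count matrix and project documents with a truncated SVD."""
    units = sorted({unit for bag in bags for unit in bag})
    index = {unit: row for row, unit in enumerate(units)}
    matrix = np.zeros((len(units), len(bags)))
    for col, bag in enumerate(bags):
        for unit, count in bag.items():
            matrix[index[unit], col] = count
    _, singular_values, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(rank, len(singular_values))
    # One row per document, scaled by the retained singular values.
    return (singular_values[:k, None] * vt[:k]).T


# Toy corpus, invented for this example.
docs = [
    "the bank raised interest rates".split(),
    "the river bank was flooded".split(),
]
bow_bags = [bag_of_words(d) for d in docs]
bowp_bags = [bag_of_word_pairs(d) for d in docs]
bow_vectors = lsa_document_vectors(bow_bags)
bowp_vectors = lsa_document_vectors(bowp_bags)
```

In this sketch the only difference between the two systems is the unit that indexes the rows of the matrix: single words for BoW, word pairs for BoWP. A practical system would typically also apply a weighting scheme to the counts before the SVD, which is omitted here.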

doi: 10.21437/Interspeech.2009-121

Cite as: Chen, L., Chin, K.K., Knill, K. (2009) Improved language modelling using bag of word pairs. Proc. Interspeech 2009, 2671-2674, doi: 10.21437/Interspeech.2009-121

  author={Langzhou Chen and K. K. Chin and Kate Knill},
  title={{Improved language modelling using bag of word pairs}},
  booktitle={Proc. Interspeech 2009},