ISCA Archive Eurospeech 1999

Part-of-speech n-gram and word n-gram fused language model

Hirofumi Yamamoto, Yoshinori Sagisaka

In this paper, an accurate and compact language model is proposed to cope robustly with data sparseness and task dependencies. This language model adopts new categories which are generated by continuously interpolating POS word-class categories and word categories using MAP estimation. The new categories reflect word statistics efficiently without losing accuracy, and task-independent general word characteristics (i.e., grammatical constraints captured by POS statistics) are embedded to prevent task over-tuning. This modeling reduces the model size to 50% of that of conventional models. The bi-directional word-cluster N-grams generated by this modeling have 3% lower perplexity on a matched domain and 15% lower perplexity on a mismatched domain compared to a conventional word 2-gram.
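
To illustrate the general idea of interpolating POS statistics and word statistics with MAP estimation, the following is a minimal sketch and not the authors' method. It assumes a textbook MAP (Dirichlet-prior) bigram estimate in which a POS-class bigram supplies the prior and the word bigram counts supply the evidence. The function names, the prior weight `tau`, and the toy corpus format are all hypothetical; the abstract does not specify how the fused categories are actually constructed.

```python
from collections import Counter


def train(tagged_sentences):
    """Collect word-level and POS-level counts from sentences of (word, pos) pairs."""
    counts = {
        "word_bigram": Counter(),   # c(w1, w2)
        "word_context": Counter(),  # c(w1) as a bigram history
        "pos_bigram": Counter(),    # c(p1, p2)
        "pos_context": Counter(),   # c(p1) as a bigram history
        "word_pos": Counter(),      # c(w, p)
        "pos_total": Counter(),     # c(p)
    }
    for sentence in tagged_sentences:
        for word, pos in sentence:
            counts["word_pos"][(word, pos)] += 1
            counts["pos_total"][pos] += 1
        for (w1, p1), (w2, p2) in zip(sentence, sentence[1:]):
            counts["word_bigram"][(w1, w2)] += 1
            counts["word_context"][w1] += 1
            counts["pos_bigram"][(p1, p2)] += 1
            counts["pos_context"][p1] += 1
    return counts


def pos_class_prob(counts, prev, cur):
    """Class-based estimate P(p2 | p1) * P(w2 | p2); returns 0.0 for unseen events."""
    (_, p1), (w2, p2) = prev, cur
    if counts["pos_context"][p1] == 0 or counts["pos_total"][p2] == 0:
        return 0.0
    p_class = counts["pos_bigram"][(p1, p2)] / counts["pos_context"][p1]
    p_member = counts["word_pos"][(w2, p2)] / counts["pos_total"][p2]
    return p_class * p_member


def map_bigram_prob(counts, prev, cur, tau=1.0):
    """MAP-style estimate (assumed form): (c(w1, w2) + tau * prior) / (c(w1) + tau).

    With few word observations the POS-based prior dominates; as c(w1) grows,
    the estimate moves continuously toward the plain word bigram.
    tau must be positive so the denominator is never zero.
    """
    (w1, _), (w2, _) = prev, cur
    prior = pos_class_prob(counts, prev, cur)
    numerator = counts["word_bigram"][(w1, w2)] + tau * prior
    return numerator / (counts["word_context"][w1] + tau)
```

In this sketch a larger `tau` pulls the estimate toward the task-independent POS statistics, while a smaller `tau` trusts the task-specific word counts more, which is one simple way to picture the trade-off between robustness and task over-tuning described in the abstract.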


doi: 10.21437/Eurospeech.1999-362

Cite as: Yamamoto, H., Sagisaka, Y. (1999) Part-of-speech n-gram and word n-gram fused language model. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 1803-1806, doi: 10.21437/Eurospeech.1999-362

@inproceedings{yamamoto99_eurospeech,
  author={Hirofumi Yamamoto and Yoshinori Sagisaka},
  title={{Part-of-speech n-gram and word n-gram fused language model}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={1803--1806},
  doi={10.21437/Eurospeech.1999-362}
}