ISCA Archive Eurospeech 1999

Topic-based language models using EM

Daniel Gildea, Thomas Hofmann

In this paper, we propose a novel statistical language model that captures topic-related long-range dependencies. Topics are modeled in a latent variable framework, for which we also derive an EM algorithm that performs a topic factor decomposition based on a segmented training corpus. The topic model is combined with a standard language model for on-line word prediction. Perplexity results indicate an improvement over previously proposed topic models; unfortunately, this improvement has not translated into a lower word error rate.
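The abstract only outlines the method. As a rough illustration, the following Python sketch assumes an aspect-model (PLSA-style) decomposition P(w|d) = sum_t P(w|t) P(t|d), treats each segment of the training corpus as a document, and fits it with EM. For on-line prediction it estimates P(t|h) for the word history by EM with P(w|t) held fixed ("folding-in") and linearly interpolates the result with a base language model probability. The function names, the weight `lam`, and the interpolation scheme itself are hypothetical choices for illustration, not the paper's exact formulation.

```python
import math
import random


def train_topics(docs, vocab_size, n_topics, n_iters=50, seed=0):
    """Fit P(w|d) = sum_t P(w|t) P(t|d) with EM (assumed aspect-model form).

    docs: training segments, each a dict {word_id: count}.
    Returns (p_w_t, p_t_d) with p_w_t[t][w] = P(w|t), p_t_d[d][t] = P(t|d).
    """
    rng = random.Random(seed)
    # Random positive P(w|t) breaks the symmetry between topics; P(t|d) starts uniform.
    p_w_t = []
    for _ in range(n_topics):
        row = [rng.random() + 1e-3 for _ in range(vocab_size)]
        total = sum(row)
        p_w_t.append([x / total for x in row])
    p_t_d = [[1.0 / n_topics] * n_topics for _ in docs]

    for _ in range(n_iters):
        exp_w_t = [[0.0] * vocab_size for _ in range(n_topics)]
        exp_t_d = [[0.0] * n_topics for _ in docs]
        for d, counts in enumerate(docs):
            for w, n in counts.items():
                # E-step: posterior P(t|d,w) is proportional to P(w|t) P(t|d).
                joint = [p_w_t[t][w] * p_t_d[d][t] for t in range(n_topics)]
                z = sum(joint)
                for t in range(n_topics):
                    share = n * joint[t] / z
                    exp_w_t[t][w] += share
                    exp_t_d[d][t] += share
        # M-step: renormalise the expected counts into probabilities.
        for t in range(n_topics):
            total = sum(exp_w_t[t])
            if total > 0:
                p_w_t[t] = [x / total for x in exp_w_t[t]]
        for d in range(len(docs)):
            total = sum(exp_t_d[d])
            if total > 0:
                p_t_d[d] = [x / total for x in exp_t_d[d]]
    return p_w_t, p_t_d


def log_likelihood(docs, p_w_t, p_t_d):
    """Training log-likelihood of the segments; EM never decreases it."""
    k = len(p_w_t)
    return sum(n * math.log(sum(p_w_t[t][w] * p_t_d[d][t] for t in range(k)))
               for d, counts in enumerate(docs) for w, n in counts.items())


def topic_posterior(history, p_w_t, n_iters=20):
    """Estimate P(t|h) for a list of word ids by EM with P(w|t) held fixed."""
    k = len(p_w_t)
    p_t_h = [1.0 / k] * k
    for _ in range(n_iters):
        acc = [0.0] * k
        for w in history:
            joint = [p_w_t[t][w] * p_t_h[t] for t in range(k)]
            z = sum(joint)
            if z == 0.0:
                continue  # word unseen in training: no topic evidence
            for t in range(k):
                acc[t] += joint[t] / z
        total = sum(acc)
        if total > 0:
            p_t_h = [a / total for a in acc]
    return p_t_h


def predict(w, history, p_w_t, base_prob, lam=0.5):
    """Hypothetical combination: lam * P_base(w|h) + (1 - lam) * sum_t P(w|t) P(t|h)."""
    p_t_h = topic_posterior(history, p_w_t)
    p_topic = sum(p_w_t[t][w] * p_t_h[t] for t in range(len(p_w_t)))
    return lam * base_prob + (1 - lam) * p_topic
```

For example, `train_topics([{0: 5, 1: 5}, {2: 5, 3: 5}], vocab_size=4, n_topics=2)` returns row-normalized P(w|t) and P(t|d) tables, and `log_likelihood` can be tracked across iterations as a convergence check. The base-model probability `base_prob` would come from whatever standard language model (e.g. an n-gram model) the topic model is combined with.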


doi: 10.21437/Eurospeech.1999-479

Cite as: Gildea, D., Hofmann, T. (1999) Topic-based language models using EM. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2167-2170, doi: 10.21437/Eurospeech.1999-479

@inproceedings{gildea99_eurospeech,
  author={Daniel Gildea and Thomas Hofmann},
  title={{Topic-based language models using EM}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={2167--2170},
  doi={10.21437/Eurospeech.1999-479}
}