In this paper, we propose a novel statistical language model to capture topic-related long-range dependencies. Topics are modeled in a latent variable framework, for which we derive an EM algorithm that performs a topic factor decomposition on a segmented training corpus. The topic model is combined with a standard language model for on-line word prediction. Perplexity results indicate an improvement over previously proposed topic models, although this improvement has not translated into a lower word error rate.
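As a rough illustration of the kind of latent-variable topic decomposition described above, the following is a minimal sketch (not the authors' exact formulation): EM training of a topic mixture P(w | d) = sum_t P(w | t) P(t | d) over a segmented corpus, followed by a simple linear interpolation with a baseline model for word prediction. The function names, the interpolation weight `lam`, and the use of linear interpolation rather than the paper's combination scheme are illustrative assumptions.

```python
import numpy as np

def train_topic_model(counts, num_topics, num_iters=50, seed=0):
    """EM for a topic mixture model.

    counts: (num_docs, vocab_size) matrix of word counts per corpus segment.
    Returns P(w|t) of shape (num_topics, vocab_size) and P(t|d) of shape
    (num_docs, num_topics).  Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    num_docs, vocab = counts.shape
    # Random, normalized initialization of P(w|t) and P(t|d).
    p_w_t = rng.random((num_topics, vocab))
    p_w_t /= p_w_t.sum(axis=1, keepdims=True)
    p_t_d = rng.random((num_docs, num_topics))
    p_t_d /= p_t_d.sum(axis=1, keepdims=True)
    for _ in range(num_iters):
        # E-step: posterior P(t | d, w) proportional to P(t|d) * P(w|t).
        post = p_t_d[:, :, None] * p_w_t[None, :, :]      # (docs, topics, vocab)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate parameters from expected counts.
        exp_counts = counts[:, None, :] * post             # (docs, topics, vocab)
        p_w_t = exp_counts.sum(axis=0)
        p_w_t /= p_w_t.sum(axis=1, keepdims=True) + 1e-12
        p_t_d = exp_counts.sum(axis=2)
        p_t_d /= p_t_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_t, p_t_d

def predict(p_w_t, topic_weights, baseline_probs, lam=0.5):
    """Combine the topic model with a baseline language model.

    topic_weights: current estimate of P(t | history), shape (num_topics,).
    baseline_probs: baseline model's P(w | history), shape (vocab_size,).
    Linear interpolation is used here for simplicity; it is an assumption,
    not the paper's combination method.
    """
    topic_probs = topic_weights @ p_w_t                    # P(w | topic mixture)
    return lam * topic_probs + (1.0 - lam) * baseline_probs
```

In this sketch the E-step and M-step are vectorized over the whole corpus; a real implementation would typically stream over segments and also re-estimate the topic weights for the current history at prediction time.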
Cite as: Gildea, D., Hofmann, T. (1999) Topic-based language models using EM. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2167-2170, doi: 10.21437/Eurospeech.1999-479
@inproceedings{gildea99_eurospeech,
  author={Daniel Gildea and Thomas Hofmann},
  title={{Topic-based language models using EM}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={2167--2170},
  doi={10.21437/Eurospeech.1999-479}
}