We present an approach to topic mixture-based language model adaptation using latent Dirichlet allocation (LDA). We use probabilistic latent semantic analysis (PLSA) to automatically cluster a heterogeneous training corpus, and train an LDA model on the resulting topic-document assignments. Using this LDA model, we then construct topic-specific corpora at the utterance level, which are interpolated with a background language model during adaptation. We also present a novel iterative algorithm for LDA topic inference. Preliminary experiments on Mandarin Chinese broadcast news yielded very encouraging results.
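The interpolation step sketched in the abstract can be illustrated as follows. This is a minimal toy example, not the paper's implementation: the unigram models, mixture weights, and the `adapted_prob` helper are all assumptions for illustration. The idea is that the word probability under the adapted model is a linear combination of topic-specific LMs, weighted by the inferred LDA topic proportions, and a background LM.

```python
def adapted_prob(word, theta, topic_lms, background_lm, lam=0.5):
    """P(w) = lam * sum_k theta_k * P_k(w) + (1 - lam) * P_bg(w).

    theta       : inferred LDA topic proportions for the current utterance
    topic_lms   : one (toy unigram) LM per topic
    background_lm: the general-domain (toy unigram) LM
    lam         : interpolation weight (illustrative value)
    """
    topic_mix = sum(t * lm.get(word, 0.0) for t, lm in zip(theta, topic_lms))
    return lam * topic_mix + (1.0 - lam) * background_lm.get(word, 0.0)

# Two toy topic LMs (e.g. "politics" vs. "sports") and a background LM.
topic_lms = [
    {"election": 0.6, "game": 0.1, "the": 0.3},
    {"election": 0.1, "game": 0.6, "the": 0.3},
]
background_lm = {"election": 0.2, "game": 0.2, "the": 0.6}

theta = [0.8, 0.2]  # hypothetical topic mixture inferred for one utterance
p = adapted_prob("election", theta, topic_lms, background_lm)
# topic mix = 0.8*0.6 + 0.2*0.1 = 0.50; adapted = 0.5*0.50 + 0.5*0.2 = 0.35
```

In practice the component models would be n-gram LMs estimated from the topic-specific corpora, and interpolation would typically be done at the n-gram level rather than on toy unigrams as here.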
Bibliographic reference. Heidel, Aaron / Chang, Hung-an / Lee, Lin-shan (2007): "Language model adaptation using latent Dirichlet allocation and an efficient topic inference algorithm", in INTERSPEECH-2007, 2361-2364.