11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Novel Weighting Scheme for Unsupervised Language Model Adaptation Using Latent Dirichlet Allocation

Md. Akmal Haidar, Douglas O'Shaughnessy

INRS-EMT, Canada

We introduce a new approach for computing the weights of topic models in language model (LM) adaptation. Topic clusters are formed by hard clustering: each document is assigned to a single topic, namely the topic from which the largest number of its words were drawn in Latent Dirichlet Allocation (LDA) analysis. The new weighting idea is to compute the mixture weights from the unigram counts of the topic clusters produced by hard clustering, rather than from the LDA latent topic word counts used in the literature. Our approach yields significant reductions in perplexity and word error rate (WER) compared with the existing approach.
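
The two steps named in the abstract, hard clustering and count-based weighting, can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the one used here is an assumption: each word of the adaptation text votes for topic k in proportion to its unigram count in cluster k, and the votes are averaged. The function names, the toy data, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
from collections import Counter

def hard_cluster(doc_topic_assignments, num_topics):
    """Assign each document to the topic that generated most of its word tokens."""
    clusters = []
    for assignments in doc_topic_assignments:
        counts = Counter(assignments)
        # Ties go to the lowest topic id (an arbitrary, assumed choice).
        clusters.append(max(range(num_topics), key=lambda k: (counts[k], -k)))
    return clusters

def cluster_unigram_counts(docs, clusters, num_topics):
    """Collect unigram counts over all documents assigned to each topic cluster."""
    counts = [Counter() for _ in range(num_topics)]
    for words, k in zip(docs, clusters):
        counts[k].update(words)
    return counts

def mixture_weights(adapt_words, counts):
    """Assumed weighting: average over words of count_k(w) / sum_j count_j(w)."""
    num_topics = len(counts)
    weights = [0.0] * num_topics
    used = 0
    for w in adapt_words:
        total = sum(c[w] for c in counts)
        if total == 0:
            continue  # word unseen in every cluster: it carries no evidence
        used += 1
        for k in range(num_topics):
            weights[k] += counts[k][w] / total
    if used == 0:
        return [1.0 / num_topics] * num_topics  # fall back to uniform weights
    return [x / used for x in weights]

# Toy corpus: per-token topic ids stand in for the output of an LDA sampler.
docs = [["stock", "market", "stock"],
        ["game", "team", "game"],
        ["market", "price", "team"]]
assignments = [[0, 0, 1], [1, 1, 1], [0, 0, 1]]

clusters = hard_cluster(assignments, num_topics=2)
counts = cluster_unigram_counts(docs, clusters, num_topics=2)
weights = mixture_weights(["stock", "team", "unknown"], counts)
print(clusters, weights)
```

In a full system the resulting weights would interpolate topic-specific n-gram models trained on each cluster; that step is omitted here.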

Bibliographic reference. Haidar, Md. Akmal / O'Shaughnessy, Douglas (2010): "Novel weighting scheme for unsupervised language model adaptation using latent Dirichlet allocation", in INTERSPEECH-2010, 2438-2441.