14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Fitting Long-Range Information Using Interpolated Distanced N-Grams and Cache Models into a Latent Dirichlet Language Model for Speech Recognition

Md. Akmal Haidar, Douglas O'Shaughnessy

INRS-EMT, Canada

We propose a language modeling (LM) approach for speech recognition that incorporates interpolated distanced n-grams into a latent Dirichlet language model (LDLM). The LDLM relaxes the bag-of-words assumption and the document-level topic extraction of latent Dirichlet allocation (LDA). It uses default background n-grams, where topic information is extracted from the (n-1) history words through a Dirichlet distribution when calculating n-gram probabilities. However, the model does not capture long-range information from outside the n-gram events, which can improve language modeling performance. In this paper, we present an interpolated LDLM (ILDLM) that uses n-grams at different distances. Here, the topic information is exploited from the (n-1) history words through the Dirichlet distribution using interpolated distanced n-grams. The n-gram probabilities of the model are computed using the distanced word probabilities for the topics and the interpolated topic information for the histories. In addition, we incorporate a cache-based LM, which models re-occurring words, through unigram scaling to adapt the LDLM and ILDLM, which model topical words. Experiments on the Wall Street Journal (WSJ) corpus show that our approaches yield significant reductions in perplexity and word error rate (WER) over the probabilistic latent semantic analysis (PLSA) and LDLM approaches.
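As a rough illustration of two ingredients named in the abstract, the Python sketch below shows (a) distanced bigrams, i.e. word pairs separated by a fixed distance, combined by linear interpolation, and (b) unigram scaling of a word distribution by a cache distribution. It is a count-based toy under stated assumptions, not the ILDLM itself: the paper applies these ideas inside a topic model, which is not reproduced here. All function names, the interpolation weights, the exponent `beta`, and the fallback used for words absent from the cache are hypothetical choices made for this example.

```python
from collections import Counter, defaultdict


def distanced_bigram_counts(tokens, distance):
    """Count pairs (tokens[i - distance], tokens[i]); distance=1 is the ordinary bigram."""
    counts = defaultdict(Counter)
    for i in range(distance, len(tokens)):
        counts[tokens[i - distance]][tokens[i]] += 1
    return counts


def conditional_prob(counts, history_word, word):
    """Maximum-likelihood P_d(word | history_word); 0.0 if the history word is unseen."""
    if history_word not in counts:
        return 0.0
    total = sum(counts[history_word].values())
    return counts[history_word][word] / total if total else 0.0


def interpolated_prob(models, weights, context, word):
    """Linear interpolation over distances: sum_d weights[d] * P_d(word | context[-d])."""
    prob = 0.0
    for distance, weight in weights.items():
        if len(context) >= distance:
            prob += weight * conditional_prob(models[distance], context[-distance], word)
    return prob


def unigram_scale(background, cache, base_unigram, beta=0.5):
    """Rescale background(w) by (cache(w) / base_unigram(w)) ** beta and renormalize.

    Assumption for this sketch: a word missing from the cache keeps a ratio of 1,
    so it is neither boosted nor zeroed out.
    """
    scaled = {}
    for w, p in background.items():
        ratio = cache.get(w, base_unigram[w]) / base_unigram[w]
        scaled[w] = p * ratio ** beta
    norm = sum(scaled.values())
    return {w: p / norm for w, p in scaled.items()}


# Toy data, invented for illustration only.
tokens = "the stock market fell as the bond market rose".split()
models = {d: distanced_bigram_counts(tokens, d) for d in (1, 2)}
weights = {1: 0.7, 2: 0.3}  # hypothetical interpolation weights

p_fell = interpolated_prob(models, weights, ["stock", "market"], "fell")

background = {"market": 0.5, "bond": 0.5}
cache = {"market": 0.8, "bond": 0.2}  # relative frequencies of recently seen words
adapted = unigram_scale(background, cache, background, beta=1.0)
```

In the toy run, `p_fell` combines the distance-1 estimate P("fell" | "market") with the distance-2 estimate P("fell" | "stock"), and `adapted` shifts probability toward "market" because it dominates the cache. In practice the weights and `beta` would be tuned on held-out data.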


Bibliographic reference. Haidar, Md. Akmal / O'Shaughnessy, Douglas (2013): "Fitting long-range information using interpolated distanced n-grams and cache models into a latent Dirichlet language model for speech recognition", In INTERSPEECH-2013, 2678-2682.