INTERSPEECH 2004 - ICSLP
Language modeling plays a critical role in automatic speech recognition. Conventional n-gram language models suffer from a poor representation of the word history and from unreliable estimates of unseen parameters drawn from insufficient training data. In this work, latent semantic information is exploited for language modeling and parameter smoothing. For language modeling, we present a new representation of the word history obtained by retrieving the most likely relevant document. In addition, we develop a novel parameter smoothing method in which the language model probabilities of seen and unseen words are estimated by interpolating those of the k nearest seen words in the training corpus. The interpolation coefficients are determined by the closeness of the words in the semantic space. In the experiments, the proposed modeling and smoothing methods significantly reduce the perplexity of the language models at moderate computational cost.
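The smoothing idea described above can be sketched as follows. This is an illustrative toy example, not the authors' exact formulation: the two-dimensional word vectors and unigram probabilities are hypothetical stand-ins for latent semantic coordinates and corpus estimates, and the interpolation weights are simply normalized cosine similarities to the k nearest seen words.

```python
import math

# Hypothetical latent-semantic vectors; a real system would derive these
# from an SVD of a word-document co-occurrence matrix.
word_vecs = {
    "stock":  [0.9, 0.1],
    "market": [0.8, 0.2],
    "bond":   [0.7, 0.3],
    "piano":  [0.1, 0.9],   # unseen in training, but has a semantic vector
}

# Unigram probabilities estimated from (hypothetical) training counts;
# "piano" is absent, i.e. unseen.
seen_probs = {"stock": 0.4, "market": 0.35, "bond": 0.25}

def cosine(u, v):
    """Cosine similarity: closeness of two words in the semantic space."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def knn_smoothed_prob(word, k=2):
    """Estimate P(word): seen words keep their estimate; an unseen word's
    probability is interpolated from its k nearest seen words, with
    interpolation coefficients proportional to semantic closeness."""
    if word in seen_probs:
        return seen_probs[word]
    # Rank seen words by similarity to the unseen word and keep the top k.
    neighbours = sorted(
        ((cosine(word_vecs[word], word_vecs[w]), w) for w in seen_probs),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    # Similarity-weighted interpolation of the neighbours' probabilities.
    return sum((sim / total) * seen_probs[w] for sim, w in neighbours)
```

For a seen word the estimate is returned unchanged, while an unseen word such as "piano" receives a probability blended from its semantically closest seen neighbours, which avoids assigning it zero mass.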
Bibliographic reference. Chien, Jen-Tzung / Wu, Meng-Sung / Peng, Hua-Jui (2004): "On latent semantic language modeling and smoothing", In INTERSPEECH-2004, 1373-1376.