8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

On Latent Semantic Language Modeling and Smoothing

Jen-Tzung Chien, Meng-Sung Wu, Hua-Jui Peng

National Cheng Kung University, Taiwan

Language modeling plays a critical role in automatic speech recognition. Conventional n-gram language models suffer from a poor representation of the word history and from unreliable estimates of unseen parameters when training data are insufficient. In this work, latent semantic information is explored for both language modeling and parameter smoothing. For language modeling, we present a new representation of the word history obtained by retrieving the most likely relevant document. We also develop a novel parameter smoothing method in which the language models of seen and unseen words are estimated by interpolating those of the k nearest seen words in the training corpus. The interpolation coefficients are determined by the closeness of the words in the semantic space. In experiments, the proposed modeling and smoothing methods significantly reduce language model perplexity at moderate computational cost.
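
The smoothing idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption for illustration: word vectors come from a truncated SVD of a toy term-document matrix, closeness is measured by cosine similarity, interpolation weights are the normalized similarities, and `probs` stands for a conditional distribution over next words for one fixed history, with zeros for words unseen after that history. The names `lsa_word_vectors` and `knn_smooth` are hypothetical and are not taken from the paper.

```python
import numpy as np

def lsa_word_vectors(term_doc, rank):
    """Project words into a rank-limited semantic space via truncated SVD."""
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :rank] * s[:rank]

def knn_smooth(word_vecs, probs, k):
    """Re-estimate each word's probability from its k nearest seen words.

    Assumed scheme (not from the paper): weights are cosine similarities,
    clipped at zero and normalized to sum to one. At least one entry of
    probs must be positive.
    """
    probs = np.asarray(probs, dtype=float)
    norms = np.linalg.norm(word_vecs, axis=1, keepdims=True)
    unit = word_vecs / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T                      # pairwise cosine similarity
    seen = np.flatnonzero(probs > 0)         # words observed after this history
    smoothed = np.empty_like(probs)
    for w in range(len(probs)):
        # k seen words closest to w (a seen word may select itself)
        nearest = seen[np.argsort(-sim[w, seen])[:k]]
        weights = np.clip(sim[w, nearest], 0.0, None)
        if weights.sum() == 0.0:             # fall back to uniform weights
            weights = np.ones(len(nearest))
        weights /= weights.sum()
        smoothed[w] = weights @ probs[nearest]
    return smoothed / smoothed.sum()         # renormalize to a distribution

# Toy data: 4 words x 3 documents; words 1 and 3 are unseen after the history.
term_doc = np.array([[2, 0, 1],
                     [1, 0, 1],
                     [0, 3, 0],
                     [0, 1, 1]], dtype=float)
probs = np.array([0.6, 0.0, 0.4, 0.0])
vecs = lsa_word_vectors(term_doc, rank=2)
smoothed = knn_smooth(vecs, probs, k=2)
```

In this sketch the unseen words receive nonzero probability borrowed from their semantic neighbors, and the result is renormalized so that it remains a valid distribution. A real system would need to decide how much mass to reserve for unseen events, which this toy version does not address.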


Bibliographic reference.  Chien, Jen-Tzung / Wu, Meng-Sung / Peng, Hua-Jui (2004): "On latent semantic language modeling and smoothing", In INTERSPEECH-2004, 1373-1376.