
Sixth International Conference on Spoken Language Processing
(ICSLP 2000)
Beijing, China
October 16-20, 2000

Vector Space Representation of Language Probabilities Through SVD of N-gram Matrix
Shiro Terashima, Kazuya Takeda, Fumitada Itakura
Center for Integrated Acoustic Information Research (CIAIR), Nagoya University, Japan
In this paper we introduce a vector space representation of the
N-gram language model in which K-dimensional vectors are assigned to
both words and contexts (i.e., sequences of N-1 words), so that the
scalar product of a 'word vector' and a 'context vector' gives the
corresponding N-gram probability. The vector space
representation is obtained from the singular value decomposition
(SVD) of the co-occurrence frequency matrix (CFM) of contexts
and words. The effectiveness of the proposed
representation is examined by measuring how far the number of
N-gram parameters can be reduced through clustering and
truncation of the dimensions of this vector space.
The experimental results confirm that the number of
model parameters can be reduced to less than 17.5% of the original
number and that the proposed method is more
effective than word clustering based on mutual
information.
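The core idea of the abstract (SVD of a context-by-word matrix, then scalar products of truncated vectors as N-gram probabilities) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `svd_vector_space`, `ngram_prob` and `num_parameters` are hypothetical, the CFM is assumed to be row-normalized to conditional probabilities P(w | c) before the SVD, and the singular values are assumed to be folded into the context vectors. The clustering step mentioned in the abstract is not covered.

```python
import numpy as np


def svd_vector_space(counts, k):
    """Return K-dimensional context and word vectors from a CFM.

    counts: co-occurrence frequency matrix, rows = contexts (N-1 word
    histories), columns = words. Assumption (hypothetical): rows are
    normalized to P(w | c) before the SVD is taken.
    """
    counts = np.asarray(counts, dtype=float)
    probs = counts / counts.sum(axis=1, keepdims=True)
    # Thin SVD: probs = U diag(s) Vt, singular values in descending order.
    u, s, vt = np.linalg.svd(probs, full_matrices=False)
    # Keep the K largest singular values; fold them into the context vectors.
    context_vecs = u[:, :k] * s[:k]   # shape (num_contexts, k)
    word_vecs = vt[:k, :].T           # shape (num_words, k)
    return context_vecs, word_vecs


def ngram_prob(context_vecs, word_vecs, c, w):
    """Approximate P(w | c) as the scalar product of the two vectors."""
    return float(context_vecs[c] @ word_vecs[w])


def num_parameters(num_contexts, num_words, k):
    """Parameters stored by the vector representation: K * (|C| + |W|)."""
    return k * (num_contexts + num_words)


# Toy bigram CFM: 3 contexts x 4 words (illustrative numbers only).
counts = [[4, 1, 0, 1],
          [0, 2, 2, 0],
          [1, 0, 3, 2]]
cv, wv = svd_vector_space(counts, k=3)
print(ngram_prob(cv, wv, 0, 0))  # full rank, so this reproduces P(w0 | c0) = 4/6
```

With K equal to the rank of the matrix the scalar products reproduce the probabilities exactly; choosing a smaller K gives a low-rank approximation whose entries are not guaranteed to be non-negative or to sum to one, which this sketch does not correct. The parameter saving only appears when K is much smaller than both dimensions: for a hypothetical vocabulary of 10,000 contexts and 10,000 words with K = 100, `num_parameters` gives 2,000,000 values instead of the 100,000,000 cells of the full matrix.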
Bibliographic reference.
Terashima, Shiro / Takeda, Kazuya / Itakura, Fumitada (2000):
"Vector space representation of language probabilities through SVD of n-gram matrix",
In ICSLP 2000, vol. 2, 995-998.