Natural language is rich and varied, but also highly structured. The rules of
grammar are a primary source of linguistic regularity, but there are many other
factors that govern patterns of language use. Language models attempt to capture
linguistic regularities, typically by modeling the statistics of word use, thereby
folding in some aspects of grammar and style. Spoken language is an important
and interesting subset of natural language that is temporally and spatially
grounded. While time and space may directly contribute to a speaker's choice
of words, they may also serve as indicators for communicative intent or other
contextual and situational factors.
To investigate the value of spatial and temporal information, we build a series of language models using a large, naturalistic corpus of spatially and temporally coded speech collected from a home environment. We incorporate this extralinguistic information by building spatiotemporal word classifiers that are mixed with traditional unigram and bigram models. Our evaluation shows that both perplexity and word error rate can be significantly improved by incorporating this information in a simple framework. The underlying principles of this work could be applied in a wide range of scenarios in which temporal or spatial information is available.
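As a rough illustration of what mixing a context-conditioned word model with a unigram model could look like, the following minimal sketch interpolates the two linearly and compares perplexity on held-out data. Several things here are assumptions, not details taken from the paper: the abstract does not say how the models are mixed, so linear interpolation with a fixed weight `lam` is only one plausible choice; the spatiotemporal classifier is replaced by a simple smoothed count table over hypothetical discrete context labels such as `("kitchen", "morning")`; and the toy data is invented purely for the example.

```python
import math
from collections import Counter, defaultdict

def train_unigram(words, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(words)
    total = len(words) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def train_context_model(samples, vocab, alpha=1.0):
    """Smoothed P(word | context) from (word, context) pairs.

    Stand-in for a spatiotemporal word classifier (assumption): contexts are
    hypothetical discrete labels, not the features used in the paper.
    """
    by_ctx = defaultdict(list)
    for word, ctx in samples:
        by_ctx[ctx].append(word)
    return {ctx: train_unigram(ws, vocab, alpha) for ctx, ws in by_ctx.items()}

def mixed_prob(word, ctx, unigram, context_model, lam=0.5):
    """Linear interpolation (assumed mixing scheme) of the two models."""
    # Fall back to the unigram distribution for contexts never seen in training.
    p_ctx = context_model.get(ctx, unigram)[word]
    return lam * p_ctx + (1.0 - lam) * unigram[word]

def perplexity(samples, prob_fn):
    """Per-word perplexity of prob_fn(word, context) on held-out samples."""
    log_sum = sum(math.log2(prob_fn(w, c)) for w, c in samples)
    return 2.0 ** (-log_sum / len(samples))

# Invented toy data: each word is tagged with a (place, time-of-day) label.
KM = ("kitchen", "morning")
BE = ("bathroom", "evening")
train = [("coffee", KM), ("coffee", KM), ("milk", KM),
         ("bath", BE), ("bath", BE), ("towel", BE)]
held_out = [("coffee", KM), ("bath", BE)]
vocab = sorted({w for w, _ in train})

unigram = train_unigram([w for w, _ in train], vocab)
context_model = train_context_model(train, vocab)

def base(word, ctx):
    # Baseline ignores the context entirely.
    return unigram[word]

def mixed(word, ctx):
    return mixed_prob(word, ctx, unigram, context_model, lam=0.5)

print("unigram perplexity:", perplexity(held_out, base))
print("mixed perplexity:  ", perplexity(held_out, mixed))
```

Because both component distributions are normalized over the same vocabulary, any `lam` between 0 and 1 yields a valid distribution. In practice the weight would be tuned on held-out data, and a bigram model could be mixed in the same way.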
Bibliographic reference. Roy, Brandon C. / Vosoughi, Soroush / Roy, Deb (2014): "Grounding language models in spatiotemporal context", In INTERSPEECH-2014, 2625-2629.