Sixth European Conference on Speech Communication and Technology
(EUROSPEECH'99)

Budapest, Hungary
September 5-9, 1999

Language Modelling with Hierarchical Domains

Paul G. Donnelly, F. J. Smith, E. Sicilia, Ji Ming

School of Computer Science, The Queen’s University of Belfast, Belfast, UK

A new method of using domains to improve a variable n-gram language model is presented. It extends previous research, which used a 2-tier approach to domains, by adding further hierarchical levels of domain abstraction. The method is based on the concept of domain association, in which semantically related domains exhibit similar n-gram distributions. Association allows domains to be clustered into a hierarchy of progressively larger super-domains, terminating when all domains have been merged into a single super-domain. The resulting hierarchy has been used within a dynamic language model framework that tracks changes in text over time. Results with several corpora show improvements of 22% over traditional modelling techniques, and the approach provides a robust mechanism for handling topical changes.
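
The clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which association measure or merging rule the paper uses, so the Jensen-Shannon divergence over bigram counts, the greedy pairwise merge, and the names `bigram_counts`, `js_divergence`, and `build_hierarchy` are all assumptions chosen for illustration.

```python
from collections import Counter
from math import log2


def bigram_counts(tokens):
    """Count adjacent word pairs in a token list."""
    return Counter(zip(tokens, tokens[1:]))


def js_divergence(c1, c2):
    """Jensen-Shannon divergence (base 2) between two count tables.

    Assumed association measure; 0 means identical distributions,
    1 means no shared bigrams.
    """
    n1, n2 = sum(c1.values()), sum(c2.values())
    div = 0.0
    for key in set(c1) | set(c2):
        p, q = c1[key] / n1, c2[key] / n2
        m = 0.5 * (p + q)
        if p > 0:
            div += 0.5 * p * log2(p / m)
        if q > 0:
            div += 0.5 * q * log2(q / m)
    return div


def build_hierarchy(domains):
    """Greedily merge the two closest clusters until one super-domain remains.

    domains: dict mapping domain name -> list of tokens (at least two each).
    Returns the merges in order as (left, right, divergence) tuples,
    where left and right are tuples of the domain names in each cluster.
    """
    clusters = {(name,): bigram_counts(toks) for name, toks in domains.items()}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        d, a, b = min(
            ((js_divergence(clusters[a], clusters[b]), a, b)
             for i, a in enumerate(keys) for b in keys[i + 1:]),
            key=lambda t: t[0],
        )
        # The merged super-domain pools the counts of its members.
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges
```

Calling `build_hierarchy` on a dictionary of tokenised domain texts returns the merge sequence from the individual domains up to the single all-inclusive super-domain; domains with more similar bigram distributions are merged earlier.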



Bibliographic reference.  Donnelly, Paul G. / Smith, F. J. / Sicilia, E. / Ming, Ji (1999): "Language modelling with hierarchical domains", In EUROSPEECH'99, 1575-1578.