Sixth European Conference on Speech Communication and Technology
A new method of using domains to improve a variable n-gram language model is presented. It extends previous research on a 2-tier domain approach by adding further hierarchical levels of domain abstraction. Our method is based on the concept of domain association, in which semantically related domains exhibit similar n-gram distributions. Association allows us to cluster domains into a hierarchy of progressively larger super-domains, terminating when all domains have been merged into a single super-domain. The resulting hierarchy is used within a dynamic language model framework that tracks changes in text over time. Results on several corpora show improvements of 22% over traditional modelling techniques, while providing a robust mechanism for handling topical changes.
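The clustering idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the association measure or the merging procedure, so everything here is an assumption made for illustration: domains are represented by raw n-gram counts, similarity is measured with the Jensen-Shannon divergence, and the hierarchy is built by greedy pairwise merging. The names `distribution`, `js_divergence`, and `build_hierarchy` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def distribution(counts):
    """Normalize raw n-gram counts into a probability distribution."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    Assumption: this stands in for the paper's association measure,
    which the abstract does not specify.
    """
    div = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = 0.5 * (pw + qw)
        if pw > 0:
            div += 0.5 * pw * math.log2(pw / m)
        if qw > 0:
            div += 0.5 * qw * math.log2(qw / m)
    return div


def build_hierarchy(domain_counts):
    """Greedily merge the two closest clusters until one super-domain remains.

    domain_counts: dict mapping domain name -> Counter of n-gram counts.
    Returns the merges in order as (left, right, divergence) tuples,
    where left and right are tuples of the domain names in each cluster.
    """
    clusters = {(name,): Counter(c) for name, c in domain_counts.items()}
    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        best = None
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                d = js_divergence(distribution(clusters[a]),
                                  distribution(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # A super-domain pools the n-gram counts of its members.
        merged = clusters.pop(a) + clusters.pop(b)
        clusters[tuple(sorted(a + b))] = merged
        merges.append((a, b, d))
    return merges
```

Given toy counts for three domains, two of which share vocabulary, the sketch merges the two related domains first and then joins the result with the remaining domain, ending in a single super-domain. A real system would use higher-order n-grams and smoothed estimates rather than raw counts.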
Bibliographic reference. Donnelly, Paul G. / Smith, F. J. / Sicilia, E. / Ming, Ji (1999): "Language modelling with hierarchical domains", In EUROSPEECH'99, 1575-1578.