Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

An Information-Based Method for Selecting Feature Types for Word Prediction

Dekai Wu (1), Zhifang Sui (1,2), Jun Zhao (1)

(1) Human Language Technology Center, Department of Computer Science, Hong Kong University of Science & Technology (HKUST), Clear Water Bay, Hong Kong
(2) Computational Linguistics Institute, Department of Computer Science & Technology, Peking University, Beijing, China

This paper uses an information-based approach to select feature types for language modeling systematically. Taking word prediction as the target task, we analyze an English treebank corpus to quantify the information gain and the information redundancy of various combinations of feature types inspired by both dependency structure and bigram structure. The experiments yield several conclusions about the predictive value of individual feature types and of feature-type combinations for word prediction, which are expected to serve as a reliable reference for feature-type selection in language modeling.
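The abstract does not define its measures, so the following is only a rough sketch under stated assumptions. It treats "information gain" as the empirical mutual information I(W;F) between the predicted word W and a feature type F, and "information redundancy" between two feature types as I(W;F1) + I(W;F2) - I(W;F1,F2). A positive value indicates overlapping information, and a negative value indicates synergy. The function names are hypothetical, and the paper's exact definitions and estimators may differ.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Empirical Shannon entropy (bits) of a non-empty list of hashable outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def information_gain(words, feature_values):
    """Hypothetical helper: uncertainty about the word removed by one feature type."""
    return mutual_information(words, feature_values)


def redundancy(words, f1, f2):
    """Hypothetical helper: overlap between what two feature types say about the word.

    I(W;F1) + I(W;F2) - I(W;F1,F2): positive means redundant, negative means synergistic.
    """
    joint = list(zip(f1, f2))  # the combined feature type (F1, F2)
    return (mutual_information(words, f1)
            + mutual_information(words, f2)
            - mutual_information(words, joint))


# Toy data: the previous word (a bigram-style feature) fully determines the word.
words = ["cat", "dog", "cat", "dog"]
prev = ["the", "a", "the", "a"]
print(information_gain(words, prev))  # → 1.0
```

Using a feature type twice gives a redundancy equal to its own information gain. For an XOR-like case (w = f1 xor f2), each feature alone is uninformative, but together they are fully predictive, so the redundancy is -1 bit. Plug-in estimates like these are biased upward on sparse counts, which matters for word-level events.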


Bibliographic reference.  Wu, Dekai / Sui, Zhifang / Zhao, Jun (1999): "An information-based method for selecting feature types for word prediction", In EUROSPEECH'99, 2051-2054.