Most conventional semi-supervised learning methods incorporate unlabeled data directly into the training objective. This paper presents an alternative approach that learns feature affinity information from unlabeled data and incorporates it into the training objective as a regularizer for a maximum entropy model. The regularization favors models in which correlated features have similar weights. The method is evaluated on text classification, where feature affinity can be computed from feature co-occurrences in unlabeled data. Experimental results show that this method consistently outperforms baseline methods.
Index Terms: semi-supervised learning, text classification, maximum entropy, feature affinity matrix, regularization
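The abstract does not state the exact affinity measure or the form of the regularizer, so the following Python sketch is only one plausible reading of the idea, not the authors' formulation. It assumes a cosine-normalized document co-occurrence count as the affinity matrix A and a graph-Laplacian penalty, 0.5 * sum_ij A_ij ||w_i - w_j||^2, added to the negative log-likelihood of a multinomial logistic (maximum entropy) classifier. All function names, the weight `lam`, and the plain gradient-descent loop are illustrative choices.

```python
import numpy as np

def cooccurrence_affinity(X_unlabeled):
    """Assumed affinity: cosine-normalized document co-occurrence of features."""
    B = (np.asarray(X_unlabeled) > 0).astype(float)  # binary doc-term matrix
    C = B.T @ B                                      # C[i, j] = docs containing both i and j
    df = np.diag(C).copy()                           # document frequency of each feature
    denom = np.sqrt(np.outer(df, df))
    A = np.divide(C, denom, out=np.zeros_like(C), where=denom > 0)
    np.fill_diagonal(A, 0.0)                         # ignore self-affinity
    return A

def laplacian(A):
    """Graph Laplacian L = D - A for a symmetric affinity matrix A."""
    return np.diag(A.sum(axis=1)) - A

def affinity_penalty(W, A):
    """trace(W L W^T), which equals 0.5 * sum_ij A_ij * ||W[:, i] - W[:, j]||^2."""
    return float(np.trace(W @ laplacian(A) @ W.T))

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)             # subtract row max for stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def objective_and_grad(W, X, y, L, lam):
    """Mean negative log-likelihood plus lam * trace(W L W^T), with its gradient."""
    n = X.shape[0]
    P = softmax(X @ W.T)                             # (n_docs, n_classes)
    Y = np.eye(W.shape[0])[y]                        # one-hot labels
    nll = -np.log(P[np.arange(n), y]).mean()
    grad = (P - Y).T @ X / n + 2.0 * lam * (W @ L)
    return nll + lam * float(np.trace(W @ L @ W.T)), grad

def fit(X, y, A, n_classes, lam=0.1, lr=0.5, steps=500):
    """Plain gradient descent on the regularized objective (illustrative only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    L = laplacian(A)
    W = np.zeros((n_classes, X.shape[1]))
    for _ in range(steps):
        _, grad = objective_and_grad(W, X, y, L, lam)
        W -= lr * grad
    return W
```

In this sketch, a feature that never appears in the labeled data but co-occurs with a labeled feature in the unlabeled data is pulled toward that feature's weight, which is the behavior the abstract describes. With a zero affinity matrix the penalty vanishes and the model reduces to an unregularized maximum entropy classifier.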
Cite as: Zhang, B., Ostendorf, M. (2012) Semi-supervised learning for text classification using feature affinity regularization. Proc. Machine Learning in Speech and Language Processing (MLSLP 2012), 26-29
@inproceedings{zhang12_mlslp,
  author={Bin Zhang and Mari Ostendorf},
  title={{Semi-supervised learning for text classification using feature affinity regularization}},
  year=2012,
  booktitle={Proc. Machine Learning in Speech and Language Processing (MLSLP 2012)},
  pages={26--29}
}