Symposium on Machine Learning in Speech and Language Processing (MLSLP)

Bellevue, WA, USA
June 27, 2011

Recent Advances in Active Learning

Sanjoy Dasgupta

University of California, San Diego, CA, USA

A key requirement for being able to learn a good classifier is having enough labeled data. In many situations, however, unlabeled data is easily available but labels are expensive to come by. In the active learning scenario, each label has a non-negligible cost, and the goal, starting with a large pool of unlabeled data, is to adaptively decide which points to label, so that a good classifier is obtained at low cost.
Many active learning strategies run into severe problems with sampling bias; the theory has therefore focused on how to correctly manage this bias while attaining good label complexity. I will summarize recent work in the machine learning community that achieves this goal through algorithms that are simple and practical enough to be used in large-scale applications.
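
As a rough illustration of the pool-based setting described above, the sketch below learns a one-dimensional threshold classifier by choosing adaptively which pool points to label. It is not one of the algorithms covered in the talk: it assumes noise-free labels of the form 1[x >= t], so a binary search over the sorted pool suffices and sampling bias never arises. The names `make_oracle` and `active_learn_threshold`, the pool size, and the threshold value 0.37 are all hypothetical choices made for this example.

```python
import random


def make_oracle(threshold):
    """Hypothetical labeling oracle that also counts how many labels were bought."""
    count = {"queries": 0}

    def label(x):
        count["queries"] += 1
        return 1 if x >= threshold else 0

    return label, count


def active_learn_threshold(pool, label):
    """Binary-search the sorted pool for the first point labeled 1.

    Assumes noise-free labels of the form 1[x >= t]. Returns a threshold
    that agrees with the true label of every point in the pool.
    """
    xs = sorted(pool)
    lo, hi = 0, len(xs)  # invariant: xs[:lo] are labeled 0, xs[hi:] are labeled 1
    while lo < hi:
        mid = (lo + hi) // 2
        if label(xs[mid]) == 1:
            hi = mid       # first positive point is at mid or earlier
        else:
            lo = mid + 1   # first positive point is after mid
    # lo is the index of the first positive point, or len(xs) if there is none
    return xs[lo] if lo < len(xs) else float("inf")


rng = random.Random(0)
pool = [rng.random() for _ in range(1000)]   # large pool of unlabeled points
label, count = make_oracle(threshold=0.37)   # each call stands for one paid label

learned = active_learn_threshold(pool, label)
errors = sum((x >= learned) != (x >= 0.37) for x in pool)
print("labels requested:", count["queries"], "pool errors:", errors)
```

Under these assumptions the number of labels requested grows logarithmically with the pool size, whereas labeling the whole pool grows linearly. With noisy labels or richer hypothesis classes this simple strategy no longer applies, which is where the bias-aware methods mentioned above come in.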


Bibliographic reference. Dasgupta, Sanjoy (2011): "Recent advances in active learning", in MLSLP-2011.