Symposium on Machine Learning in Speech and Language Processing (MLSLP)
Bellevue, WA, USA
A key requirement for being able to learn a good classifier is having enough
labeled data. In many situations, however, unlabeled data is easily available but
labels are expensive to come by. In the active learning scenario, each label has a
non-negligible cost, and the goal, starting with a large pool of unlabeled data,
is to adaptively decide which points to label, so that a good classifier is
obtained at low cost.
Many active learning strategies run into severe problems with sampling bias; the theory has therefore focused on how to correctly manage this bias while attaining good label complexity. I will summarize recent work in the machine learning community that achieves this goal through algorithms that are simple and practical enough to be used in large-scale applications.
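As a concrete illustration of the pool-based scenario described above, here is a minimal sketch in Python. It is not taken from the talk. It assumes the simplest possible setting: points on a line, labeled 0 below an unknown threshold and 1 at or above it, with no label noise. The names `active_learn_threshold`, `demo`, and `oracle`, as well as the pool size and the threshold value 0.37, are hypothetical choices made for this example. In this setting the learner can pick its queries by binary search over the sorted pool, so it needs only about log2(n) labels to classify all n pool points correctly. Because the toy problem is noiseless and separable, it sidesteps the sampling-bias issues mentioned above, which arise in more realistic settings.

```python
import random


def active_learn_threshold(pool, oracle):
    """Find a threshold consistent with every pool label by binary search.

    Assumption (not from the talk): labels are 0 below an unknown threshold
    and 1 at or above it, with no noise.
    Returns (threshold, number_of_labels_queried).
    """
    points = sorted(pool)
    queries = 0
    lo, hi = 0, len(points)  # invariant: points[:lo] are 0, points[hi:] are 1
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1                 # each oracle call costs one label
        if oracle(points[mid]) == 1:
            hi = mid                 # first positive point is at or before mid
        else:
            lo = mid + 1             # first positive point is after mid
    # lo now indexes the first positive point, or len(points) if there is none
    threshold = points[lo] if lo < len(points) else float("inf")
    return threshold, queries


def demo(n=1000, true_threshold=0.37, seed=0):
    """Run the sketch on a random pool and count pool points it misclassifies."""
    rng = random.Random(seed)
    pool = [rng.random() for _ in range(n)]

    def oracle(x):
        # Stands in for the expensive human labeler.
        return int(x >= true_threshold)

    threshold, queries = active_learn_threshold(pool, oracle)
    errors = sum(int(x >= threshold) != oracle(x) for x in pool)
    return threshold, queries, errors


if __name__ == "__main__":
    threshold, queries, errors = demo()
    print(f"learned threshold={threshold:.4f}, labels used={queries}, pool errors={errors}")
```

With a pool of 1000 points, the loop requests at most 10 labels, whereas labeling the whole pool would cost 1000. The saving depends entirely on the noiseless threshold assumption, so the sketch shows the adaptive-querying idea rather than a general-purpose method.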
Bibliographic reference. Dasgupta, Sanjoy (2011): "Recent advances in active learning", In MLSLP-2011.