Please join us for theory lunch this week, 4/10, at 12:00 in GCS 502C. Our speaker will be Julian Asilis, who will be presenting the following work:
Title: Proper Learnability and the Role of Unlabeled Data
Abstract:
Proper learning refers to the setting in which learners must emit
predictors in the underlying hypothesis class H, and often leads to
learners with simple algorithmic forms. However, there exist problems
which can only be learned improperly, e.g., in multiclass classification.
Thus, we ask: Under what assumptions on the hypothesis class or the
information provided to the learner is a problem properly learnable?
We
first demonstrate that when the unlabeled data distribution is given,
there always exists an optimal proper learner governed by distributional
regularization, a randomized generalization of regularization. We refer
to this setting as the distribution-fixed PAC model, and continue to
evaluate the learner on its worst-case performance over all
distributions. Further, we demonstrate that sample complexities in the
distribution-fixed PAC model can shrink by at most a logarithmic factor
relative to the classic PAC model, strongly refuting the role of unlabeled
data in PAC learning (from a worst-case perspective). We complement this
with impossibility results for the standard realizable PAC model which
demonstrate that proper learnability cannot be characterized using a
dimension. Our impossibility results all hold even for the fundamental
setting of multiclass classification, and go through a reduction of EMX
learning (Ben-David et al., 2019) to proper classification, which may be
of independent interest.
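
(For attendees less familiar with the terminology, the properness requirement and the PAC criterion referenced above can be stated roughly as follows. This is the usual textbook formulation, not notation taken from the paper itself. A learner $A$ is proper for a hypothesis class $H$ if $A(S) \in H$ for every training sample $S$. In the realizable PAC model, $A$ learns $H$ if for all $\epsilon, \delta \in (0,1)$ there is a sample size $m(\epsilon, \delta)$ such that for every domain distribution $D$ and every target $h^* \in H$,
$$\Pr_{S \sim D^{m}}\big[\, \mathrm{err}_D(A(S)) > \epsilon \,\big] \le \delta, \qquad \text{where } \mathrm{err}_D(f) = \Pr_{x \sim D}\big[\, f(x) \ne h^*(x) \,\big].$$
As described in the abstract, the distribution-fixed model additionally gives the learner the marginal distribution over unlabeled examples in advance.)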
Thanks,