Jesse Krijthe | 07-01-2018 | Robust semi-supervised learning: projections, limits & constraints
In many domains of science and society, the amount of data being gathered is increasing rapidly. To estimate the input-output relationships that are often of interest, supervised learning techniques rely on a specific type of data: labeled examples, for which we know both the input and the outcome. The problem of semi-supervised learning is how to use the increasingly abundant unlabeled examples, whose outcomes are unknown, to improve supervised learning methods. This thesis is concerned with the question of whether and how such improvements are possible in a "robust", or safe, way: can we guarantee that these methods do not perform worse than the supervised solution?
We show that for some supervised classifiers, most notably the least squares classifier, semi-supervised adaptations can be constructed for which this non-degradation in performance can indeed be guaranteed, in terms of the surrogate loss used by the classifier. Since these guarantees are stated in terms of the surrogate loss, we explore why the surrogate loss is a useful criterion for evaluating performance. We then prove that semi-supervised versions with strict non-degradation guarantees are impossible for a large class of commonly used supervised classifiers. Other topics covered in the thesis include optimistic learning, the peaking phenomenon and reproducibility.
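To make the least squares construction concrete, the following is a minimal sketch of one implicitly constrained approach in this spirit: among all parameter vectors obtainable by fitting least squares on labeled plus unlabeled data under some soft labeling of the unlabeled examples, select the one with the lowest squared loss on the labeled data. The toy data, helper names and the use of `scipy.optimize.minimize` are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy data: two Gaussian classes; labels encoded as 0/1 for squared loss.
n_lab, n_unl, d = 20, 200, 2
X_l = np.vstack([rng.normal(-1, 1, (n_lab // 2, d)),
                 rng.normal(+1, 1, (n_lab // 2, d))])
y_l = np.r_[np.zeros(n_lab // 2), np.ones(n_lab // 2)]
X_u = np.vstack([rng.normal(-1, 1, (n_unl // 2, d)),
                 rng.normal(+1, 1, (n_unl // 2, d))])

def add_intercept(X):
    return np.hstack([np.ones((len(X), 1)), X])

Xl, Xu = add_intercept(X_l), add_intercept(X_u)
Xa = np.vstack([Xl, Xu])  # labeled and unlabeled inputs combined

# Supervised baseline: least squares classifier on the labeled data only.
beta_sup = np.linalg.lstsq(Xl, y_l, rcond=None)[0]

def beta_from_soft_labels(q):
    # Least squares fit on all data, using soft labels q for the unlabeled part.
    return np.linalg.lstsq(Xa, np.r_[y_l, q], rcond=None)[0]

def labeled_loss(q):
    # Surrogate (squared) loss on the labeled data of the constrained solution.
    b = beta_from_soft_labels(q)
    return np.mean((Xl @ b - y_l) ** 2)

# Search over soft labelings in [0, 1]^n_unl.
res = minimize(labeled_loss, x0=np.full(n_unl, 0.5),
               bounds=[(0.0, 1.0)] * n_unl, method="L-BFGS-B")
beta_semi = beta_from_soft_labels(res.x)

def surrogate(beta, X, y):
    return np.mean((X @ beta - y) ** 2)

print("labeled surrogate loss, supervised:     ", surrogate(beta_sup, Xl, y_l))
print("labeled surrogate loss, semi-supervised:", surrogate(beta_semi, Xl, y_l))
```

The key design point the sketch illustrates is that the semi-supervised solution is constrained to be a least squares fit for *some* labeling of the unlabeled data, which is what makes reasoning about its surrogate loss relative to the supervised solution possible at all; the formal non-degradation guarantees in the thesis concern the surrogate loss evaluated against the true (unknown) labels.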