Daily TMLR digest for Nov 17, 2022


TMLR

Nov 16, 2022, 7:00:10 PM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Teacher’s pet: understanding and mitigating biases in distillation

Authors: Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Abstract: Knowledge distillation is widely used as a means of improving the performance of a relatively simple "student" model using the predictions from a complex "teacher" model. Several works have shown that distillation significantly boosts the student's overall performance; however, are these gains uniform across all data subgroups? In this paper, we show that distillation can harm performance on certain subgroups, e.g., classes with few associated samples, compared to the vanilla student trained using the one-hot labels. We trace this behaviour to errors made by the teacher distribution being transferred to and amplified by the student model, and formally prove that distillation can indeed harm underrepresented subgroups in certain regression settings. To mitigate this problem, we present techniques that soften the teacher's influence for subgroups where it is less reliable. Experiments on several image classification benchmarks show that these modifications of distillation maintain the boost in overall accuracy while also improving subgroup performance.

URL: https://openreview.net/forum?id=ph3AYXpwEb
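
As an illustration of the kind of modification the abstract describes, here is a minimal sketch of a distillation loss in which the teacher's influence is down-weighted for classes with few training samples. The weighting rule (scaling the mixing weight by relative class frequency) is a hypothetical example for illustration, not the authors' actual technique.

import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, class_counts,
                      T=2.0, base_alpha=0.5):
    """Cross-entropy mixed with a temperature-T KL term to the teacher; the
    mixing weight alpha is reduced for rare classes (hypothetical weighting,
    for illustration only)."""
    n = student_logits.shape[0]
    p_s = softmax(student_logits)
    p_s_T = softmax(student_logits, T)
    p_t_T = softmax(teacher_logits, T)

    # Standard one-hot cross-entropy term.
    ce = -np.log(p_s[np.arange(n), labels] + 1e-12)

    # KL(teacher || student) at temperature T.
    kl = np.sum(p_t_T * (np.log(p_t_T + 1e-12) - np.log(p_s_T + 1e-12)), axis=1)

    # Trust the teacher less on rare classes: scale alpha by relative class frequency.
    alpha = base_alpha * class_counts[labels] / class_counts.max()

    return np.mean((1 - alpha) * ce + alpha * (T ** 2) * kl)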

---


New submissions
===============


Title: Optimal Threshold Labeling for Ordinal Regression Methods

Abstract: For an ordinal regression task, a classification task for ordinal data, one-dimensional transformation (1DT)-based methods are often employed since they are considered to capture the ordinal relation of ordinal data well. They learn a 1DT of the observation of the explanatory variables so that an observation with a larger class label tends to have a larger value of the 1DT, and classify the observation by labeling that learned 1DT. In this paper, we study the labeling procedure for 1DT-based methods, which has not been sufficiently discussed in existing studies. While regression-based methods and classical threshold methods conventionally use threshold labelings, which label a learned 1DT according to the rank of the interval to which the 1DT belongs among intervals on the real line separated by threshold parameters, we prove that the likelihood-based labeling used in popular statistical 1DT-based methods is also a threshold labeling in typical usages. Moreover, we show that these threshold labelings can be sub-optimal depending on the learning result of the 1DT and the task under consideration. On the basis of these findings, we propose applying empirical optimal threshold labeling, a threshold labeling that uses threshold parameters minimizing the empirical task risk for a learned 1DT, to those methods. In experiments with real-world datasets, changing the labeling procedure of existing 1DT-based methods to the proposed one improved classification performance in many of the cases we tried.

URL: https://openreview.net/forum?id=mHSAy1n65Z
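
For readers unfamiliar with threshold labeling, the sketch below illustrates the idea under simple assumptions: labels are assigned by the rank of the interval a learned 1DT value falls into, and empirical optimal thresholds are found by a dynamic program that minimizes an assumed absolute-error task risk. This is an illustrative reading of the abstract, not necessarily the authors' exact algorithm.

import numpy as np

def threshold_labeling(u, thresholds):
    """Label 1DT values u by the rank of the interval they fall into,
    given sorted threshold parameters."""
    return np.searchsorted(thresholds, u, side="right")  # labels 0..K-1

def empirical_optimal_thresholds(u, y, num_classes, loss=lambda y, k: abs(y - k)):
    """Pick thresholds minimizing the empirical task risk of threshold labeling
    on training pairs (u_i, y_i); u, y are numpy arrays, y in {0..K-1}.
    Assumed task loss: absolute error. O(n^2 K) dynamic program."""
    order = np.argsort(u)
    u_sorted, y_sorted = u[order], y[order]
    n = len(u_sorted)

    # cum_loss[k, j] = total loss of predicting class k for the first j sorted points.
    cum_loss = np.zeros((num_classes, n + 1))
    for k in range(num_classes):
        cum_loss[k, 1:] = np.cumsum([loss(yy, k) for yy in y_sorted])

    # best[k, j]: minimal risk of covering the first j points with classes 0..k.
    best = np.full((num_classes, n + 1), float("inf"))
    cut = np.zeros((num_classes, n + 1), dtype=int)
    best[0] = cum_loss[0]
    for k in range(1, num_classes):
        for j in range(n + 1):
            for i in range(j + 1):
                c = best[k - 1, i] + (cum_loss[k, j] - cum_loss[k, i])
                if c < best[k, j]:
                    best[k, j], cut[k, j] = c, i

    # Recover the segment boundaries and place thresholds between adjacent points.
    bounds, j = [], n
    for k in range(num_classes - 1, 0, -1):
        j = cut[k, j]
        bounds.append(j)
    bounds = bounds[::-1]
    thresholds = [
        (u_sorted[b - 1] + u_sorted[b]) / 2 if 0 < b < n
        else (u_sorted[0] - 1 if b == 0 else u_sorted[-1] + 1)
        for b in bounds
    ]
    return np.array(thresholds)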

---

Title: L-SVRG and L-Katyusha with Adaptive Sampling

Abstract: Stochastic gradient-based optimization methods, such as L-SVRG and its accelerated variant L-Katyusha (Kovalev et al., 2020), are widely used to train machine learning models. The theoretical and empirical performance of L-SVRG and L-Katyusha can be improved by sampling the observations from a non-uniform distribution (Qian et al., 2021). However, to design a desired sampling distribution, Qian et al. (2021) rely on prior knowledge of smoothness constants that can be computationally intractable to obtain in practice when the dimension of the model parameter is high. We propose an adaptive sampling strategy for L-SVRG and L-Katyusha that learns the sampling distribution with little computational overhead while allowing it to change with the iterates, and that does not require any prior knowledge of the problem parameters. We prove convergence guarantees for L-SVRG and L-Katyusha for convex objectives when the sampling distribution changes with the iterates. These results show that, even without prior information, the proposed adaptive sampling strategy matches, and in some cases even surpasses, the performance of the sampling scheme in Qian et al. (2021). Extensive simulations support our theory and the practical utility of the proposed sampling scheme on real data.

URL: https://openreview.net/forum?id=9lyqt3rbDc
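
The sketch below illustrates the general idea of L-SVRG with a non-uniform, adaptively updated sampling distribution, applied to a least-squares objective. The adaptation rule (a running estimate of per-example gradient norms, mixed with the uniform distribution) is an assumption for illustration, not the scheme proposed in the paper.

import numpy as np

def lsvrg_adaptive(A, b, eta=0.1, p=None, iters=2000, mix=0.5, seed=0):
    """Minimize (1/2n) * ||A w - b||^2 with L-SVRG, sampling rows from an
    adaptively learned non-uniform distribution (illustrative rule only)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    p = p or 1.0 / n                         # reference-point refresh probability
    w = np.zeros(d)
    w_ref = w.copy()
    full_grad = A.T @ (A @ w_ref - b) / n
    norm_est = np.ones(n)                    # running per-example gradient-norm estimates

    for _ in range(iters):
        # Mix the adaptive distribution with uniform to keep probabilities bounded away from 0.
        q = norm_est / norm_est.sum()
        q = mix * q + (1 - mix) / n

        i = rng.choice(n, p=q)
        g_i = A[i] * (A[i] @ w - b[i])
        g_i_ref = A[i] * (A[i] @ w_ref - b[i])
        g = (g_i - g_i_ref) / (n * q[i]) + full_grad   # unbiased importance-weighted estimator
        w -= eta * g

        # Update the sampling statistics from the gradient just observed.
        norm_est[i] = 0.9 * norm_est[i] + 0.1 * np.linalg.norm(g_i)

        if rng.random() < p:                 # loopless reference-point update
            w_ref = w.copy()
            full_grad = A.T @ (A @ w_ref - b) / n
    return w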

---

Title: Probing Predictions on OOD Images via Nearest Categories

Abstract: We study the out-of-distribution (OOD) prediction behavior of neural networks when they classify images from unseen classes or corrupted images. To probe the OOD behavior, we introduce a new measure, nearest category generalization (NCG), defined as the fraction of OOD inputs that are classified with the same label as their nearest neighbor in the training set. Our motivation stems from understanding the prediction patterns of adversarially robust networks, since previous work has identified unexpected consequences of training to be robust to norm-bounded perturbations. We find that robust networks have consistently higher NCG accuracy than naturally trained ones, even when the OOD data is much farther away than the robustness radius. This implies that the local regularization of robust training has a significant impact on the network’s decision regions. We replicate our findings using many datasets, comparing new and existing training methods. Overall, adversarially robust networks resemble a nearest neighbor classifier when it comes to OOD data.

URL: https://openreview.net/forum?id=fTNorIvVXG
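
The NCG measure itself is straightforward to state in code. The sketch below computes it under the assumption of Euclidean nearest neighbors in input space; the feature space and distance used in the paper may differ.

import numpy as np

def nearest_category_generalization(train_x, train_y, ood_x, predict):
    """train_x: (n, d) training inputs; train_y: (n,) their labels;
    ood_x: (m, d) OOD inputs; predict: callable mapping (m, d) -> (m,) labels.
    Returns the fraction of OOD inputs whose predicted label matches the label
    of their nearest training neighbor."""
    preds = predict(ood_x)
    # Nearest training neighbor of each OOD point under squared Euclidean distance.
    d2 = ((ood_x[:, None, :] - train_x[None, :, :]) ** 2).sum(-1)
    nn_labels = train_y[d2.argmin(axis=1)]
    return float((preds == nn_labels).mean())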

---
