Daily TMLR digest for Feb 27, 2023

TMLR

Feb 26, 2023, 7:00:09 PM
to tmlr-anno...@googlegroups.com

New submissions
===============


Title: Robust Self-Supervised Learning with Lie Groups

Abstract: Deep learning has led to remarkable advances in computer vision. Even so, today's best models are brittle when presented with variations that differ even slightly from those seen during training. Minor shifts in the pose, color, or illumination of an object can lead to catastrophic misclassifications. State-of-the-art models struggle to understand how a set of variations can affect different objects. We propose a framework for instilling a notion of how objects vary in more realistic settings. Our approach applies the formalism of Lie groups to capture continuous transformations to improve models' robustness to distributional shifts. We apply our framework on top of state-of-the-art self-supervised learning (SSL) models, finding that explicitly modeling transformations with Lie groups leads to substantial performance gains of greater than 10% for MAE on both known instances seen in typical poses now presented in new poses, and on unknown instances in any pose. We also apply our approach to ImageNet, finding that the Lie operator improves performance by almost 4%. These results demonstrate the promise of learning transformations to improve model robustness.

URL: https://openreview.net/forum?id=2JhYMWwYj9

---

Title: The Vendi Score: A Diversity Evaluation Metric for Machine Learning

Abstract: Diversity is an important criterion for many areas of machine learning (ML), including generative modeling and dataset curation. Yet little work has gone into understanding, formalizing, and measuring diversity in ML. In this paper we address the diversity evaluation problem by proposing the Vendi Score, which connects and extends ideas from ecology and quantum statistical mechanics to ML. The Vendi Score is defined as the exponential of the Shannon entropy of the eigenvalues of a similarity matrix. This matrix is induced by a user-defined similarity function applied to the sample to be evaluated for diversity. In taking a similarity function as input, the Vendi Score enables its user to specify any desired form of diversity. Importantly, unlike many existing metrics in ML, the Vendi Score doesn’t require a reference dataset or distribution over samples or labels; it is therefore general and applicable to any generative model, decoding algorithm, and dataset from any domain where similarity can be defined. We showcase the Vendi Score on molecular generative modeling, where we found it addresses shortcomings of the current diversity metric of choice in that domain. We also applied the Vendi Score to generative models of images and decoding algorithms of text, where we found it confirms known results about diversity in those domains. Furthermore, we used the Vendi Score to measure mode collapse, a known shortcoming of generative adversarial networks (GANs). In particular, the Vendi Score revealed that even GANs that capture all the modes of a labelled dataset can be less diverse than the original dataset. Finally, the interpretability of the Vendi Score allowed us to diagnose several benchmark ML datasets for diversity, opening the door for diversity-informed data augmentation.

URL: https://openreview.net/forum?id=g97OHbQyk1
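
The definition in the abstract above (exponential of the Shannon entropy of the eigenvalues of a similarity matrix) can be illustrated with a minimal sketch. This is not the authors' code. The function name `vendi_score` is hypothetical, and dividing the matrix by the sample count `n` so that the eigenvalues sum to 1 is an assumption that the abstract does not state; it presumes a symmetric positive semidefinite similarity matrix with ones on the diagonal.

```python
import numpy as np


def vendi_score(K):
    """Exponential of the Shannon entropy of the eigenvalues of K / n.

    K is assumed to be a symmetric positive semidefinite similarity
    matrix with K[i, i] == 1. Dividing by n (an assumption made here)
    makes the eigenvalues sum to 1, so they can be read as a
    probability distribution.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    # eigvalsh is the eigenvalue routine for symmetric matrices.
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop numerically zero eigenvalues, using the convention 0 * log(0) = 0.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


# Four mutually dissimilar items: the similarity matrix is the identity.
all_distinct = vendi_score(np.eye(4))
# Four identical items: every pairwise similarity is 1.
all_identical = vendi_score(np.ones((4, 4)))
```

Under these assumptions the score behaves like an effective number of distinct items: it comes out near 4 for four mutually dissimilar samples and near 1 for four identical ones.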

---

Title: Stochastic Constrained DRO with a Complexity Independent of Sample Size

Abstract: Distributionally Robust Optimization (DRO), as a popular method to train robust models against distribution shift between training and test sets, has received tremendous attention in recent years. In this paper, we propose and analyze stochastic algorithms that apply to both non-convex and convex losses for solving the Kullback–Leibler divergence constrained DRO problem. Compared with existing methods for solving this problem, our stochastic algorithms not only enjoy competitive if not better complexity independent of sample size but also require only a constant batch size at every iteration, which is more practical for broad applications. We establish a nearly optimal complexity bound for finding an $\epsilon$-stationary solution for non-convex losses and an optimal complexity for finding an $\epsilon$-optimal solution for convex losses. Empirical studies demonstrate the effectiveness of the proposed algorithms for solving non-convex and convex constrained DRO problems.

URL: https://openreview.net/forum?id=VpaXrBFYZ9

---

Title: Know Your Self-supervised Learning: A Survey on Image-based Discriminative Training

Abstract: Although supervised learning has been highly successful in improving the state-of-the-art in the domain of image-based computer vision in the past, the margin of improvement has diminished significantly in recent years, indicating that a plateau is in sight. Meanwhile, the use of self-supervised learning (SSL) for the purpose of natural language processing (NLP) has seen tremendous successes during the past couple of years, with this new learning paradigm yielding powerful language models. Inspired by the excellent results obtained in the field of NLP, self-supervised methods that rely on clustering, contrastive learning, distillation, and information-maximization, which all fall under the banner of discriminative SSL, have experienced a swift uptake in the area of computer vision. Consequently, within a span of three years, more than 50 unique general-purpose frameworks for discriminative SSL, with a focus on images, were proposed. In this survey, we review a plethora of research efforts conducted on image-oriented SSL, paying attention to best practices and useful software packages. While doing so, we discuss pretext tasks for image-based SSL, as well as techniques that are commonly used in discriminative SSL. Lastly, to aid researchers who aim at contributing to image-focused SSL, we outline a number of relevant research directions.

URL: https://openreview.net/forum?id=Ma25S4ludQ

---

Title: Amortized Learning of Flexible Feature Scaling for Image Segmentation

Abstract: Convolutional neural networks (CNNs) have become the predominant model for image segmentation tasks. Most CNN segmentation architectures resize spatial dimensions by a fixed factor of two to aggregate spatial context. Recent work has explored using other resizing factors to improve model accuracy for specific applications. However, finding the appropriate rescaling factor most often involves training a separate network for many different factors and comparing the performance of each model. The computational burden of training these models means that in practice it is rarely done, and when done only a few different scaling factors are considered.

In this work, we present a hypernetwork strategy that can be used to easily and rapidly generate the Pareto frontier for the trade-off between accuracy and efficiency as the rescaling factor varies. We show how to train a single hypernetwork that generates CNN parameters conditioned on a rescaling factor. This enables a user to quickly choose a rescaling factor that appropriately balances accuracy and computational efficiency for their particular needs. We focus on image segmentation tasks, and demonstrate the value of this approach across various domains. We also find that, for a given rescaling factor, our single hypernetwork outperforms CNNs trained with fixed rescaling factors.


URL: https://openreview.net/forum?id=nPwdnRNJEP

---

Title: Visualizing the diversity of representations learned by Bayesian neural networks

Abstract: Explainable Artificial Intelligence (XAI) aims to make learning machines less opaque, and offers researchers and practitioners various tools to reveal the decision-making strategies of neural networks. In this work, we investigate how XAI methods can be used for exploring and visualizing the diversity of feature representations learned by Bayesian neural networks (BNNs). Our goal is to provide a global understanding of BNNs by making their decision-making strategies a) visible and tangible through feature visualizations and b) quantitatively measurable with a distance measure learned by contrastive learning. Our work provides new insights into the posterior distribution in terms of human-understandable feature information with regard to the underlying decision-making strategies. Our main findings are the following: 1) global XAI methods can be applied to explain the diversity of decision-making strategies of BNN instances, 2) Monte Carlo dropout with commonly used dropout rates exhibits increased diversity in feature representations compared to the multimodal posterior approximation of MultiSWAG, 3) the diversity of learned feature representations highly correlates with the uncertainty estimate for the output and 4) the inter-mode diversity of the multimodal posterior decreases as the network width increases, while the intra-mode diversity increases. Our findings are consistent with recent deep neural network theory, providing additional intuitions about what the theory implies in terms of humanly understandable concepts.

URL: https://openreview.net/forum?id=ZSxvyWrX6k

---

Title: A Group Variable Importance Framework for Bayesian Neural Networks

Abstract: While the success of neural networks has been well-established across a variety of domains, our ability to interpret these methods is still limited. Traditional variable importance approaches in machine learning overcome this issue by providing local explanations about particular predictive decisions - that is, they detail how important any given feature is to the classification of a particular sample in the dataset. However, univariate mapping approaches have been shown across many applications in the literature to generate false positives and negatives in high-dimensional and collinear data settings. In this paper, we focus on the slightly different task of global interpretability where our goal is to identify important groups of variables by aggregating over collections of univariate signals to improve power and mitigate false discovery. In the context of neural networks, a feature is rarely important on its own, so our strategy is specifically designed to leverage partial covariance structures and incorporate variable interactions into our proposed group feature ranking. Here, we extend the recently proposed “RelATive cEntrality” (RATE) measure to the Bayesian deep learning setting. We refer to this approach as the “GroupRATE” criterion. Given a trained network, GroupRATE applies an information theoretic metric to the joint posterior distribution of effect sizes to assess group-level significance of features. Importantly, unlike competing approaches, our method does not require tuning parameters which can be costly and difficult to select. We demonstrate the utility of our framework on both simulated and real data.

URL: https://openreview.net/forum?id=IJeOHgLIax

---
