Weekly TMLR digest for Dec 04, 2022

0 views

Skip to first unread message

TMLR

unread,

Dec 3, 2022, 7:00:11 PM12/3/22

to tmlr-annou...@googlegroups.com

Accepted papers
===============

Title: Interpretable Node Representation with Attribute Decoding

Authors: Xiaohui Chen, Xi Chen, Liping Liu

Abstract: Variational Graph Autoencoders (VGAEs) are powerful models for unsupervised learning of node representations from graph data. In this work, we make a systematic analysis of modeling node attributes in VGAEs and show that attribute decoding is important for node representation learning. We further propose a new learning model, interpretable NOde Representation with Attribute Decoding (NORAD). The model encodes node representations in an interpretable approach: node representations capture community structures in the graph and the relationship between communities and node attributes. We further propose a rectifying procedure to refine node representations of isolated notes, which improves the quality of the representations of these nodes. Our empirical results demonstrate the advantage of the proposed model when learning graph data in an interpretable approach.

URL: https://openreview.net/forum?id=AZIfC91hjM

---

Title: A Unified Domain Adaptation Framework with Distinctive Divergence Analysis

Authors: Zhiri YUAN, Xixu HU, Qi WU, Shumin MA, Cheuk Hang LEUNG, Xin Shen, Yiyan HUANG

Abstract: Unsupervised domain adaptation enables knowledge transfer from a labeled source domain to an unlabeled target domain by aligning the learnt features of both domains. The idea is theoretically supported by the generalization bound analysis in Ben-David et al. (2007), which specifies the applicable task (binary classification) and designates a specific distribution divergence measure. Although most distribution-aligning domain adaptation models seek theoretical grounds from this particular bound analysis, they do not actually fit into the stringent conditions. In this paper, we bridge the long-standing theoretical gap in literature by providing a unified generalization bound. Our analysis can well accommodate the classification/regression tasks and most commonly-used divergence measures, and more importantly, it can theoretically recover a large amount of previous models. In addition, we identify the key difference in the distribution divergence measures underlying the diverse models and commit a comprehensive in-depth comparison of the commonly-used divergence measures. Based on the unified generalization bound, we propose new domain adaptation models that achieve transferability through domain-invariant representations and conduct experiments on real-world datasets that corroborate our theoretical findings. We believe these insights are helpful in guiding the future design of distribution-aligning domain adaptation algorithms.

URL: https://openreview.net/forum?id=yeT9cBq8Cn

---

Title: Infinitely wide limits for deep Stable neural networks: sub-linear, linear and super-linear activation functions

Authors: Alberto Bordino, Stefano Favaro, Sandra Fortini

Abstract: There is a growing literature on the study of large-width properties of deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed parameters or weights, and Gaussian stochastic processes. Motivated by some empirical and theoretical studies showing the potential of replacing Gaussian distributions with Stable distributions, namely distributions with heavy tails, in this paper we investigate large-width properties of deep Stable NNs, i.e. deep NNs with Stable-distributed parameters. For sub-linear activation functions, a recent work has characterized the infinitely wide limit of a suitable rescaled deep Stable NN in terms of a Stable stochastic process, both under the assumption of a ``joint growth" and under the assumption of a ``sequential growth" of the width over the NN's layers. Here, assuming a ``sequential growth" of the width, we extend such a characterization to a general class of activation functions, which includes sub-linear, asymptotically linear and super-linear functions. As a novelty with respect to previous works, our results rely on the use of a generalized central limit theorem for heavy tails distributions, which allows for an interesting unified treatment of infinitely wide limits for deep Stable NNs. Our study shows that the scaling of Stable NNs and the stability of their infinitely wide limits may depend on the choice of the activation function, bringing out a critical difference with respect to the Gaussian setting.

URL: https://openreview.net/forum?id=A5tIluhDW6

---

Title: Counterfactual Learning with Multioutput Deep Kernels

Authors: Alberto Caron, Ioanna Manolopoulou, Gianluca Baio

Abstract: In this paper, we address the challenge of performing counterfactual inference with observational data via Bayesian nonparametric regression adjustment, with a focus on high-dimensional settings featuring multiple actions and multiple correlated outcomes. We present a general class of counterfactual multi-task deep kernels models that estimate causal effects and learn policies proficiently thanks to their sample efficiency gains, while scaling well with high dimensions. In the first part of the work, we rely on Structural Causal Models (SCM) to formally introduce the setup and the problem of identifying counterfactual quantities under observed confounding. We then discuss the benefits of tackling the task of causal effects estimation via stacked coregionalized Gaussian Processes and Deep Kernels. Finally, we demonstrate the use of the proposed methods on simulated experiments that span individual causal effects estimation, off-policy evaluation and optimization.

URL: https://openreview.net/forum?id=iGREAJdULX

---

Title: Incorporating Sum Constraints into Multitask Gaussian Processes

Authors: Philipp Pilar, Carl Jidling, Thomas B. Schön, Niklas Wahlström

Abstract: Machine learning models can be improved by adapting them to respect existing background knowledge. In this paper we consider multitask Gaussian processes, with background knowledge in the form of constraints that require a specific sum of the outputs to be constant. This is achieved by conditioning the prior distribution on the constraint fulfillment. The approach allows for both linear and nonlinear constraints. We demonstrate that the constraints are fulfilled with high precision and that the construction can improve the overall prediction accuracy as compared to the standard Gaussian process.

URL: https://openreview.net/forum?id=gzu4ZbBY7S

---

Title: Degradation Attacks on Certifiably Robust Neural Networks

Authors: Klas Leino, Chi Zhang, Ravi Mangal, Matt Fredrikson, Bryan Parno, Corina Pasareanu

Abstract: Certifiably robust neural networks protect against adversarial examples by employing run-time defenses that check if the model is certifiably locally robust at the input under evaluation. We show through examples and experiments that any defense (whether complete or incomplete) based on checking local robustness is inherently over-cautious. Specifically, such defenses flag inputs for which local robustness checks fail, but yet that are not adversarial; i.e., they are classified consistently with all valid inputs within a distance of $\epsilon$. As a result, while a norm-bounded adversary cannot change the classification of an input, it can use norm-bounded changes to degrade the utility of certifiably robust networks by forcing them to reject otherwise correctly classifiable inputs. We empirically demonstrate the efficacy of such attacks against state-of-the-art certifiable defenses. Our code is available at https://github.com/ravimangal/degradation-attacks.

URL: https://openreview.net/forum?id=P0XO5ZE98j

---

Title: If your data distribution shifts, use self-learning

Authors: Evgenia Rusak, Steffen Schneider, George Pachitariu, Luisa Eck, Peter Vincent Gehler, Oliver Bringmann, Wieland Brendel, Matthias Bethge

Abstract: We demonstrate that self-learning techniques like entropy minimization and pseudo-labeling are simple and effective at improving performance of a deployed computer vision model under systematic domain shifts. We conduct a wide range of large-scale experiments and show consistent improvements irrespective of the model architecture, the pre-training technique or the type of distribution shift. At the same time, self-learning is simple to use in practice because it does not require knowledge or access to the original training data or scheme, is robust to hyperparameter choices, is straight-forward to implement and requires only a few adaptation epochs. This makes self-learning techniques highly attractive for any practitioner who applies machine learning algorithms in the real world. We present state-of-the-art adaptation results on CIFAR10-C (8.5% error), ImageNet-C (22.0% mCE), ImageNet-R (17.4% error) and ImageNet-A (14.8% error), theoretically study the dynamics of self-supervised adaptation methods and propose a new classification dataset (ImageNet-D) which is challenging even with adaptation.

URL: https://openreview.net/forum?id=vqRzLv6POg

---

Title: An approximate sampler for energy-based models with divergence diagnostics

Authors: Bryan Eikema, Germán Kruszewski, Christopher R Dance, Hady Elsahar, Marc Dymetman

Abstract: Energy-based models (EBMs) allow flexible specifications of probability distributions. However, sampling from EBMs is non-trivial, usually requiring approximate techniques such as Markov chain Monte Carlo (MCMC). A major downside of MCMC sampling is that it is often impossible to compute the divergence of the sampling distribution from the target distribution: therefore, the quality of the samples cannot be guaranteed. Here, we introduce quasi-rejection sampling (QRS), a simple extension of rejection sampling that performs approximate sampling, but, crucially, does provide divergence diagnostics (in terms of f-divergences, such as KL divergence and total variation distance). We apply QRS to sampling from discrete EBMs over text for controlled generation. We show that we can sample from such EBMs with arbitrary precision in exchange for sampling efficiency and quantify the trade-off between the two by means of the aforementioned diagnostics.

URL: https://openreview.net/forum?id=VW4IrC0n0M

---

Title: A Unified Survey on Anomaly, Novelty, Open-Set, and Out of-Distribution Detection: Solutions and Future Challenges

Authors: Mohammadreza Salehi, Hossein Mirzaei, Dan Hendrycks, Yixuan Li, Mohammad Hossein Rohban, Mohammad Sabokrou

Abstract: Machine learning models often encounter samples that are diverged from the training distribution. Failure to recognize an out-of-distribution (OOD) sample, and consequently assign that sample to an in-class label, significantly compromises the reliability of a model. The problem has gained significant attention due to its importance for safety deploying models in open-world settings. Detecting OOD samples is challenging due to the intractability of modeling all possible unknown distributions. To date, several research domains tackle
the problem of detecting unfamiliar samples, including anomaly detection, novelty detection, one-class learning, open set recognition, and out-of-distribution detection. Despite having similar and shared concepts, out-of-distribution, open-set, and anomaly detection
have been investigated independently. Accordingly, these research avenues have not crosspollinated, creating research barriers. While some surveys intend to provide an overview of these approaches, they seem to only focus on a specific domain without examining the
relationship between different domains. This survey aims to provide a cross-domain and comprehensive review of numerous eminent works in respective areas while identifying their commonalities. Researchers can benefit from the overview of research advances in different fields and develop future methodology synergistically. Furthermore, to the best of our knowledge, while there are surveys in anomaly detection or one-class learning, there is no comprehensive or up-to-date survey on out-of-distribution detection, which this survey covers extensively. Finally, having a unified cross-domain perspective, this study discusses and sheds light on future lines of research, intending to bring these fields closer together.

URL: https://openreview.net/forum?id=aRtjVZvbpK

---

Title: Bayesian Methods for Constraint Inference in Reinforcement Learning

Authors: Dimitris Papadimitriou, Usman Anwar, Daniel S. Brown

Abstract: Learning constraints from demonstrations provides a natural and efficient way to improve the safety of AI systems; however, prior work only considers learning a single, point-estimate of the constraints. By contrast, we consider the problem of inferring constraints from demonstrations using a Bayesian perspective. We propose Bayesian Inverse Constraint Reinforcement Learning (BICRL), a novel approach that infers a posterior probability distribution over constraints from demonstrated trajectories. The main advantages of BICRL, compared to prior constraint inference algorithms, are (1) the freedom to infer constraints from partial trajectories and even from disjoint state-action pairs, (2) the ability to infer constraints from suboptimal demonstrations and in stochastic environments, and (3) the opportunity to use the posterior distribution over constraints in order to implement active learning and robust policy optimization techniques. We show that BICRL outperforms pre-existing constraint learning approaches, leading to more accurate constraint inference and consequently safer policies. We further propose Hierarchical BICRL that infers constraints locally in sub-spaces of the entire domain and then composes global constraint estimates leading to accurate and computationally efficient constraint estimation.

URL: https://openreview.net/forum?id=oRjk5V9eDp

---

Title: A Crisis In Simulation-Based Inference? Beware, Your Posterior Approximations Can Be Unfaithful

Authors: Joeri Hermans, Arnaud Delaunoy, François Rozet, Antoine Wehenkel, Volodimir Begy, Gilles Louppe

Abstract: We present extensive empirical evidence showing that current Bayesian simulation-based inference algorithms can produce computationally unfaithful posterior approximations. Our results show that all benchmarked algorithms -- (S)NPE, (S)NRE, SNL and variants of ABC -- can yield overconfident posterior approximations, which makes them unreliable for scientific use cases and falsificationist inquiry. Failing to address this issue may reduce the range of applicability of simulation-based inference. For this reason, we argue that research efforts should be made towards theoretical and methodological developments of conservative approximate inference algorithms and present research directions towards this objective. In this regard, we show empirical evidence that ensembling posterior surrogates provides more reliable approximations and mitigates the issue.

URL: https://openreview.net/forum?id=LHAbHkt6Aq

---

Title: On Pseudo-Labeling for Class-Mismatch Semi-Supervised Learning

Authors: Lu Han, Han-Jia Ye, De-Chuan Zhan

Abstract: When there are unlabeled Out-Of-Distribution (OOD) data from other classes, Semi-Supervised Learning (SSL) methods suffer from severe performance degradation and even get worse than merely training on labeled data. In this paper, we empirically analyze Pseudo-Labeling (PL) in class-mismatched SSL. PL is a simple and representative SSL method that transforms SSL problems into supervised learning by creating pseudo-labels for unlabeled data according to the model's prediction. We aim to answer two main questions: (1) How do OOD data influence PL? (2) What is the proper usage of OOD data with PL? First, we show that the major problem of PL is imbalanced pseudo-labels on OOD data. Second, we find that OOD data can help classify In-Distribution (ID) data given their OOD ground truth labels. Based on the findings, we propose to improve PL in class-mismatched SSL with two components -- Re-balanced Pseudo-Labeling (RPL) and Semantic Exploration Clustering (SEC). RPL re-balances pseudo-labels of high-confidence data, which simultaneously filters out OOD data and addresses the imbalance problem. SEC uses balanced clustering on low-confidence data to create pseudo-labels on extra classes, simulating the process of training with ground truth. Experiments show that our method achieves steady improvement over supervised baseline and state-of-the-art performance under all class mismatch ratios on different benchmarks.

URL: https://openreview.net/forum?id=tLG26QxoD8

---

New submissions
===============

Title: Learning to correct spectral methods for simulating turbulent flows

Abstract: Despite their ubiquity throughout science and engineering, only a handful of partial differential equations (PDEs) have analytical, or closed-form solutions. This motivates a vast amount of classical work on numerical simulation of PDEs and more recently, a whirlwind of research into data-driven techniques leveraging machine learning (ML). A recent line of work indicates that a hybrid of classical numerical techniques and machine learning can offer significant improvements over either approach alone. In this work, we show that the choice of the numerical scheme is crucial when incorporating physics-based priors. We build upon Fourier-based spectral methods, which are known to be more efficient than other numerical schemes for simulating PDEs with smooth and periodic solutions. Specifically, we develop ML-augmented spectral solvers for three common PDEs of fluid dynamics. Our models are more accurate (2-4x) than standard spectral solvers at the same resolution but have longer overall runtimes (~2x), due to the additional runtime cost of the neural network component. We also demonstrate a handful of key design principles for combining machine learning and numerical methods for solving PDEs.

URL: https://openreview.net/forum?id=wNBARGxoJn

---

Title: Transfer Entropy Bottleneck: Learning Sequence to Sequence Information Transfer

Abstract: When presented with a data stream of two statistically dependent variables, predicting the future of one of the variables (the target stream) can benefit from information about both its history and the history of the other variable (the source stream). For example, fluctuations in temperature at a weather station can be predicted using both temperatures and barometric readings. However, a challenge when modelling such data is that it is easy for a neural network to rely on the greatest joint correlations within the target stream, which may ignore a crucial but small information transfer from the source to the target stream. As well, there are often situations where the target stream may have previously been modelled independently and it would be useful to use that model to inform a new joint model. Here, we develop an information bottleneck approach for conditional learning on two dependent streams of data. Our method, which we call Transfer Entropy Bottleneck (TEB), allows one to learn a model that bottlenecks the directed information transferred from the source variable to the target variable, while quantifying this information transfer within the model. As such, TEB provides a useful new information bottleneck approach for modelling two statistically dependent streams of data in order to make predictions about one of them.

URL: https://openreview.net/forum?id=kJcwlP7BRs

---

Title: Solving Nonconvex-Nonconcave Min-Max Problems exhibiting Weak Minty Solutions

Abstract: We investigate a structured class of nonconvex-nonconcave min-max problems exhibiting so-called \emph{weak Minty} solutions, a notion which was only recently introduced, but is able to simultaneously capture different generalizations of monotonicity. We prove novel convergence results for a generalized version of the optimistic gradient method (OGDA) in this setting, matching the $1/k$ rate for the best iterate in terms of the squared operator norm recently shown for the extragradient method (EG). In addition we propose an adaptive step size version of EG, which does not require knowledge of the problem parameters.

URL: https://openreview.net/forum?id=Gp0pHyUyrb

---

Title: Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance

Abstract: Denoising diffusion probabilistic models (DDPMs) are a recent family of generative models that achieve state-of-the-art results. In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier. While the idea is theoretically sound, deep learning-based classifiers are infamously susceptible to gradient-based adversarial attacks. Therefore, while traditional classifiers may achieve good accuracy scores, their gradients are possibly unreliable and might hinder the improvement of the generation results. Recent work discovered that adversarially robust classifiers exhibit gradients that are aligned with human perception, and these could better guide a generative process towards semantically meaningful images. We utilize this observation by defining and training a time-dependent adversarially robust classifier and use it as guidance for a generative diffusion model. In experiments on the highly challenging and diverse ImageNet dataset, our scheme introduces significantly more intelligible intermediate gradients, better alignment with theoretical findings, as well as improved generation results under several evaluation metrics. Furthermore, we conduct an opinion survey whose findings indicate that human raters prefer our method's results.

URL: https://openreview.net/forum?id=tEVpz2xJWX

---

Title: Quantifying probabilistic robustness of tree-based classifiers against natural distortions

Abstract: The concept of trustworthy AI has gained widespread attention lately. One of the aspects relevant to trustworthy AI is robustness of ML models. In this study, we show how to probabilistically quantify robustness against naturally occurring distortions of input data for tree-based classifiers under the assumption that the natural distortions can be described by multivariate probability distributions that can be transformed to multivariate normal distributions. The idea is to extract the decision rules of a trained tree-based classifier, separate the feature space into non-overlapping regions and determine the probability that a data sample with distortion returns its predicted label. The approach is based on the recently introduced measure of real-world-robustness, which works for all black box classifiers, but is only an approximation and only works if the input dimension is not too high, whereas our proposed method gives an exact measure.

URL: https://openreview.net/forum?id=99J6lWgXCl

---

Title: Guaranteed Discovery of Control-Endogenous Latent States with Multi-Step Inverse Models

Abstract: In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which consists of multitudes of relevant and irrelevant information. For example, a person walking along a city street who tries to model all aspects of the world would quickly be overwhelmed by a multitude of shops, cars, and people moving in and out of view, each following their own complex and inscrutable dynamics. Is it possible to turn the agent's firehose of sensory information into a minimal latent state that is both necessary and sufficient for an agent to successfully act in the world? We formulate this question concretely, and propose the Agent Control-Endogenous State Discovery algorithm (AC-State), which has theoretical guarantees and is practically demonstrated to discover the minimal control-endogenous latent state which contains all of the information necessary for controlling the agent, while fully discarding all irrelevant information. This algorithm consists of a multi-step inverse model (predicting actions from distant observations) with an information bottleneck. AC-State enables localization, exploration, and navigation without reward or demonstrations. We demonstrate the discovery of the control-endogenous latent state in three domains: localizing a robot arm with distractions (e.g., changing lighting conditions and background), exploring a maze alongside other agents, and navigating in the Matterport house simulator.

URL: https://openreview.net/forum?id=TNocbXm5MZ

---

Title: Tackling Visual Control via Multi-View Exploration Maximization

Abstract: We present MEM: Multi-view Exploration Maximization for tackling complex visual control tasks. To the best of our knowledge, MEM is the first approach that combines multi-view representation learning and intrinsic reward-driven exploration in reinforcement learning (RL). More specifically, MEM first extracts the specific and shared information of multi-view observations to form high-quality features before performing RL on the learned features, enabling the agent to fully comprehend the environment and yield better actions. Furthermore, MEM transforms the multi-view features into intrinsic rewards based on entropy maximization to encourage exploration. As a result, MEM can significantly promote the sample-efficiency and generalization ability of the RL agent, facilitating solving real-world problems with high-dimensional observations and spare-reward space. We evaluate MEM on various tasks from DeepMind Control Suite and Procgen games. Extensive simulation results demonstrate that MEM can achieve superior performance and outperform the benchmarking schemes with simple architecture and higher efficiency.

URL: https://openreview.net/forum?id=zvuhsIAkUl

---

Title: MASIF: Meta-learned Algorithm Selection using Implicit Fidelity Information

Abstract: Selecting a well-performing algorithm for a given task or dataset can be time-consuming and
tedious, but is crucial for the successful day-to-day business of developing new AI & ML
applications. Algorithm Selection (AS) mitigates this through a meta-model leveraging
meta-information about previous tasks. However, most of the available AS methods are
error-prone because they characterize a task by either cheap-to-compute properties of the
dataset or evaluations of cheap proxy algorithms, called landmarks. In this work, we extend
the classical AS data setup to include multi-fidelity information and empirically demonstrate
how meta-learning on algorithms’ learning behaviour allows us to exploit cheap test-time
evidence effectively and combat myopia significantly. We further postulate a budget-regret
trade-off w.r.t. the selection process. Our new selector MASIF is able to jointly interpret
online evidence on a task in form of varying-length learning curves without any parametric
assumption by leveraging a transformer-based encoder. This opens up new possibilities for
guided rapid prototyping in data science on cheaply observed partial learning curves.

URL: https://openreview.net/forum?id=5aYGXxByI6

---

Title: Differentially Private Image Classification from Features

Abstract: In deep learning, leveraging transfer learning has recently been shown to be an effective strategy for training large high performance models with Differential Privacy (DP). Moreover, somewhat surprisingly, recent works have found that privately training just the last layer of a pre-trained model provides the best utility with DP. While past studies largely rely on using first-order differentially private training algorithms like DP-SGD for training large models, in the specific case of privately learning from features, we observe that computational burden is often low enough to allow for more sophisticated optimization schemes, including second-order methods. To that end, we systematically explore the effect of design parameters such as loss function and optimization algorithm. We find that, while commonly used logistic regression performs better than linear regression in the non-private setting, the situation is reversed in the private setting. We find that least-squares linear regression is much more effective than logistic regression from both privacy and computational standpoint, especially at stricter epsilon values ($\epsilon < 1$). On the optimization side, we also explore using Newton's method, and find that second-order information is quite helpful even with privacy, although the benefit significantly diminishes with stricter privacy guarantees. While both methods use second-order information, least squares is more effective at lower epsilon values while Newton's method is more effective at larger epsilon values. To combine the benefits of both methods, we propose a novel optimization algorithm called DP-FC, which leverages feature covariance instead of the Hessian of the logistic regression loss and performs well across all $\epsilon$ values we tried. With this, we obtain new SOTA results on ImageNet-1k, CIFAR-100 and CIFAR-10 across all values of $\epsilon$ typically considered. Most remarkably, on ImageNet-1K, we obtain top-1 accuracy of 88\% under DP guarantee of (8, $8 * 10^{-7}$) and 84.3\% under (0.1, $8 * 10^{-7}$).

URL: https://openreview.net/forum?id=Cj6pLclmwT

---

Title: Constrained Parameter Inference as a Principle for Learning

Abstract: Learning in neural networks is often framed as a problem in which targeted error signals are directly propagated to parameters and used to produce updates that induce more optimal network behaviour. Backpropagation of error (BP) is an example of such an approach and has proven to be a highly successful application of stochastic gradient descent to deep neural networks. We propose constrained parameter inference (COPI) as a new principle for learning. The COPI approach assumes that learning can be set up in a manner where parameters infer their own values based upon observations of their local neuron activities. We find that this estimation of network parameters is possible under the constraints of decorrelated neural inputs and top-down perturbations of neural states for credit assignment.
We show that the decorrelation required for COPI allows learning at extremely high learning rates, competitive with that of adaptive optimizers, as used by BP. We further demonstrate that COPI affords a new approach to feature analysis and network compression. Finally, we argue that COPI may shed new light on learning in biological networks given the evidence for decorrelation in the brain.

URL: https://openreview.net/forum?id=CUDdbTT1QC

---

Title: Threat Model-Agnostic Adversarial Defense using Diffusion Models

Abstract: Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks. Following the discovery of this vulnerability in real-world imaging and vision applications, the associated safety concerns have attracted vast research attention, and many defense techniques have been developed. Most of these defense methods rely on adversarial training (AT) -- training the classification network on images perturbed according to a specific threat model, which defines the magnitude of the allowed modification. Although AT leads to promising results, training on a specific threat model fails to generalize to other types of perturbations. A different approach utilizes a preprocessing step to remove the adversarial perturbation from the attacked image. In this work, we follow the latter path and aim to develop a technique that leads to robust classifiers across various realizations of threat models. To this end, we harness the recent advances in stochastic generative modeling, and means to leverage these for sampling from conditional distributions. Our defense relies on an addition of Gaussian i.i.d noise to the attacked image, followed by a pretrained diffusion process -- an architecture that performs a stochastic iterative process over a denoising network, yielding a high perceptual quality denoised outcome. The obtained robustness with this stochastic preprocessing step is validated through extensive experiments on the CIFAR-10 and CIFAR-10-C datasets, showing that our method outperforms the leading defense methods under various threat models.

URL: https://openreview.net/forum?id=Xinam90jSO

---

Title: Sampling Matters in Explanations: Towards Robust Attribution Analysis with Feature Suppression

Abstract: Pixel-wise image attribution analysis seeks to highlight a subset of the semantic features from inputs and such a subset can reflect the interactions between the features and their inferences. The gradient maps of decision risk values with respect to the inputs can highlight a fraction of the interactive features relevant to inferences. Gradient integration is a pixel-wise attribution approach by sampling multiple samples from the given inputs and then summing the derived gradient maps from the samples as explanations. Our theoretical analysis demonstrate that the alignment of the sampling distribution can delimit the upper bound of explanation certainty. Prior works leverage some normal or uniform distribution for sampling and the misalignment of their distributions can thus lead to low explanation certainty. Furthermore, their explanations can fail if models are trained with data augmentation due to the skewed distribution. We present a semi-ideal sampling approach to improve the explanation certainty by simply suppressing features. Such an approach can align with the natural image feature distribution and preserve intuition-aligned features without adding agnostic information. Further theoretical analysis from the perspective of cooperative game theory also shows that our approach is in fact equivalent to an estimation of Shapley values. The extensive quantitative evaluation on ImageNet can further affirm that our approach is able to yield more satisfactory explanations by preserving more information against state-of-the-art baselines.

URL: https://openreview.net/forum?id=oXMOcSS0CZ

---

Title: The Optimal GAN Discriminators are High-dimensional Interpolators

Abstract: We consider the problem of optimizing the discriminator in generative adversarial networks (GANs) subject to higher-order gradient regularization. We show analytically, via the least-squares (LSGAN) and Wasserstein (WGAN) GAN variants, that the discriminator optimization problem is one of high-dimensional interpolation. The optimal discriminator, derived using variational calculus, turns out to be the solution to a partial differential equation involving the iterated Laplacian or the polyharmonic operator. The solution is implementable in closed-form via polyharmonic radial basis function (RBF) interpolation. In view of the polyharmonic connection, we refer to the corresponding GANs as Poly-LSGAN and Poly-WGAN. As a proof of concept, the analysis is supported by experimental validation on multivariate Gaussians. While the closed-form RBF does not scale favorably with the dimensionality of data for image-space generation, we employ the Poly-WGAN discriminator to transform the latent space distribution of the data to match a Gaussian in a Wasserstein autoencoder (WAE). The closed-form discriminator, motivated by the polyharmonic RBF, results in up to 20\% improvement in terms of Fr{\'e}chet and kernel inception distances over comparable baselines that employ trainable or kernel-based discriminators. The experiments are carried out on standard image datasets such as MNIST, CIFAR-10, CelebA, and LSUN-Churches. The training time in Poly-WGAN is comparable to those of kernel-based methods, while being about two orders faster than GANs with a trainable discriminator.

URL: https://openreview.net/forum?id=Rjra576kZN

---

Title: BIGRoC: Boosting Image Generation via a Robust Classifier

Abstract: The interest of the machine learning community in image synthesis has grown significantly in recent years, with the introduction of a wide range of deep generative models and means for training them. In this work, we propose a general model-agnostic technique for improving the image quality and the distribution fidelity of generated images obtained by any generative model. Our method, termed BIGRoC (Boosting Image Generation via a Robust Classifier), is based on a post-processing procedure via the guidance of a given robust classifier and without a need for additional training of the generative model. Given a synthesized image, we propose to update it through projected gradient steps over the robust classifier to refine its recognition. We demonstrate this post-processing algorithm on various image synthesis methods and show a significant quantitative and qualitative improvement on CIFAR-10 and ImageNet. Surprisingly, although BIGRoC is the first model agnostic among refinement approaches and requires much less information, it outperforms competitive methods. Specifically, BIGRoC improves the image synthesis best performing diffusion model on ImageNet $128\times128$ by 14.81%, attaining an FID score of 2.53 and on $256\times256$ by 7.87%, achieving an FID of 3.63. Moreover, we conduct an opinion survey, according to which humans significantly prefer our method's outputs.

URL: https://openreview.net/forum?id=y7RGNXhGSR

---

Title: Expressiveness and Learnability: A Unifying View for Evaluating Self-Supervised Learning

Abstract: We propose a unifying view to analyze the representation quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training.
We argue that representations can be evaluated through the lens of **expressiveness** and **learnability**. We propose to use the Intrinsic Dimension (ID) to assess expressiveness and introduce Cluster Learnability (CL) to assess learnability. CL is measured as the learning speed of a KNN classifier trained to predict labels obtained by clustering the representations with $K$-means. We thus combine CL and ID into a single predictor -- CLID. Through a large-scale empirical study with a diverse family of SSL algorithms, we find that CLID better correlates with in-distribution model performance than other competing recent evaluation schemes. We also benchmark CLID on out-of-domain generalization, where CLID serves as a predictor of the transfer performance of SSL models on several classification tasks, yielding improvements with respect to the competing baselines.

URL: https://openreview.net/forum?id=pWqrWG6zw2

---

Title: Action Poisoning Attacks on Linear Contextual Bandits

Abstract: Contextual bandit algorithms have many applicants in a variety of scenarios. In order to develop trustworthy contextual bandit systems, understanding the impacts of various adversarial attacks on contextual bandit algorithms is essential. In this paper, we propose a new class of attacks: action poisoning attacks, where an adversary can change the action signal selected by the agent. We design action poisoning attack schemes against linear contextual bandit algorithms in both white-box and black-box settings. We further analyze the cost of the proposed attack strategies for a very popular and widely used bandit algorithm: Lin UCB. We show that, in both white-box and black-box settings, the proposed attack schemes can force the LinUCB agent to pull a target arm very frequently by spending only logarithm cost. We also extend the proposed attack strategies to generalized linear models and show the effectiveness of the proposed strategies.

URL: https://openreview.net/forum?id=yhGCKUsKJS

---

Title: Not All Tasks are Equal - Task Attended Meta-learning for Few-shot Learning

Abstract: Meta-learning (ML) has emerged as a promising direction in learning models under constrained resource settings like few-shot learning. The popular approaches for ML either learn a generalizable initial model or a generic parametric optimizer through batch episodic training. In this work, we study the importance of tasks in a batch for ML. We hypothesize that the common assumption in batch episodic training where each task in a batch has an equal contribution to learning an optimal meta-model need not be true. We propose to weight the tasks in a batch according to their "importance'' in improving the meta-model's learning. To this end, we introduce a training curriculum called task attended meta-training to learn a meta-model from weighted tasks in a batch. The task attention module is a standalone unit and can be integrated with any batch episodic training regimen. Comparison of task-attended ML models with their non-task-attended counterparts on complex datasets, performance improvement of proposed curriculum over state-of-the-art task scheduling algorithms on noisy datasets, and cross-domain few shot learning setup validate its effectiveness.

URL: https://openreview.net/forum?id=aZsOX6k7Uv

---

Title: Assisted Learning for Organizations with Limited Imbalanced Data

Abstract: In the era of big data, many big organizations are integrating machine learning into their work pipelines to facilitate data analysis. However, the performance of their trained models is often restricted by limited and imbalanced data available to them. In this work, we develop an assisted learning framework for assisting organizations to improve their learning performance. The organizations have sufficient computation resources but are subject to stringent data-sharing and collaboration policies. Their limited imbalanced data often cause biased inference and sub-optimal decision-making. In assisted learning, an organizational learner purchases assistance service from an external service provider and aims to enhance its model performance within only a few assistance rounds. We develop effective stochastic training algorithms for both assisted deep learning and assisted reinforcement learning. Different from existing distributed algorithms that need to frequently transmit gradients or models, our framework allows the learner to only occasionally share information with the service provider, but still obtain a model that achieves near-oracle performance as if all the data were centralized.

URL: https://openreview.net/forum?id=SEDWlhcFWA

---

Title: A Bi-level Framework for Debiasing Implicit Feedback with Low Variance

Abstract: Implicit feedback is easy to collect and contains rich weak supervision signals, thus is broadly used in recommender systems. Recent works reveal a huge gap between the implicit feedback and the user-item relevance due to the fact that users tend to access items with high exposures but these items may not be necessarily relevant to users' preferences. To bridge the gap, existing methods explicitly model the item exposure degree and propose unbiased estimators to improve the relevance. Unfortunately, these unbiased estimators suffer from the high gradient variance, especially for long-tail items, leading to inaccurate gradient updates and degraded model performance.

To tackle this challenge, we propose a bi-level framework for debiasing implicit feedback with low variance. We first develop a low-variance unbiased estimator from a probabilistic perspective, which effectively bounds the variance of the gradient. Unlike previous works which either estimate the exposure via heuristic-based strategies or use a large biased training set, we propose to estimate the exposure via an unbiased small-scale validation set. Specifically, we parameterize the user-item exposure by incorporating both user and item information, and propose to construct the unbiased validation set only from the biased training set instead of using random policy at the cost of degrading user experience. By leveraging the unbiased validation set, we adopt a bi-level optimization framework to automatically update exposure-related parameters along with recommendation model parameters during the learning. Experiments on two real-world datasets and two semi-synthetic datasets verify the effectiveness of our method. Our code is available at \url{https://anonymous.4open.science/r/TMLR-Biff/README.md}.

URL: https://openreview.net/forum?id=GQXLTtEm0O

---

Title: Sobolev Spaces, Kernels and Discrepancies over Hyperspheres

Abstract: This work extends analytical foundations for kernel methods beyond the usual Euclidean manifold. Specifically, we characterise the smoothness of the native spaces (reproducing kernel Hilbert spaces) that are reproduced by geodesically isotropic kernels in the hyperspherical context. Our results have direct consequences for kernel cubature, determining the rate of convergence of the worst case error, and expanding the applicability of cubature algorithms based on Stein's method. First, we introduce a characterisation of Sobolev spaces on the $d$-dimensional sphere based on the Fourier--Schoenberg sequences associated with a given kernel. Such sequences are hard (if not impossible) to compute analytically on $d$-dimensional spheres, but often feasible over Hilbert spheres, where $d = \infty$. Second, we circumvent this problem by finding a projection operator that allows us to map from Hilbert spheres to finite-dimensional spheres. Our findings are illustrated for selected parametric families of kernel.

URL: https://openreview.net/forum?id=82hRiAbnnm

---

Title: Representation Ensembling for Synergistic Lifelong Learning with Quasilinear Complexity

Abstract: In lifelong learning, data are used to improve performance not only on the current task, but also on previously encountered, and as yet unencountered tasks. In contrast, classical machine learning, which we define as, starts from a blank slate, or tabula rasa and uses data only for the single task at hand. While typical transfer learning algorithms can improve performance on future tasks, their performance on prior tasks degrades upon learning new tasks (called forgetting). Many recent approaches for continual or lifelong learning have attempted to maintain performance on old tasks given new tasks. But striving to avoid forgetting sets the goal unnecessarily low. The goal of lifelong learning should be not only to improve performance on future tasks (forward transfer) but also on past tasks (backward transfer) with any new data. Our key insight is that we can synergistically ensemble representations---that were learned independently on disparate tasks---to enable both forward and backward transfer. This generalizes ensembling decisions (like in decision forests) and complements ensembling dependently learned representations (like in multitask learning). Moreover, we can ensemble representations in quasilinear space and time. We demonstrate this insight with two algorithms: representation ensembles of (1) trees and (2) networks. Both algorithms demonstrate forward and backward transfer in a variety of simulated and benchmark data scenarios, including tabular, image, and spoken, and adversarial tasks. This is in stark contrast to the reference algorithms we compared to, most of which failed to transfer either forward or backward, or both, despite that many of them require quadratic space or time complexity.

URL: https://openreview.net/forum?id=DsUs9Ib9Bl

---

Title: Containing a spread through sequential learning: to exploit or to explore?

Abstract: The spread of an undesirable contact process, such as an infectious disease (e.g. COVID-19), is contained through testing and isolation of infected nodes. The temporal and spatial evolution of the process (along with containment through isolation) render such detection as fundamentally different from active search detection strategies. In this work, through an active learning approach, we design testing and isolation strategies to contain the spread and minimize the cumulative infections under a given test budget. We prove that the objective can be optimized, with performance guarantees, by greedily selecting the nodes to test. We further design reward-based methodologies that effectively minimize an upper bound on the cumulative infections and are computationally more tractable in large networks. These policies, however, need knowledge about the nodes' infection probabilities which are dynamically changing and have to be learned by sequential testing. We develop a message-passing framework for this purpose and, building on that, show novel tradeoffs between exploitation of knowledge through reward-based heuristics and exploration of the unknown through a carefully designed probabilistic testing. The tradeoffs are fundamentally distinct from the classical counterparts under active search or multi-armed bandit problems (MABs). We provably show the necessity of exploration in a stylized network and show through simulations that exploration can outperform exploitation in various synthetic and real-data networks depending on the parameters of the network and the spread.

URL: https://openreview.net/forum?id=qvRWcDXBam

---

Title: Agent-State Construction with Auxiliary Inputs

Abstract: In many, if not every realistic sequential decision-making task, the decision-making agent is not able to model the full complexity of the world. The environment is often much larger and more complex than the agent, a setting also known as partial observability. In such settings, the agent must leverage more than just the current sensory inputs; it must construct an agent state that summarizes previous interactions with the world. Currently, a popular approach for tackling this problem is to learn the agent-state function via a recurrent network from the agent's sensory stream as input. Many impressive reinforcement learning applications have instead relied on environment-specific functions to aid the agent's inputs for history summarization. These augmentations are done in multiple ways, from simple approaches like concatenating observations to more complex ones such as uncertainty estimates. Although ubiquitous in the field, these additional inputs, which we term auxiliary inputs, are rarely emphasized, and it is not clear what their role or impact is. In this work we explore this idea further, and relate these auxiliary inputs to prior classic approaches to state construction. We present a series of examples illustrating the different ways of using auxiliary inputs for reinforcement learning. We show that these auxiliary inputs can be used to discriminate between observations that would otherwise be aliased, leading to more expressive features that smoothly interpolate between different states. Finally, we show that this approach is complementary to state-of-the-art methods such as recurrent neural networks and truncated back-propagation through time, and acts as a heuristic that facilitates longer temporal credit assignment, leading to better performance.

URL: https://openreview.net/forum?id=RLYkyucU6k

---

Title: Graph Contrastive Learning with Cross-View Reconstruction

Abstract: Graph self-supervised learning is commonly taken as an effective framework to tackle the supervision shortage issue in the graph learning task. Among different existing graph self-supervised learning strategies, graph contrastive learning (GCL) has been one of the most prevalent approaches to this problem. Despite the remarkable performance those GCL methods have achieved, existing GCL methods that heavily depend on various manually designed augmentation techniques still struggle to alleviate the feature suppression issue without risking losing task-relevant information. Consequently, the learned representation is either brittle or unilluminating. In light of this, we introduce the Graph Contrastive Learning with Cross-View Reconstruction (GraphCV), which follows the information bottleneck principle to learn minimal yet sufficient representation from graph data. Specifically, GraphCV aims to elicit the predictive (useful for downstream instance discrimination) and other non-predictive features separately. Except for the conventional contrastive loss which guarantees the consistency and sufficiency of the representation across different augmentation views, we introduce a cross-view reconstruction mechanism to pursue the disentanglement of the two learned representations. Besides, an adversarial view perturbed from the original view is added as the third view for the contrastive loss to guarantee the intactness of the global semantics and improve the representation robustness. We empirically demonstrate that our proposed model outperforms the state-of-the-art on graph classification task over multiple benchmark datasets.

URL: https://openreview.net/forum?id=37lFb6hUuv

---

Reply all

Reply to author

Forward

0 new messages