Weekly TMLR digest for Feb 26, 2023


TMLR

Feb 25, 2023, 7:00:10 PM
to tmlr-annou...@googlegroups.com


New certifications
==================

Featured Certification: Workflow Discovery from Dialogues in the Low Data Regime

Amine El hattami, Issam H. Laradji, Stefania Raimondo, David Vazquez, Pau Rodriguez, Christopher Pal

https://openreview.net/forum?id=L9othQvPks

---


Accepted papers
===============


Title: OADAT: Experimental and Synthetic Clinical Optoacoustic Data for Standardized Image Processing

Authors: Firat Ozdemir, Berkan Lafci, Xose Luis Dean-Ben, Daniel Razansky, Fernando Perez-Cruz

Abstract: Optoacoustic (OA) imaging is based on excitation of biological tissues with nanosecond-duration laser pulses followed by detection of the ultrasound waves generated via light-absorption-mediated thermoelastic expansion. OA imaging features a powerful combination of rich optical contrast and high resolution in deep tissues. This has enabled the exploration of a number of attractive new applications in both clinical and laboratory settings. However, no standardized datasets generated with different types of experimental set-ups and associated processing methods are available to facilitate advances in broader applications of OA in clinical settings. This complicates an objective comparison between new and established data processing methods, often leading to qualitative results and arbitrary interpretations of the data. In this paper, we provide both experimental and synthetic OA raw signals and reconstructed image-domain datasets rendered with different experimental parameters and tomographic acquisition geometries. We further provide trained neural networks to tackle three important challenges related to OA image processing, namely accurate reconstruction under limited-view tomographic conditions, removal of spatial undersampling artifacts, and anatomical segmentation for improved image reconstruction. Specifically, we define 44 experiments corresponding to the aforementioned challenges as benchmarks to be used as a reference for the development of more advanced processing methods.

URL: https://openreview.net/forum?id=BVi6MhKO0G

---

Title: Stacking Diverse Architectures to Improve Machine Translation

Authors: Andrea Schioppa, Nal Kalchbrenner

Abstract: Repeated applications of the same neural block, primarily based on self-attention, characterize the current state of the art in neural architectures for machine translation. In such architectures the decoder adopts a masked version of the same encoding block. Although simple, this strategy does not encode the various inductive biases, such as locality, that arise from alternative architectures and that are central to the modelling of translation. We propose Lasagna, an encoder-decoder model that aims to combine the inductive benefits of different architectures by layering multiple instances of different blocks. Lasagna’s encoder first grows the representation from local to mid-sized using convolutional blocks and only then applies a pair of final self-attention blocks. Lasagna’s decoder uses only convolutional blocks that attend to the encoder representation. On a large suite of machine translation tasks, we find that Lasagna not only matches or outperforms the Transformer baseline, but does so more efficiently thanks to its extensive use of efficient convolutional blocks. These findings suggest that uniform architectures may be suboptimal in certain scenarios and that exploiting the diversity of inductive architectural biases can lead to substantial gains.
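
The block layout is concrete enough to sketch: convolutional blocks first and self-attention blocks last in the encoder. Below is a minimal PyTorch illustration of such a stacked encoder; the block counts, kernel size, and dimensions are illustrative assumptions, not the paper's configuration.

# Minimal sketch of a "stacked diverse blocks" encoder in the spirit of Lasagna:
# convolutional blocks grow local context first, self-attention blocks come last.
# All sizes/counts are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, d_model, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.norm(x + torch.relu(y))    # residual + norm

class LasagnaStyleEncoder(nn.Module):
    def __init__(self, d_model=256, n_conv=4, n_attn=2, n_heads=4):
        super().__init__()
        self.conv_blocks = nn.ModuleList([ConvBlock(d_model) for _ in range(n_conv)])
        self.attn_blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_attn)
        ])

    def forward(self, x):
        for blk in self.conv_blocks:           # local -> mid-sized receptive field
            x = blk(x)
        for blk in self.attn_blocks:           # global mixing only at the top
            x = blk(x)
        return x

enc = LasagnaStyleEncoder()
out = enc(torch.randn(2, 10, 256))             # (2, 10, 256)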


URL: https://openreview.net/forum?id=mNEqiC924B

---

Title: Contrastive Search Is What You Need For Neural Text Generation

Authors: Yixuan Su, Nigel Collier

Abstract: Generating text with autoregressive language models (LMs) is of great importance to many natural language processing (NLP) applications. Previous solutions for this task often produce text that contains degenerative expressions (Welleck et al., 2020) or lacks semantic consistency (Basu et al., 2021). Recently, Su et al. (2022b) introduced a new decoding method, contrastive search, based on the isotropic representation space of the language model and obtained a new state of the art on various benchmarks. Additionally, Su et al. (2022b) argued that the representations of autoregressive LMs (e.g., GPT-2) are intrinsically anisotropic, a view also shared by previous studies (Ethayarajh, 2019). Therefore, to ensure the language model follows an isotropic distribution, Su et al. (2022b) proposed a contrastive learning scheme, SimCTG, which calibrates the language model’s representations through additional training.

In this study, we first answer the question: “Are autoregressive LMs really anisotropic?”. To this end, we extensively evaluate the isotropy of LMs across 16 major languages. Surprisingly, we find that the anisotropy problem exists only in the two specific English GPT-2-small/medium models. On the other hand, all other evaluated LMs are naturally isotropic, which is in contrast to the conclusions drawn by previous studies (Ethayarajh, 2019; Su et al., 2022b). Based on our findings, we further assess the contrastive search decoding method using off-the-shelf LMs on four generation tasks across 16 languages. Our experimental results demonstrate that contrastive search significantly outperforms previous decoding methods without any additional training. More notably, on 12 out of the 16 evaluated languages, contrastive search performs comparably to human-level performance as judged by human evaluations.
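
For readers unfamiliar with contrastive search, the candidate-scoring rule from Su et al. (2022b) trades model confidence against a degeneration penalty (the maximum cosine similarity to previously generated token representations). A small numpy sketch of that scoring step follows, with random stand-ins for the candidate probabilities and hidden states.

# Sketch of the contrastive-search scoring step (Su et al., 2022b):
# pick, among the top-k candidates, the token that is probable under the LM
# but whose representation is dissimilar from the existing context.
# Candidate probabilities and hidden states here are random stand-ins.
import numpy as np

def contrastive_score(cand_probs, cand_hiddens, ctx_hiddens, alpha=0.6):
    """cand_probs: (k,), cand_hiddens: (k, d), ctx_hiddens: (t, d)."""
    c = cand_hiddens / np.linalg.norm(cand_hiddens, axis=1, keepdims=True)
    h = ctx_hiddens / np.linalg.norm(ctx_hiddens, axis=1, keepdims=True)
    degeneration_penalty = (c @ h.T).max(axis=1)   # max cosine similarity per candidate
    return (1 - alpha) * cand_probs - alpha * degeneration_penalty

rng = np.random.default_rng(0)
k, d, t = 5, 16, 8
scores = contrastive_score(rng.random(k), rng.normal(size=(k, d)), rng.normal(size=(t, d)))
next_token_idx = int(np.argmax(scores))            # index into the top-k candidate list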

URL: https://openreview.net/forum?id=GbkWw3jwL9

---

Title: Separable Self-attention for Mobile Vision Transformers

Authors: Sachin Mehta, Mohammad Rastegari

Abstract: Mobile vision transformers (MobileViT) can achieve state-of-the-art performance across several mobile vision tasks, including classification and detection. Though these models have fewer parameters, they have high latency compared to convolutional neural network-based models. The main efficiency bottleneck in MobileViT is the multi-headed self-attention (MHA) in transformers, which has $O(k^2)$ time complexity with respect to the number of tokens (or patches) $k$. Moreover, MHA requires costly operations (e.g., batch-wise matrix multiplication) for computing self-attention, impacting latency on resource-constrained devices. This paper introduces a separable self-attention method with linear complexity, i.e., $O(k)$. A simple yet effective characteristic of the proposed method is that it uses element-wise operations for computing self-attention, making it a good choice for resource-constrained devices. The improved model, MobileViTv2, is state-of-the-art on several mobile vision tasks, including ImageNet object classification and MS-COCO object detection. With about three million parameters, MobileViTv2 achieves a top-1 accuracy of 75.6% on the ImageNet dataset, outperforming MobileViT by about 1% while running $3.2\times$ faster on a mobile device. Our source code is available at: https://github.com/apple/ml-cvnets
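
The element-wise, linear-complexity attention described above can be illustrated roughly as follows: tokens are scored against a single learned latent, the score-weighted sum of key projections forms one context vector, and that vector is broadcast element-wise over the value projections. This numpy sketch is only a reading of the abstract; the exact MobileViTv2 layer is in the linked repository.

# Rough numpy sketch of a linear-complexity "separable" self-attention:
# one context score per token (k scores, not a k x k matrix), a single
# pooled context vector, and an element-wise broadcast over the values.
# Weights are random; see the apple/ml-cvnets repo for the real layer.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
k, d = 196, 64                               # tokens (patches), embedding dim
x = rng.normal(size=(k, d))
W_I = rng.normal(size=(d, 1))                # latent scoring vector
W_K = rng.normal(size=(d, d))
W_V = rng.normal(size=(d, d))

context_scores = softmax(x @ W_I, axis=0)    # (k, 1): one scalar score per token
context_vector = (context_scores * (x @ W_K)).sum(axis=0, keepdims=True)   # (1, d)
out = np.maximum(x @ W_V, 0.0) * context_vector                            # (k, d), element-wise

# Cost is O(k * d) in the token count k, versus O(k^2 * d) for standard MHA.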

URL: https://openreview.net/forum?id=tBl4yBEjKi

---

Title: A Flexible Nadaraya-Watson Head Can Offer Explainable and Calibrated Classification

Authors: Alan Q. Wang, Mert R. Sabuncu

Abstract: In this paper, we empirically analyze a simple, non-learnable, and nonparametric Nadaraya-Watson (NW) prediction head that can be used with any neural network architecture. In the NW head, the prediction is a weighted average of labels from a support set. The weights are computed from distances between the query feature and support features. This is in contrast to the dominant approach of using a learnable classification head (e.g., a fully-connected layer) on the features, which can be challenging to interpret and can yield poorly calibrated predictions. Our empirical results on an array of computer vision tasks demonstrate that the NW head can yield better calibration with comparable accuracy compared to its parametric counterpart, particularly in data-limited settings. To further increase inference-time efficiency, we propose a simple approach that involves a clustering step run on the training set to create a relatively small distilled support set. Furthermore, we explore two means of interpretability/explainability that fall naturally from the NW head. The first is the label weights, and the second is our novel concept of the ``support influence function,'' which is an easy-to-compute metric that quantifies the influence of a support element on the prediction for a given query. As we demonstrate in our experiments, the influence function can allow the user to debug a trained model. We believe that the NW head is a flexible, interpretable, and highly useful building block that can be used in a range of applications.
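
The NW head itself is simple to state concretely: the prediction is a kernel-weighted average of support labels, with weights derived from query-support feature distances. A minimal numpy sketch follows; the Gaussian-style kernel and temperature are illustrative choices, not necessarily those used in the paper.

# Minimal Nadaraya-Watson head: prediction = kernel-weighted average of
# support labels, with weights from query/support feature distances.
# Kernel choice and temperature below are illustrative, not the paper's.
import numpy as np

def nw_head(query_feat, support_feats, support_labels_onehot, temperature=1.0):
    """query_feat: (d,), support_feats: (n, d), support_labels_onehot: (n, c)."""
    d2 = ((support_feats - query_feat) ** 2).sum(axis=1)   # squared distances
    logits = -d2 / temperature
    w = np.exp(logits - logits.max())
    w = w / w.sum()                                         # softmax over the support set
    return w @ support_labels_onehot                        # (c,) class probabilities

rng = np.random.default_rng(0)
feats, labels = rng.normal(size=(50, 8)), rng.integers(0, 3, size=50)
onehot = np.eye(3)[labels]
probs = nw_head(rng.normal(size=8), feats, onehot)          # sums to 1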

URL: https://openreview.net/forum?id=iEq6lhG4O3

---

Title: Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models

Authors: Juan Lopez Alcaraz, Nils Strodthoff

Abstract: The imputation of missing values represents a significant obstacle for many real-world data analysis pipelines. Here, we focus on time series data and put forward SSSD, an imputation model that relies on two emerging technologies: (conditional) diffusion models as state-of-the-art generative models, and structured state space models as the internal model architecture, which are particularly suited to capturing long-term dependencies in time series data. We demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic imputation and forecasting performance on a broad range of data sets and different missingness scenarios, including the challenging blackout-missing scenarios, where prior approaches failed to provide meaningful results.

URL: https://openreview.net/forum?id=hHiIbk7ApW

---

Title: Robust Hybrid Learning With Expert Augmentation

Authors: Antoine Wehenkel, Jens Behrmann, Hsiang Hsu, Guillermo Sapiro, Gilles Louppe, Joern-Henrik Jacobsen

Abstract: Hybrid modelling reduces the misspecification of expert models by combining them with machine learning (ML) components learned from data. Similarly to many ML algorithms, hybrid model performance guarantees are limited to the training distribution. Leveraging the insight that the expert model is usually valid even outside the training domain, we overcome this limitation by introducing a hybrid data augmentation strategy termed \textit{expert augmentation}. Based on a probabilistic formalization of hybrid modelling, we demonstrate that expert augmentation, which can be incorporated into existing hybrid systems, improves generalization. We empirically validate the expert augmentation on three controlled experiments modelling dynamical systems with ordinary and partial differential equations. Finally, we assess the potential real-world applicability of expert augmentation on a dataset of a real double pendulum.

URL: https://openreview.net/forum?id=oe4dl4MCGY

---

Title: Improved Differentially Private Riemannian Optimization: Fast Sampling and Variance Reduction

Authors: Saiteja Utpala, Andi Han, Pratik Jawanpuria, Bamdev Mishra

Abstract: A common step in differentially private (DP) Riemannian optimization is sampling from the (tangent) Gaussian distribution, as noise needs to be generated in the tangent space to perturb the gradient. In this regard, existing works either use Markov chain Monte Carlo (MCMC) sampling or explicit basis-construction-based sampling methods on the tangent space. This becomes a computational bottleneck in the practical use of DP Riemannian optimization, especially when performing stochastic optimization. In this paper, we discuss different sampling strategies and develop efficient sampling procedures by exploiting linear isometry between tangent spaces, and show them to be orders of magnitude faster than both MCMC and sampling using explicit basis construction. Furthermore, we develop the DP Riemannian stochastic variance reduced gradient algorithm and compare it with DP Riemannian gradient descent and stochastic gradient descent algorithms on various problems.

URL: https://openreview.net/forum?id=paguBNtqiO

---

Title: Dirichlet Mechanism for Differentially Private KL Divergence Minimization

Authors: Donlapark Ponnoprat

Abstract: Given an empirical distribution $f(x)$ of sensitive data $x$, we consider the task of minimizing $F(y) = D_{\text{KL}} (f(x)\Vert y)$ over a probability simplex, while protecting the privacy of $x$. We observe that, if we take the exponential mechanism and use the KL divergence as the loss function, then the resulting algorithm is the \textit{Dirichlet mechanism} that outputs a single draw from a Dirichlet distribution. Motivated by this, we propose a Rényi differentially private (RDP) algorithm that employs the Dirichlet mechanism to solve the KL divergence minimization task. In addition, given $f(x)$ as above and $\hat{y}$ an output of the Dirichlet mechanism, we prove a probability tail bound on $D_{\text{KL}} (f(x)\Vert \hat{y})$, which is then used to derive a lower bound for the sample complexity of our RDP algorithm. Experiments on real-world datasets demonstrate advantages of our algorithm over Gaussian and Laplace mechanisms in supervised classification and maximum likelihood estimation.
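
Mechanically, a Dirichlet mechanism of this kind releases a single draw from a Dirichlet distribution whose concentration is pushed toward the empirical distribution. The sketch below is illustrative only: the concentration scale would need to be calibrated to the desired Rényi-DP guarantee as derived in the paper, and the exact parameterization used there may differ.

# Illustrative-only sketch of releasing a private estimate of an empirical
# distribution f via a single Dirichlet draw. The concentration "scale" would
# have to be calibrated to the desired Renyi-DP level as derived in the paper;
# here it is an arbitrary number.
import numpy as np

def dirichlet_release(counts, scale=50.0, rng=None):
    rng = rng or np.random.default_rng()
    f = counts / counts.sum()          # empirical distribution of the sensitive data
    alpha = scale * f + 1.0            # exponential-mechanism-style concentration (illustrative)
    return rng.dirichlet(alpha)        # one draw = the released distribution

counts = np.array([30, 10, 5, 55])
y_hat = dirichlet_release(counts, scale=50.0, rng=np.random.default_rng(0))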

URL: https://openreview.net/forum?id=lmr2WwlaFc

---

Title: Regularized Training of Intermediate Layers for Generative Models for Inverse Problems

Authors: Sean Gunn, Jorio Cocola, Paul Hand

Abstract: Generative Adversarial Networks (GANs) have been shown to be powerful and flexible priors when solving inverse problems. One challenge of using them is overcoming representation error, the fundamental limitation of the network in representing any particular signal. Recently, multiple inversion algorithms have been proposed that reduce representation error by optimizing over intermediate layer representations. These methods are typically applied to generative models that were trained agnostic of the downstream inversion algorithm. In our work, we introduce the principle that if a generative model is intended for inversion using an algorithm based on optimization of intermediate layers, it should be trained in a way that regularizes those intermediate layers. We instantiate this principle for two notable recent inversion algorithms: Intermediate Layer Optimization and the Multi-Code GAN prior. For both of these inversion algorithms, we introduce a new regularized GAN training algorithm and demonstrate that the learned generative model results in lower reconstruction errors across a wide range of undersampling ratios when solving compressed sensing, inpainting, and super-resolution problems.

URL: https://openreview.net/forum?id=cKsKXR28cG

---

Title: Workflow Discovery from Dialogues in the Low Data Regime

Authors: Amine El hattami, Issam H. Laradji, Stefania Raimondo, David Vazquez, Pau Rodriguez, Christopher Pal

Abstract: Text-based dialogues are now widely used to solve real-world problems. In cases where solution strategies are already known, they can sometimes be codified into workflows and used to guide humans or artificial agents through the task of helping clients. We introduce a new problem formulation that we call Workflow Discovery (WD), in which we are interested in the situation where a formal workflow may not yet exist; still, we wish to discover the set of actions that have been taken to resolve a particular problem. We also examine a sequence-to-sequence (Seq2Seq) approach for this novel task. We present experiments where we extract workflows from dialogues in the Action-Based Conversations Dataset (ABCD). Since the ABCD dialogues follow known workflows to guide agents, we can evaluate our ability to extract such workflows using ground truth sequences of actions. We propose and evaluate an approach that conditions models on the set of possible actions, and we show that using this strategy, we can improve WD performance. Our conditioning approach also improves zero-shot and few-shot WD performance when transferring learned models to unseen domains within and across datasets. Further, on ABCD, a modified variant of our Seq2Seq method achieves state-of-the-art performance on the related but distinct problems of Action State Tracking (AST) and Cascading Dialogue Success (CDS) across many evaluation metrics.

URL: https://openreview.net/forum?id=L9othQvPks

---

Title: Layerwise Bregman Representation Learning of Neural Networks with Applications to Knowledge Distillation

Authors: Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K Warmuth

Abstract: We propose a new method for layerwise representation learning of a trained neural network that conforms to the non-linearity of the layer's transfer function. In particular, we form a Bregman divergence based on the convex function induced by the layer's transfer function and construct an extension of the original Bregman PCA formulation by incorporating a mean vector and revising the normalization constraint on the principal directions. These modifications allow exporting the learned representation as a fixed layer with a non-linearity. As an application to knowledge distillation, we cast the learning problem for the student network as predicting the compression coefficients of the teacher's representations, which is then passed as the input to the imported layer. Our empirical findings indicate that our approach is substantially more effective for transferring information between networks than typical teacher-student training that uses the teacher's soft labels.

URL: https://openreview.net/forum?id=6dsvH7pQHH

---

Title: Learn, Unlearn and Relearn: An Online Learning Paradigm for Deep Neural Networks

Authors: Vijaya Raghavan T Ramkumar, Elahe Arani, Bahram Zonooz

Abstract: Deep neural networks (DNNs) are often trained on the premise that the complete training data set is provided ahead of time. However, in real-world scenarios, data often arrive in chunks over time. This leads to important considerations about the optimal strategy for training DNNs, such as whether to fine-tune them with each chunk of incoming data (warm-start) or to retrain them from scratch with the entire corpus of data whenever a new chunk is available. While the latter can be resource-intensive, recent work has pointed out the lack of generalization in warm-start models. Therefore, to strike a balance between efficiency and generalization, we introduce "Learn, Unlearn, and Relearn (LURE)", an online learning paradigm for DNNs. LURE alternates between an unlearning phase, which selectively forgets undesirable information in the model through data-dependent weight reinitialization, and a relearning phase, which emphasizes learning generalizable features. We show that our training paradigm provides consistent performance gains across datasets in both classification and few-shot settings. We further show that it leads to more robust and well-calibrated models.

URL: https://openreview.net/forum?id=WN1O2MJDST

---


New submissions
===============


Title: Positive Difference Distribution based Image Outlier Detection using Normalizing Flows and Contrastive Data

Abstract: Detecting test data deviating from training data is a central problem for safe and robust machine learning. Likelihoods learned by a generative model, e.g., a normalizing flow via standard log-likelihood training, perform poorly as an outlier score. We propose to use an unlabelled auxiliary dataset and a probabilistic outlier score for outlier detection. We use a self-supervised feature extractor trained on the auxiliary dataset and train a normalizing flow on the extracted features by maximizing the likelihood on in-distribution data and minimizing the likelihood on the contrastive dataset. We show that this is equivalent to learning the normalized positive difference between the in-distribution and the contrastive feature density. We conduct experiments on benchmark datasets and compare to the likelihood, the likelihood ratio and state-of-the-art anomaly detection methods.
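
The training signal is straightforward to write down: maximize the flow's log-likelihood on in-distribution features and minimize it on contrastive (auxiliary) features. A framework-agnostic sketch of that objective is below, with a fitted Gaussian standing in for the normalizing flow so the snippet stays self-contained.

# Sketch of the contrastive likelihood objective: push log p(x) up on
# in-distribution features and down on auxiliary/contrastive features.
# A fitted Gaussian stands in for the normalizing flow purely to keep
# the snippet self-contained; any model exposing a log-prob would do.
import numpy as np
from scipy.stats import multivariate_normal

def contrastive_nll(log_prob, x_in, x_contrastive, lam=1.0):
    """Loss to minimize: -E_in[log p] + lam * E_contrastive[log p]."""
    return -log_prob(x_in).mean() + lam * log_prob(x_contrastive).mean()

rng = np.random.default_rng(0)
x_in = rng.normal(loc=0.0, size=(500, 4))        # in-distribution features
x_aux = rng.normal(loc=3.0, size=(500, 4))       # contrastive / auxiliary features

stand_in_flow = multivariate_normal(mean=x_in.mean(0), cov=np.cov(x_in.T))
loss = contrastive_nll(stand_in_flow.logpdf, x_in, x_aux)

# At test time, log p(x) under the trained model serves as the outlier score
# (low values flag outliers), approximating the normalized positive difference
# between the in-distribution and contrastive feature densities.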

URL: https://openreview.net/forum?id=B4J40x7NjA

---

Title: Comparative Generalization Bounds for Deep Neural Networks

Abstract: In this work, we investigate the generalization capabilities of deep neural networks. We introduce a measure of the effective depth of neural networks, defined as the first layer at which sample embeddings are separable using the nearest-class center classifier. Our empirical results demonstrate that, in standard classification settings, neural networks trained using Stochastic Gradient Descent tend to have small effective depths. We also explore the relationship between effective depth, the complexity of the training dataset, and generalization. For instance, we find that the effective depth of a trained neural network increases as the number of random labels in the data increases. Additionally, we derive a generalization bound by comparing the effective depth of a network with the minimal depth required to fit the same dataset with partially corrupted labels. This bound provides non-vacuous predictions of test performance and is found to be independent of the actual depth of the network in our experiments.
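
Because the effective-depth measure is defined operationally (the first layer whose embeddings are separated by a nearest-class-center classifier), a short numpy sketch of that per-layer check may help; random arrays stand in for real layer embeddings, and the paper's exact separability criterion may differ from the simple accuracy threshold used here.

# Sketch of the effective-depth measurement: for each layer's embeddings,
# check whether a nearest-class-center (NCC) classifier separates the
# training samples; the effective depth is the first layer where it does.
# Random arrays stand in for real layer embeddings.
import numpy as np

def ncc_train_accuracy(embeddings, labels):
    classes = np.unique(labels)
    centers = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    d = ((embeddings[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    preds = classes[d.argmin(axis=1)]
    return (preds == labels).mean()

def effective_depth(per_layer_embeddings, labels, threshold=1.0):
    """Index of the first layer whose NCC training accuracy reaches `threshold`."""
    for depth, emb in enumerate(per_layer_embeddings, start=1):
        if ncc_train_accuracy(emb, labels) >= threshold:
            return depth
    return len(per_layer_embeddings)             # never separable: full depth

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=256)
class_dirs = rng.normal(size=(10, 64))
layers = [rng.normal(size=(256, 64)) + 2.0 * i * class_dirs[labels] for i in range(6)]
print(effective_depth(layers, labels))           # embeddings separate after the first layer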

URL: https://openreview.net/forum?id=162TqkUNPO

---

Title: Simple Non-Parametric Calibration of Multi-Class Classifiers

Abstract: A probabilistic classifier is considered calibrated if it outputs probabilities that are in correspondence with empirical class proportions. Calibration is essential in safety-critical tasks where small deviations between the predicted probabilities and the actually observed class proportions can incur high costs. A common approach to improve the calibration of a classifier is to use a hold-out data set and a post-hoc calibration method to learn a correcting transformation for the classifier’s output. This work explores the field of post-hoc calibration methods for multi-class classifiers and proposes a novel non-parametric post-hoc calibration method. The basis of the proposed method is the assumption of locally equal calibration errors on the probability simplex. This assumption has been previously used but never clearly stated in the calibration literature. The proposed calibration method is shown to offer improvements to the state-of-the-art according to the expected calibration error metric on CIFAR-10 and CIFAR-100 data sets.

URL: https://openreview.net/forum?id=na5sHG69rI

---

Title: Feature Selection for Huge Data via Minipatch Learning

Abstract: Feature selection often leads to increased model interpretability, faster computation, and improved model performance by discarding irrelevant or redundant features. While feature selection is a well-studied problem with many widely-used techniques, there are typically two key challenges: i) many existing approaches can become computationally intractable in huge-data settings on the order of millions of features; and ii) the statistical accuracy of selected features often degrades in high-dimensional, high-noise, and high-correlation settings, thus hindering reliable model interpretation. In this work, we tackle these problems by developing Stable Minipatch Selection (STAMPS) and Adaptive STAMPS (AdaSTAMPS). These are meta-algorithms that build ensembles of selection events of base feature selectors trained on many tiny, random or adaptively-chosen subsets of both the observations and features of the data, which are named minipatches. Our approaches are general and can be employed with a variety of existing feature selection strategies and machine learning techniques in practice. In addition, we empirically demonstrate that our approaches, especially AdaSTAMPS, outperform many competing methods in terms of feature selection accuracy and computational time in a variety of numerical experiments; we also show the efficacy of our method in challenging high-dimensional settings common with biological data. Our methods are implemented in the Python package minipatch-learning.
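
The core minipatch idea, ensembling the selection events of a base selector fit on many tiny random row/column subsets, can be sketched in a few lines with scikit-learn. The base selector (Lasso), minipatch sizes, and frequency threshold below are arbitrary illustrative choices, and this is the plain random-minipatch variant rather than the adaptive AdaSTAMPS scheme.

# Sketch of random-minipatch feature selection: fit a sparse base selector
# on many tiny random subsets of rows and columns, then keep features that
# are selected (nonzero coefficient) in a large fraction of the minipatches.
# Base selector, minipatch sizes, and threshold are illustrative choices.
import numpy as np
from sklearn.linear_model import Lasso

def minipatch_selection(X, y, n_patches=200, n_rows=50, n_cols=20,
                        alpha=0.5, threshold=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    selected = np.zeros(p)
    sampled = np.zeros(p)
    for _ in range(n_patches):
        rows = rng.choice(n, size=n_rows, replace=False)
        cols = rng.choice(p, size=n_cols, replace=False)
        coef = Lasso(alpha=alpha).fit(X[np.ix_(rows, cols)], y[rows]).coef_
        sampled[cols] += 1
        selected[cols] += (coef != 0)
    freq = selected / np.maximum(sampled, 1)      # per-feature selection frequency
    return np.flatnonzero(freq >= threshold), freq

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 200))
y = X[:, 0] - 2 * X[:, 3] + 0.1 * rng.normal(size=500)   # only features 0 and 3 matter
picked, freq = minipatch_selection(X, y)
print(picked)                                             # expected to recover features 0 and 3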

URL: https://openreview.net/forum?id=GbIhy0oFec

---

Title: Black-Box Batch Active Learning for Regression

Abstract: Batch active learning is a popular approach for efficiently training machine learning models on large, initially unlabelled datasets, which repeatedly acquires labels for a batch of data points. However, many recent batch active learning methods are white-box approaches limited to differentiable parametric models: they score unlabeled points using acquisition functions based on model embeddings or first- and second-order derivatives. In this paper, we propose black-box batch active learning for regression tasks as an extension of white-box approaches. This approach is compatible with a wide range of machine learning models including regular and Bayesian deep learning models and non-differentiable models such as random forests. It is rooted in Bayesian principles and utilizes recent kernel-based approaches. Importantly, our method only relies on model predictions. This allows us to extend a wide range of existing state-of-the-art white-box batch active learning methods (BADGE, BAIT, LCMD) to black-box models. We demonstrate the effectiveness of our approach through extensive experimental evaluations on regression datasets, achieving surprisingly strong performance compared to white-box approaches for deep learning models.


URL: https://openreview.net/forum?id=fvEvDlKko6

---

Title: Evaluation of Causal Inference Models to Access Heterogeneous Treatment Effect

Abstract: Causal inference has gained popularity over recent years due to its ability to see through correlation and find causal relationships between covariates. A number of methods have been created to this end, but there is no systematic benchmark comparing them, including the benefits and drawbacks of using each one. This research compares a number of those methods on how well they assess the heterogeneous treatment effect, using a variety of synthetically created data sets divided between low-dimensional and high-dimensional covariates, with increasing complexity between the covariates and the target. We compare the errors of these methods and discuss the settings and premises under which each method is better suited.

URL: https://openreview.net/forum?id=75NtszAm76

---

Title: Fast Treatment Personalization with Latent Bandits in Fixed-Confidence Pure Exploration

Abstract: Personalizing treatments for patients often involves a period of trial-and-error search until an optimal choice is found. To minimize suffering and other costs, it is critical to make this process as short as possible. When treatments have primarily short-term effects, search can be performed with multi-armed bandits (MAB), but these typically require long exploration periods to guarantee optimality. In this work, we design MAB algorithms which provably identify optimal treatments quickly by leveraging prior knowledge of the types of decision processes (patients) we can encounter, in the form of a latent variable model. We present two algorithms, the Latent LP-based Track and Stop (LLPT) explorer and the Divergence Explorer for this setting: fixed-confidence pure-exploration latent bandits. We give a lower bound on the stopping time of any algorithm which is correct at a given certainty level, and prove that the expected stopping time of the LLPT Explorer matches the lower bound in the high-certainty limit. Finally, we present results from an experimental study based on realistic simulation data for Alzheimer's disease, demonstrating that our formulation and algorithms lead to a significantly reduced stopping time.

URL: https://openreview.net/forum?id=NNRIGE8bvF

---

Title: Robust Constrained Reinforcement Learning

Abstract: Constrained reinforcement learning maximizes the reward subject to constraints on utilities/costs. However, in practice the training environment is often not the same as the test one, due to, e.g., modeling error, adversarial attacks, or non-stationarity, resulting in severe performance degradation and, more importantly, constraint violation in the test environment. To address this challenge, we formulate the framework of robust constrained reinforcement learning under model uncertainty, where the MDP is not fixed but lies in some uncertainty set. The goal is twofold: 1) to guarantee that constraints on utilities/costs are satisfied for all MDPs in the uncertainty set, and 2) to maximize the worst-case reward performance over the uncertainty set. We design a robust primal-dual approach, and further develop theoretical guarantees on its convergence, complexity and robust feasibility. We then investigate a concrete example of the $\delta$-contamination uncertainty set, design an online and model-free algorithm, and theoretically characterize its sample complexity.

URL: https://openreview.net/forum?id=tfTPNZd48H

---

Title: Learning Online Data Association

Abstract: When an agent interacts with a complex environment, it receives a stream of percepts in which it may detect entities, such as objects or people. To build up a coherent, low-variance estimate of the underlying state, it is necessary to fuse information from multiple detections over time. To do this fusion, the agent must decide which detections to associate with one another. We address this data-association problem in the setting of an online filter, in which each observation is processed by aggregating it into an existing object hypothesis. Classic methods with strong probabilistic foundations exist, but they are computationally expensive and require models that can be difficult to acquire. In this work, we use the deep-learning tools of sparse attention and representation learning to learn a machine that processes a stream of detections and outputs a set of hypotheses about objects in the world. We evaluate this approach on simple clustering problems, problems with dynamics, and complex image-based domains. We find that it generalizes well from short to long observation sequences and from a few to many hypotheses, outperforming other learning approaches and classical non-learning methods.

URL: https://openreview.net/forum?id=iW9lN8EGLw

---

Title: Subgraph Permutation Equivariant Networks

Abstract: In this work we develop a new method, named Sub-graph Permutation Equivariant Networks (SPEN), which provides a framework for building graph neural networks that operate on sub-graphs using a permutation equivariant base update function, and that are equivariant to a novel choice of automorphism group. Message passing neural networks have been shown to be limited in their expressive power, and recent approaches to overcome this either lack scalability or require structural information to be encoded into the feature space. The general framework presented here overcomes the scalability issues associated with global permutation equivariance by operating more locally on sub-graphs. In addition, operating on sub-graphs improves the expressive power of higher-dimensional global permutation equivariant networks; this is due to the fact that two non-distinguishable graphs often contain distinguishable sub-graphs. Furthermore, the proposed framework only requires a choice of $k$-hops for creating ego-network sub-graphs and a choice of representation space to be used for each layer, which makes the method easily applicable across a range of graph-based domains. We experimentally validate the method on a range of graph benchmark classification tasks, demonstrating statistically indistinguishable results from the state-of-the-art on six out of seven benchmarks. Further, we demonstrate that the use of local update functions offers a significant improvement in GPU memory over global methods.
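
As a concrete aside, the only preprocessing choice the abstract highlights, extracting k-hop ego-network sub-graphs, is a one-liner with networkx; the toy graph and the choice of k below are arbitrary.

# Extracting the k-hop ego-network sub-graphs that methods of this kind
# operate on; the toy graph and the choice k = 2 are arbitrary.
import networkx as nx

G = nx.karate_club_graph()
k = 2
ego_subgraphs = {v: nx.ego_graph(G, v, radius=k) for v in G.nodes}
print(len(ego_subgraphs), ego_subgraphs[0].number_of_nodes())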

URL: https://openreview.net/forum?id=3agxS3aDUs

---

Title: Soft Diffusion: Score Matching with General Corruptions

Abstract: We define a broader family of corruption processes that generalizes previously known diffusion models. To reverse these general diffusions, we propose a new objective called Soft Score Matching. Soft Score Matching incorporates the degradation process in the network and provably learns the score function for any linear corruption process. Our new loss trains the model to predict a clean image that, after corruption, matches the diffused observation. This objective learns the gradient of the likelihood under suitable regularity conditions for the family of linear corruption processes. We further develop an algorithm to select the corruption levels for general diffusion processes and a novel sampling method that we call Momentum Sampler. We show experimentally that our framework works for general linear corruption processes, such as Gaussian blur and masking. Our method outperforms all linear diffusion models on CelebA-64, achieving an FID score of 1.85. We also show computational benefits compared to vanilla denoising diffusion.
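
The stated objective, predicting a clean image that after corruption matches the diffused observation, translates directly into a regression loss for any linear corruption operator. The numpy sketch below only illustrates the shape of that loss; the corruption matrix is an arbitrary smoothing operator and the predictor is a placeholder, so this is not the paper's training recipe.

# Shape of a Soft-Score-Matching-style objective for a linear corruption C_t:
# observe y_t = C_t(x0) + sigma_t * eps, predict x_hat from y_t, and penalize
# || C_t(x_hat) - y_t ||^2, i.e., the prediction, once corrupted, should match
# the diffused observation. C_t here is a toy running-mean matrix; the "model"
# is a placeholder for a real denoising network.
import numpy as np

rng = np.random.default_rng(0)
d = 32
x0 = rng.normal(size=d)                                          # clean signal
C_t = np.tril(np.ones((d, d))) / np.arange(1, d + 1)[:, None]    # toy linear corruption
sigma_t = 0.1
y_t = C_t @ x0 + sigma_t * rng.normal(size=d)                    # diffused observation

def model(y, t):                     # placeholder: a real network predicts x0 from (y_t, t)
    return y

x_hat = model(y_t, t=0.5)
loss = np.mean((C_t @ x_hat - y_t) ** 2)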

URL: https://openreview.net/forum?id=W98rebBxlQ

---

Title: Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios

Abstract: Pool-based Active Learning (AL) has successfully minimized labeling costs by sequentially selecting the most informative, unlabeled data from a large, unlabeled data pool and querying their labels from oracle/annotators. However, existing AL sampling schemes might not work well under out-of-distribution (OOD) data scenarios, where the unlabeled data pool contains data samples that do not belong to the pre-defined categories of the target task. Achieving good AL performance under OOD data scenarios is challenging due to the natural conflict between AL sampling strategies and OOD data detection. For instance, both more informative in-distribution (ID) data and OOD data in an unlabeled data pool would be assigned high informativeness scores (e.g., high entropy) during AL processes. To alleviate this dilemma, we propose a Monte-Carlo Pareto Optimization for Active Learning (POAL) sampling scheme, which selects optimal subsets of unlabeled samples with \emph{fixed batch size} from the unlabeled data pool. We cast the AL sampling task as a multi-objective optimization problem and utilize Pareto optimization based on two conflicting objectives: (1) the typical AL sampling scheme (e.g., maximum entropy) and (2) the confidence of not being an OOD data sample. Experimental results show the effectiveness of our POAL on classical Machine Learning (ML) and Deep Learning (DL) tasks.
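
Since the method rests on a bi-objective view of acquisition (informativeness versus confidence of being in-distribution), a tiny numpy sketch of selecting the Pareto-optimal (non-dominated) unlabeled points under two such scores is shown below. The scores are random placeholders, and the paper's Monte-Carlo optimization over fixed-size batch subsets is not reproduced.

# Sketch of the bi-objective view: keep unlabeled points that are Pareto-optimal
# w.r.t. (1) an AL informativeness score (e.g., predictive entropy) and
# (2) a confidence-of-being-in-distribution score. Both scores are random
# placeholders; the paper's Monte-Carlo subset optimization is not shown.
import numpy as np

def pareto_front(scores):
    """scores: (n, 2), larger is better on both axes; returns indices of non-dominated points."""
    n = scores.shape[0]
    dominated = np.zeros(n, dtype=bool)
    for i in range(n):
        others = np.delete(scores, i, axis=0)
        dominated[i] = np.any(np.all(others >= scores[i], axis=1) &
                              np.any(others > scores[i], axis=1))
    return np.flatnonzero(~dominated)

rng = np.random.default_rng(0)
entropy_score = rng.random(1000)      # stand-in for predictive entropy
id_confidence = rng.random(1000)      # stand-in for "not OOD" confidence
front = pareto_front(np.column_stack([entropy_score, id_confidence]))
print(len(front))                     # candidate points to query next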

URL: https://openreview.net/forum?id=dXnccpSSYF

---

Title: When Does Uncertainty Matter?: Understanding the Impact of Predictive Uncertainty in ML Assisted Decision Making

Abstract: As machine learning (ML) models are increasingly being employed to assist human decision makers, it becomes critical to provide these decision makers with relevant inputs which can help them decide if and how to incorporate model predictions into their decision making. For instance, communicating the uncertainty associated with model predictions could potentially be helpful in this regard. In this work, we carry out user studies to systematically assess how people with differing levels of expertise respond to different types of predictive uncertainty, i.e., posterior predictive distributions with different shapes and variances, in the context of ML assisted decision making. Our results demonstrate that showing uncertainty information leads to smaller disagreements with the machine, regardless of the type of uncertainty, but that these effects are sensitive to expertise in both ML and the domain. This suggests that posterior predictive distributions can serve as useful decision aids, but that they should be used with caution, taking into account the type of uncertainty distribution and the expertise of the human.

URL: https://openreview.net/forum?id=pbs22kJmEO

---

Title: Private GANs, Revisited

Abstract: We show that the canonical approach for training differentially private GANs -- updating the discriminator with differentially private stochastic gradient descent (DPSGD) -- can yield significantly improved results after modifications to training. Existing instantiations of this approach neglect to consider how adding noise only to discriminator updates disrupts the careful balance between the generator and discriminator necessary for successful GAN training. We show that a simple fix -- taking more discriminator steps between generator steps -- restores parity and improves results. Additionally, with the goal of restoring parity between the generator and discriminator, we experiment with other modifications to improve discriminator training and see further improvements in generation quality. Our results demonstrate that on standard benchmarks, DPSGD outperforms all alternative GAN privatization schemes.
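
The concrete fix described, taking several discriminator updates per generator update with only the discriminator privatized, amounts to a small change in the training loop. The PyTorch skeleton below shows that structure on toy data; the actual DP-SGD privatization (per-sample clipping plus noise, e.g., via a library such as Opacus) is indicated only in a comment.

# Skeleton of the "more discriminator steps per generator step" schedule.
# Toy 1-D data and tiny networks; the discriminator optimizer is where
# DP-SGD (per-sample clipping + noise) would be applied -- omitted here.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)   # in the DP setting, this is the privatized optimizer
bce = nn.BCEWithLogitsLoss()
n_disc_steps = 5                                     # the key knob: D steps per G step

for step in range(100):
    for _ in range(n_disc_steps):                    # several (privatized) D updates ...
        real = torch.randn(64, 1) * 0.5 + 2.0
        fake = G(torch.randn(64, 4)).detach()
        loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    fake = G(torch.randn(64, 4))                     # ... then one (non-private) G update
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()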

URL: https://openreview.net/forum?id=9sVCIngrhP

---

Title: An Adaptive Half-Space Projection Method for Stochastic Optimization Problems with Group Sparse Regularization

Abstract: Optimization problems with group sparse regularization are ubiquitous in various popular downstream applications, such as feature selection and compression for Deep Neural Networks (DNNs). Nonetheless, the existing methods in the literature do not perform particularly well when such regularization is used in combination with a stochastic loss function. In particular, it is challenging to design an algorithm that is computationally efficient, has a convergence guarantee, and is able to compute group-sparse solutions. Recently, a half-space stochastic projected gradient ({\tt HSPG}) method was proposed that partly addressed these challenges. In this paper, we present a substantially enhanced version of {\tt HSPG}, which we call {\tt AdaHSPG+}, that makes two noticeable advances. First, {\tt AdaHSPG+} is shown to have a stronger convergence result under significantly looser assumptions than those required by {\tt HSPG}. This improvement in convergence is achieved by integrating variance reduction techniques with a new adaptive strategy for iteratively predicting the support of a solution. Second, {\tt AdaHSPG+} requires significantly less parameter tuning compared to {\tt HSPG}, thus making it more practical and user-friendly. This advance is achieved by designing automatic and adaptive strategies for choosing the type of step employed in each iteration and for updating key hyperparameters. The numerical effectiveness of our proposed {\tt AdaHSPG+} algorithm is demonstrated on both convex and non-convex benchmark problems.

URL: https://openreview.net/forum?id=KBhSyBBeeO

---

Title: Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and Stability

Abstract: Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world. Designing RL algorithms that optimize these objectives can be a costly and painstaking process. This paper presents MetaPG, an evolutionary method for the automated design of actor-critic loss functions. MetaPG explicitly optimizes for generalizability and performance, and implicitly optimizes the stability of both metrics. We initialize our loss function population with Soft Actor-Critic (SAC) and perform multi-objective optimization using fitness metrics encoding single-task performance, zero-shot generalizability to unseen environment configurations, and stability across independent runs with different random seeds. On a set of continuous control tasks from the Real-World RL Benchmark Suite, we find that our method, using a single environment during evolution, evolves algorithms that improve upon SAC's performance and generalizability by 4% and 20%, respectively, and reduce instability by up to 67%. Then, we scale up to more complex environments from the Brax physics simulator and replicate generalizability tests encountered in practical settings, such as different friction coefficients. MetaPG evolves algorithms that obtain 10% better generalizability without loss of performance within the same meta-training environment and obtain similar results to SAC when doing cross-domain evaluations in other Brax environments. The evolution results are interpretable; by analyzing the structure of the best algorithms we identify elements that help optimize certain objectives, such as regularization terms for the critic loss.

URL: https://openreview.net/forum?id=zTklDjybrK

---
