Weekly TMLR digest for Oct 16, 2022


TMLR

Oct 15, 2022, 8:00:09 PM
to tmlr-annou...@googlegroups.com

Accepted papers
===============


Title: Simplifying Node Classification on Heterophilous Graphs with Compatible Label Propagation

Authors: Zhiqiang Zhong, Sergei Ivanov, Jun Pang

Abstract: Graph Neural Networks (GNNs) have been predominant for graph learning tasks; however, recent studies showed that a well-known graph algorithm, Label Propagation (LP), combined with a shallow neural network can achieve performance comparable to GNNs in semi-supervised node classification on graphs with high homophily. In this paper, we show that this approach falls short on graphs with low homophily, where nodes often connect to nodes of the opposite classes. To overcome this, we carefully design a combination of a base predictor with the LP algorithm that enjoys a closed-form solution as well as convergence guarantees. Our algorithm first learns the class compatibility matrix and then aggregates label predictions using the LP algorithm weighted by class compatibilities. On a wide variety of benchmarks, we show that our approach achieves leading performance on graphs with various levels of homophily. Meanwhile, it has orders of magnitude fewer parameters and requires less execution time.
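
As a rough illustration of the idea (a minimal sketch, not the authors' implementation; the adjacency matrix A, base predictions, compatibility matrix H, and the weight alpha are all assumed inputs), compatibility-weighted label propagation can be written as:

    import numpy as np

    def compatibility_label_propagation(A, base_preds, H, alpha=0.5, n_iters=50):
        """Propagate soft labels weighted by a class-compatibility matrix.

        A          : (n, n) adjacency matrix
        base_preds : (n, c) soft predictions from a base predictor
        H          : (c, c) learned class-compatibility matrix (rows sum to 1)
        alpha      : propagation strength
        """
        deg = A.sum(axis=1)
        d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
        S = d_inv_sqrt @ A @ d_inv_sqrt          # symmetric normalisation
        Y = base_preds.copy()
        for _ in range(n_iters):
            # Neighbours vote through H instead of assuming homophily
            # (which would correspond to H being the identity matrix).
            Y = (1 - alpha) * base_preds + alpha * (S @ Y @ H)
            Y = Y / Y.sum(axis=1, keepdims=True)  # renormalise rows
        return Y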

URL: https://openreview.net/forum?id=JBuCfkmKYu

---

Title: Centroids Matching: an efficient Continual Learning approach operating in the embedding space

Authors: Jary Pomponi, Simone Scardapane

Abstract: Catastrophic forgetting (CF) occurs when a neural network loses information previously learned while training on a set of samples from a different distribution, i.e., a new task. Existing approaches have achieved remarkable results in mitigating CF, especially in a scenario called task incremental learning. However, this scenario is not realistic, and limited work has been done to achieve good results in more realistic scenarios. In this paper, we propose a novel regularization method called Centroids Matching that, inspired by meta-learning approaches, fights CF by operating in the feature space produced by the neural network, achieving good results while requiring a small memory footprint. Specifically, the approach classifies samples directly using the feature vectors produced by the neural network, by matching those vectors with the centroids representing the classes from the current task, or from all tasks up to that point. Centroids Matching is faster than competing baselines and can be exploited to efficiently mitigate CF by preserving the distances between the embedding space produced by the model when past tasks were completed and the one currently produced. This leads to a method that achieves high accuracy on all tasks, without using an external memory in easy scenarios, or using only a small one in more realistic ones. Extensive experiments demonstrate that Centroids Matching achieves accuracy gains on multiple datasets and scenarios.
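
As a rough sketch of the centroid-based classification step described above (illustrative only, not the paper's code; the backbone network and the distance-based logits are assumptions), embeddings can be classified by their nearest class centroid:

    import torch

    def class_centroids(embeddings, labels, num_classes):
        """Mean embedding per class; embeddings: (n, d), labels: (n,) int tensor."""
        return torch.stack([embeddings[labels == c].mean(dim=0)
                            for c in range(num_classes)])

    def centroid_logits(embeddings, centroids):
        """Negative squared distance to each centroid acts as a class logit."""
        return -torch.cdist(embeddings, centroids) ** 2

    # Usage sketch: feats = backbone(x)
    #               loss = torch.nn.functional.cross_entropy(
    #                   centroid_logits(feats, centroids), targets)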

URL: https://openreview.net/forum?id=7gzQltQSwr

---

Title: Nonstationary Reinforcement Learning with Linear Function Approximation

Authors: Huozhi Zhou, Jinglin Chen, Lav R. Varshney, Ashish Jagmohan

Abstract: We consider reinforcement learning (RL) in episodic Markov decision processes (MDPs) with linear function approximation in a drifting environment. Specifically, both the reward and state transition functions can evolve over time, but their total variations do not exceed a \textit{variation budget}. We first develop the $\texttt{LSVI-UCB-Restart}$ algorithm, an optimistic modification of least-squares value iteration with periodic restart, and bound its dynamic regret when variation budgets are known. Then we propose a parameter-free algorithm \texttt{Ada-LSVI-UCB-Restart} that extends to unknown variation budgets. We also derive the first minimax dynamic regret lower bound for nonstationary linear MDPs and, as a byproduct, establish a minimax regret lower bound for linear MDPs that was left open by Jin et al. (2020). Finally, we provide numerical experiments to demonstrate the effectiveness of our proposed algorithms.

URL: https://openreview.net/forum?id=nS8A9nOrqp

---

Title: Enhanced gradient-based MCMC in discrete spaces

Authors: Benjamin Rhodes, Michael U. Gutmann

Abstract: The recent introduction of gradient-based Markov chain Monte Carlo (MCMC) for discrete spaces holds great promise, and comes with the tantalising possibility of new discrete counterparts to celebrated continuous methods such as the Metropolis-adjusted Langevin algorithm (MALA). Towards this goal, we introduce several discrete Metropolis-Hastings samplers that are conceptually inspired by MALA, and demonstrate their strong empirical performance across a range of challenging sampling problems in Bayesian inference and energy-based modelling. Methodologically, we identify why discrete analogues to \emph{preconditioned} MALA are generally intractable, motivating us to introduce a new kind of preconditioning based on auxiliary variables and the `Gaussian integral trick'.
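
For context, a minimal sketch of the kind of gradient-informed single-flip Metropolis-Hastings step this line of work starts from (in the spirit of locally-balanced, Gibbs-with-gradients style proposals; not the preconditioned samplers introduced in the paper). It assumes log_prob is a differentiable function of a float-valued binary vector, e.g. an energy-based model:

    import torch

    def gradient_informed_flip_step(x, log_prob):
        """One single-flip MH step on a binary vector x in {0, 1}^d."""
        x = x.detach().requires_grad_(True)
        logp_x = log_prob(x)
        grad = torch.autograd.grad(logp_x, x)[0]
        # First-order estimate of log p(flip_i(x)) - log p(x) for each coordinate i.
        delta = (1 - 2 * x.detach()) * grad
        q_fwd = torch.softmax(delta / 2, dim=-1)
        i = torch.multinomial(q_fwd, 1).item()

        x_new = x.detach().clone()
        x_new[i] = 1 - x_new[i]
        x_new.requires_grad_(True)
        logp_new = log_prob(x_new)
        grad_new = torch.autograd.grad(logp_new, x_new)[0]
        q_rev = torch.softmax((1 - 2 * x_new.detach()) * grad_new / 2, dim=-1)

        log_accept = logp_new - logp_x + q_rev[i].log() - q_fwd[i].log()
        if torch.rand(()) < log_accept.exp():
            return x_new.detach()
        return x.detach()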

URL: https://openreview.net/forum?id=j2Mid5hFUJ

---

Title: Flipped Classroom: Effective Teaching for Time Series Forecasting

Authors: Philipp Teutsch, Patrick Mäder

Abstract: Sequence-to-sequence models based on LSTM and GRU are among the most popular choices for forecasting time series data, reaching state-of-the-art performance. Training such models can be delicate, though. The two most common training strategies in this context are teacher forcing (TF) and free running (FR). TF can be used to help the model converge faster but may provoke an exposure bias issue due to a discrepancy between the training and inference phases. FR helps to avoid this but does not necessarily lead to better results, since it tends to make training slow and unstable instead. Scheduled sampling was the first approach to tackle these issues by picking the best from both worlds and combining it into a curriculum learning (CL) strategy. Although scheduled sampling seems to be a convincing alternative to FR and TF, we found that, even if parametrized carefully, scheduled sampling may lead to premature termination of training when applied to time series forecasting. To mitigate the problems of the above approaches, we formalize CL strategies along the training scale as well as the training iteration scale. We propose several new curricula and systematically evaluate their performance in two experimental sets. For our experiments, we utilize six datasets generated from prominent chaotic systems. We found that the newly proposed increasing training scale curricula with a probabilistic iteration scale curriculum consistently outperform previous training strategies, yielding an NRMSE improvement of up to 81% over FR or TF training. For some datasets we additionally observe a reduced number of training iterations. We observed that all models trained with the new curricula yield higher prediction stability, allowing for longer prediction horizons.
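
A minimal sketch of the two scales mentioned above (the linear schedule and the per-step sampling are illustrative choices, not the paper's specific curricula): a training-scale curriculum anneals the teacher-forcing probability across epochs, and a probabilistic iteration-scale curriculum draws the decoder input at every step:

    import random

    def teacher_forcing_prob(epoch, num_epochs, p_start=1.0, p_end=0.0):
        """Training-scale curriculum: anneal the TF probability over epochs."""
        frac = epoch / max(num_epochs - 1, 1)
        return p_start + frac * (p_end - p_start)

    def decoder_step_input(ground_truth_t, model_prediction_t, p_tf):
        """Iteration-scale curriculum: per decoding step, feed either the ground
        truth (teacher forcing) or the model's last prediction (free running)."""
        return ground_truth_t if random.random() < p_tf else model_prediction_t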

URL: https://openreview.net/forum?id=w3x20YEcQK

---

Title: Deep Policies for Online Bipartite Matching: A Reinforcement Learning Approach

Authors: Mohammad Ali Alomrani, Reza Moravej, Elias Boutros Khalil

Abstract: The challenge in the widely applicable online matching problem lies in making irrevocable assignments while there is uncertainty about future inputs. Most theoretically-grounded policies are myopic or greedy in nature. In real-world applications where the matching process is repeated on a regular basis, the underlying data distribution can be leveraged for better decision-making. We present an end-to-end Reinforcement Learning framework for deriving better matching policies based on trial-and-error on historical data. We devise a set of neural network architectures, design feature representations, and empirically evaluate them across two online matching problems: Edge-Weighted Online Bipartite Matching and Online Submodular Bipartite Matching. We show that most of the learning approaches perform consistently better than classical baseline algorithms on four synthetic and real-world datasets. On average, our proposed models improve the matching quality by 3-10% on a variety of synthetic and real-world datasets.


URL: https://openreview.net/forum?id=mbwm7NdkpO

---

Title: Generative Adversarial Neural Operators

Authors: Md Ashiqur Rahman, Manuel A Florez, Anima Anandkumar, Zachary E Ross, Kamyar Azizzadenesheli

Abstract: We propose the generative adversarial neural operator (GANO), a generative model paradigm for learning probabilities on infinite-dimensional function spaces. The natural sciences and engineering are known to have many types of data that are sampled from infinite-dimensional function spaces, where classical finite-dimensional deep generative adversarial networks (GANs) may not be directly applicable. GANO generalizes the GAN framework and allows for the sampling of functions by learning push-forward operator maps in infinite-dimensional spaces. GANO consists of two main components, a generator neural operator and a discriminator neural functional. The inputs to the generator are samples of functions from a user-specified probability measure, e.g., Gaussian random field (GRF), and the generator outputs are synthetic data functions. The input to the discriminator is either a real or synthetic data function. In this work, we instantiate GANO using the Wasserstein criterion and show how the Wasserstein loss can be computed in infinite-dimensional spaces. We empirically study GANO in controlled cases where both input and output functions are samples from GRFs and compare its performance to the finite-dimensional counterpart GAN. We empirically study the efficacy of GANO on real-world function data of volcanic activities and show its superior performance over GAN.

URL: https://openreview.net/forum?id=X1VzbBU6xZ

---

Title: Lookback for Learning to Branch

Authors: Prateek Gupta, Elias Boutros Khalil, Didier Chételat, Maxime Gasse, Andrea Lodi, Yoshua Bengio, M. Pawan Kumar

Abstract: The expressive and computationally inexpensive bipartite Graph Neural Networks (GNNs) have been shown to be an important component of deep learning based Mixed-Integer Linear Program (MILP) solvers. Recent works have demonstrated the effectiveness of such GNNs in replacing the branching (variable selection) heuristic in branch-and-bound (B&B) solvers. These GNNs are trained, offline and on a collection of MILPs, to imitate a very good but computationally expensive branching heuristic, strong branching. Given that B&B results in a tree of sub-MILPs, we ask (a) whether there are strong dependencies exhibited by the target heuristic among the neighboring nodes of the B&B tree, and (b) if so, whether we can incorporate them in our training procedure. Specifically, we find that with the strong branching heuristic, a child node's best choice was often the parent's second-best choice. We call this the "lookback" phenomenon. Surprisingly, the typical branching GNN of Gasse et al. (2019) often misses this simple "answer". To imitate the target behavior more closely by incorporating the lookback phenomenon in GNNs, we propose two methods: (a) target smoothing for the standard cross-entropy loss function, and (b) adding a Parent-as-Target (PAT) Lookback regularizer term. Finally, we propose a model selection framework to incorporate harder-to-formulate objectives such as solving time in the final models. Through extensive experimentation on standard benchmark instances, we show that our proposal results in up to a 22% decrease in the size of the B&B tree and up to a 15% improvement in solving times.
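
A minimal sketch of the target-smoothing idea (the mixing weight eps and tensor shapes are illustrative; the paper's exact loss and the PAT regularizer differ): instead of a one-hot strong-branching target, part of the probability mass is placed on the parent node's best variable:

    import torch
    import torch.nn.functional as F

    def lookback_smoothed_loss(logits, sb_target, parent_best, eps=0.1):
        """Cross-entropy against a soft target mixing the strong-branching choice
        (sb_target) with the parent node's best variable (parent_best), both
        given as integer index tensors of shape (batch,)."""
        n_vars = logits.shape[-1]
        target = (1.0 - eps) * F.one_hot(sb_target, n_vars).float() \
                 + eps * F.one_hot(parent_best, n_vars).float()
        return -(target * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()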

URL: https://openreview.net/forum?id=EQpGkw5rvL

---

Title: From Optimization Dynamics to Generalization Bounds via Łojasiewicz Gradient Inequality

Authors: Fusheng Liu, Haizhao Yang, Soufiane Hayou, Qianxiao Li

Abstract: Optimization and generalization are two essential aspects of statistical machine learning. In this paper, we propose a framework to connect optimization with generalization by analyzing the generalization error based on the optimization trajectory under the gradient flow algorithm. The key ingredient of this framework is the Uniform-LGI, a property that is generally satisfied when training machine learning models. Leveraging the Uniform-LGI, we first derive convergence rates for the gradient flow algorithm, and then we give generalization bounds for a large class of machine learning models. We further apply our framework to three distinct machine learning models: linear regression, kernel regression, and two-layer neural networks. Through our approach, we obtain generalization estimates that match or extend previous results.
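
For reference, a standard Łojasiewicz-type gradient inequality of the kind the Uniform-LGI refers to reads, in LaTeX (the exact constants, exponent range, and uniformity conditions used in the paper may differ):

    \|\nabla L(\theta)\| \;\ge\; c \,\bigl|L(\theta) - L^{*}\bigr|^{\mu},
    \qquad c > 0,\ \mu \in [1/2, 1),

for all \theta in a region containing the optimization trajectory, where L^{*} is the limiting loss value; the Polyak-Łojasiewicz inequality is the special case \mu = 1/2.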

URL: https://openreview.net/forum?id=mW6nD3567x

---

Title: Unimodal Likelihood Models for Ordinal Data

Authors: Ryoya Yamasaki

Abstract: Ordinal regression (OR) is the classification of ordinal data, in which the underlying target variable is categorical and considered to have a natural ordinal relation for the explanatory variables. In this study, we suppose the unimodality of the conditional probability distribution of the target variable given a value of the explanatory variables as a natural ordinal relation of the ordinal data. Under this supposition, unimodal likelihood models are considered promising for achieving good generalization performance in OR tasks. Demonstrating that previous unimodal likelihood models have weak representation ability, we develop more representable unimodal likelihood models, including the most representable one. OR experiments in this study showed that the developed, more representable unimodal likelihood models can yield better generalization performance for real-world ordinal data compared with previous unimodal likelihood models and with popular statistical OR models having no unimodality guarantee.
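
For context, one classical way to guarantee a unimodal likelihood over K ordered classes is a binomial parameterization (illustrative only; the paper's models are different and more flexible). A minimal sketch, where p is a tensor of predicted success probabilities:

    import torch
    from torch.distributions import Binomial

    def binomial_unimodal_probs(p, num_classes):
        """Unimodal class probabilities over {0, ..., K-1} from success
        probabilities p in (0, 1); the binomial pmf is always unimodal."""
        k = torch.arange(num_classes, dtype=p.dtype)
        dist = Binomial(total_count=num_classes - 1, probs=p.unsqueeze(-1))
        return dist.log_prob(k).exp()   # shape (..., K), rows sum to 1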

URL: https://openreview.net/forum?id=1l0sClLiPc

---


New submissions
===============


Title: Anomaly Detection using Contrastive Normalizing Flows

Abstract: Detecting test data deviating from training data is a central problem for safe and robust machine learning. Likelihoods learned by a generative model, e.g., a normalizing flow via standard log-likelihood training, perform poorly as anomaly scores. We propose to use an unlabelled auxiliary dataset and a probabilistic outlier score for anomaly detection. We use a self-supervised feature extractor trained on the auxiliary dataset and train a normalizing flow on the extracted features by maximizing the likelihood on in-distribution data and minimizing the likelihood on the auxiliary dataset. We show that this is equivalent to learning the normalized positive difference between the in-distribution and the auxiliary feature density. We conduct experiments on benchmark datasets and show a robust improvement compared to likelihood, likelihood-ratio, and state-of-the-art anomaly detection methods.
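
A sketch of the contrastive training objective described above (the flow.log_prob interface is an assumption, not the paper's code): maximize the flow's likelihood on in-distribution features while minimizing it on features from the auxiliary dataset:

    import torch

    def contrastive_flow_loss(flow, in_dist_feats, aux_feats):
        """flow.log_prob(x) is assumed to return per-sample log-densities of a
        normalizing flow fit on features from a frozen self-supervised encoder."""
        nll_in = -flow.log_prob(in_dist_feats).mean()   # maximize in-distribution likelihood
        ll_aux = flow.log_prob(aux_feats).mean()        # minimize auxiliary likelihood
        return nll_in + ll_aux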

URL: https://openreview.net/forum?id=IHjNxOm6zK

---

Title: Local Function Complexity for Active Learning via Mixture of Gaussian Processes

Abstract: Inhomogeneities in real-world data, e.g., due to changes in the observation noise level or variations in the structural complexity of the source function, pose a unique set of challenges for statistical inference. Accounting for them can greatly improve predictive power when physical resources or computation time is limited. In this paper, we draw on recent theoretical results on the estimation of local function complexity (LFC), derived from the domain of local polynomial smoothing (LPS), to establish a notion of local structural complexity, which is used to develop a model-agnostic active learning framework. Due to its reliance on pointwise estimates, the LPS model class is neither robust nor scalable with respect to the large input-space dimensions that typically come with real-world problems. Here, we propose a GPR-based estimate of LFC that is able to manage the curse of dimensionality. To this end, we train a mixture of experts (MoE) model in which the experts are GPR models at different bandwidths. We then estimate locally optimal kernel bandwidths, the key ingredient in the calculation of LFC, as the weighted average of these bandwidth candidates, where the weights are taken from the learned gate of the MoE model. We assess the effectiveness of our LFC estimate in an active learning application on a prototypical low-dimensional synthetic dataset, before taking on the challenging real-world task of reconstructing a quantum chemical force field for a small organic molecule and demonstrating state-of-the-art performance at a lower rate of sampling.

URL: https://openreview.net/forum?id=NxnBXFzez3

---

Title: UncertaINR: Uncertainty Quantification of End-to-End Implicit Neural Representations for Computed Tomography

Abstract: Implicit neural representations (INRs) have achieved impressive results for scene reconstruction and computer graphics, where their performance has primarily been assessed on reconstruction accuracy. As INRs make their way into other domains, where model predictions inform high-stakes decision-making, uncertainty quantification of INR inference is becoming critical. To that end, we study a Bayesian reformulation of INRs, UncertaINR, in the context of computed tomography, and evaluate several Bayesian deep learning implementations in terms of accuracy and calibration. We find that they achieve well-calibrated uncertainty, while retaining accuracy competitive with other classical, INR-based, and CNN-based reconstruction techniques. In contrast to the best-performing prior approaches, UncertaINR does not require a large training dataset, but only a handful of validation images.

URL: https://openreview.net/forum?id=jdGMBgYvfX

---

Title: Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping

Abstract: In machine learning, an agent needs to estimate uncertainty to efficiently explore and adapt and to make effective decisions. A common approach to uncertainty estimation maintains an ensemble of models. In recent years, several approaches have been proposed for training ensembles, and conflicting views prevail with regard to the importance of various ingredients of these approaches. In this paper, we aim to address the benefits of two ingredients -- prior functions and bootstrapping -- which have come into question. We show that prior functions can significantly improve an ensemble agent's joint predictions across inputs and that bootstrapping affords additional benefits if the signal-to-noise ratio varies across inputs. Our claims are justified by both theoretical and experimental results.
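
A minimal sketch of the two ingredients under study, additive randomized prior functions and bootstrapped training sets, for a single ensemble member (the network constructor, the prior scale beta, and the bootstrap scheme are illustrative assumptions):

    import numpy as np
    import torch

    class PriorNet(torch.nn.Module):
        """Trainable network plus a fixed, randomly initialised prior network."""
        def __init__(self, make_net, beta=1.0):
            super().__init__()
            self.trainable = make_net()
            self.prior = make_net()
            for p in self.prior.parameters():   # the prior is never trained
                p.requires_grad_(False)
            self.beta = beta

        def forward(self, x):
            return self.trainable(x) + self.beta * self.prior(x)

    def bootstrap_indices(n, rng):
        """Sample-with-replacement indices defining one member's training set."""
        return rng.integers(0, n, size=n)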


URL: https://openreview.net/forum?id=IqJsyulDUX

---

Title: Word Embeddings as Statistical Estimators

Abstract: Word embeddings are a fundamental tool in natural language processing. Currently, word embedding methods are evaluated on the basis of empirical performance on benchmark data sets, and there is a lack of rigorous understanding of their theoretical properties. This paper studies word embeddings from the theoretical perspective of statistical inference, which is essential for formal inference and uncertainty quantification. We propose a copula-based statistical model for text data and show that under this model, the now-classical Word2Vec method can be interpreted as a statistical estimation method for estimating the theoretical pointwise mutual information (PMI). Next, by building on the work of Levy & Goldberg (2014), we develop a missing value-based estimator as a statistically tractable and interpretable alternative to the Word2Vec approach. The estimation error of this estimator is comparable to Word2Vec and improves upon the truncation-based method proposed by Levy & Goldberg (2014). The proposed estimator also compares favorably with Word2Vec in a benchmark sentiment analysis task on the IMDb Movie Reviews data set.
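
For context, the pointwise mutual information that Word2Vec is interpreted as estimating can be computed directly from a word-context co-occurrence count matrix; a standard computation, not the paper's missing value-based estimator:

    import numpy as np

    def pmi_matrix(cooc, eps=1e-12):
        """PMI(w, c) = log[ P(w, c) / (P(w) P(c)) ] from a co-occurrence count matrix."""
        p_wc = cooc / cooc.sum()
        p_w = p_wc.sum(axis=1, keepdims=True)
        p_c = p_wc.sum(axis=0, keepdims=True)
        return np.log((p_wc + eps) / (p_w @ p_c + eps))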

URL: https://openreview.net/forum?id=w8s4rtzMVQ

---

Title: Indiscriminate Data Poisoning Attacks on Neural Networks

Abstract: Data poisoning attacks, in which a malicious adversary aims to influence a model by injecting "poisoned" data into the training process, have attracted significant recent attention. In this work, we take a closer look at existing poisoning attacks and connect them with old and new algorithms for solving sequential Stackelberg games. By choosing an appropriate loss function for the attacker and optimizing with algorithms that exploit second-order information, we design poisoning attacks that are effective on neural networks. We present efficient implementations by parameterizing the attacker and allowing simultaneous and coordinated generation of tens of thousands of poisoned points, in contrast to existing methods that generate poisoned points one by one. We further perform extensive experiments that empirically explore the effect of data poisoning attacks on deep neural networks. Our paper sets a new benchmark on the possibility of performing indiscriminate data poisoning attacks on modern neural networks.

URL: https://openreview.net/forum?id=x4hmIsWu7e

---

Title: Bridging performance gap between minimal and maximal SVM models

Abstract: Support vector machine (SVM) models are typically built from binary SVMs for all possible pairs of classes in a one-against-one fashion. This requires too much computation for datasets with hundreds or thousands of classes, which motivates the search for multi-class models with a smaller number of edges in the underlying model graph. We conduct experiments to uncover metrical and topological properties that impact the accuracy of a multi-class SVM model. Based on these results we propose two ways to build smaller multi-class SVM models. The key insight is that for model graphs of diameter two, we can estimate missing pairwise probabilities from known ones, thus transforming the computation of posteriors to the usual complete (maximal) case. The first approach is incremental, starting from a star graph. The second approach uses complete bipartite graphs. These approaches allow one to reduce computational effort by 50-90% while keeping accuracy near, or even above, that of a complete model. Finally, we study how the choice of the coupling method impacts classification errors. We find that the currently used method of Wu-Lin-Weng relies mostly on the decision of the so-called critical SVM, similarly to a newly proposed stratified coupling method. This provides a partial explanation for its success in practice and suggests that it is a suitable method for element-wise ensemble improvement. All experiments are done using convolutional data sets, which have multiple advantages for benchmarking methodology to build multiclass SVM models.

URL: https://openreview.net/forum?id=SM1BkjGePI

---

Title: Beyond Information Gain: An Empirical Benchmark for Low-Switching-Cost Reinforcement Learning

Abstract: A ubiquitous requirement in many practical reinforcement learning (RL) applications is that the deployed policy that actually interacts with the environment cannot change frequently. Such an RL setting is called low-switching-cost RL, i.e., achieving the highest reward while reducing the number of policy switches during training. It has been a recent trend in theoretical RL research to develop provably efficient RL algorithms with low switching cost. The core idea in these theoretical works is to measure the information gain and switch the policy when the information gain is doubled. Despite the theoretical advances, none of the existing approaches has been validated empirically. We conduct the first empirical evaluation of different policy switching criteria on popular RL testbeds, including a medical treatment environment, the Atari games, and robotic control tasks. Surprisingly, although information-gain-based methods do recover the optimal rewards, they often lead to a substantially higher switching cost. By contrast, we find that a feature-based criterion, which has been largely ignored in the theoretical research, consistently produces the best performance across all domains. We hope our benchmark can bring insights to the community and inspire future research. Our code and complete results can be found at https://sites.google.com/view/low-switching-cost-rl
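
A sketch of an information-gain switching rule of the kind described above, for linear features (purely illustrative, not the benchmark's code): deploy a new policy whenever the determinant of the feature Gram matrix has doubled since the last switch:

    import numpy as np

    class DetDoublingSwitcher:
        """Signal a policy switch when det(Lambda) doubles, where
        Lambda = lam * I + sum_t phi_t phi_t^T is the feature Gram matrix."""
        def __init__(self, dim, lam=1.0):
            self.Lambda = lam * np.eye(dim)
            self.logdet_at_switch = np.linalg.slogdet(self.Lambda)[1]

        def update(self, phi):
            self.Lambda += np.outer(phi, phi)
            logdet = np.linalg.slogdet(self.Lambda)[1]
            if logdet - self.logdet_at_switch >= np.log(2.0):  # determinant doubled
                self.logdet_at_switch = logdet
                return True   # recompute and redeploy the policy
            return False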

URL: https://openreview.net/forum?id=Xq1sTZTQVm

---

Title: Evaluation of Generative Models: An Empirical Study

Abstract: Implicit generative models, which do not return likelihood values, such as generative adversarial networks and diffusion models, have become prevalent in recent years. While these models have shown remarkable results, evaluating their performance is challenging. This issue is of vital importance to push research forward and to distinguish meaningful gains from random noise. Currently, heuristic metrics such as the Inception Score (IS) and Fréchet Inception Distance (FID) are the most common evaluation metrics, but what they measure is not entirely clear. Additionally, there are questions regarding how meaningful their scores actually are. In this work, we propose a novel evaluation protocol for likelihood-based generative models, based on generating a high-quality synthetic dataset on which we can estimate classical metrics for comparison. Our study shows that while FID and IS do correlate with several f-divergences, their ranking of close models can vary considerably, making them problematic when used for fine-grained comparison. We further use this experimental setting to study which evaluation metric best correlates with our probabilistic metrics. Lastly, we also address some of the issues with the FID score by investigating the features used for this metric.
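
For reference, the FID questioned above is the Fréchet (2-Wasserstein) distance between Gaussians fitted to real and generated Inception features; the standard computation, independent of this paper's protocol, is:

    import numpy as np
    from scipy import linalg

    def frechet_distance(feats_real, feats_fake):
        """FID between two feature arrays of shape (n, d)."""
        mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
        sigma1 = np.cov(feats_real, rowvar=False)
        sigma2 = np.cov(feats_fake, rowvar=False)
        covmean = linalg.sqrtm(sigma1 @ sigma2)
        if np.iscomplexobj(covmean):      # numerical noise can yield tiny
            covmean = covmean.real        # imaginary parts; drop them
        diff = mu1 - mu2
        return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))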

URL: https://openreview.net/forum?id=feGwuAUPKy

---

Title: MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization

Abstract: This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments without a communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called \textit{Multi-Agent Cooperative Recurrent Proximal Policy Optimization} (MACRPO). We propose two novel ways of integrating information across agents and time in MACRPO. First, we use a recurrent layer in the critic's network architecture and propose a new framework that uses a meta-trajectory to train the recurrent layer. This allows the network to learn the cooperation and dynamics of interactions between agents, and also handle partial observability. Second, we propose a new advantage function that incorporates other agents' rewards and value functions by controlling the level of cooperation between agents through a parameter. This control parameter is suitable for environments in which the agents are unable to fully cooperate with each other. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces: Deepdrive-Zero, Multi-Walker, and the Particle environment. We compare the results with several ablations and state-of-the-art multi-agent algorithms such as MAGIC, IC3Net, CommNet, GA-Comm, QMIX, and MADDPG, as well as single-agent methods with shared parameters between agents such as IMPALA and APEX. The results show superior performance against the other algorithms. The code is available online at \url{https://github.com/kargarisaac/macrpo}.
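
A sketch of the reward-mixing idea behind the proposed advantage function (illustrative only; the paper's advantage additionally involves value functions, and the parameter name is an assumption): each agent's learning signal blends its own reward with the other agents' rewards through a cooperation parameter:

    import numpy as np

    def mixed_rewards(rewards, coop):
        """rewards: array of shape (T, n_agents); coop in [0, 1] controls how much
        each agent's signal incorporates the other agents' average reward."""
        n_agents = rewards.shape[1]
        others_mean = (rewards.sum(axis=1, keepdims=True) - rewards) / (n_agents - 1)
        return (1.0 - coop) * rewards + coop * others_mean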

URL: https://openreview.net/forum?id=FT4PSqr2Gi

---

Title: Unsupervised Mismatch Localization in Cross-Modal Sequential Data

Abstract: Content mismatch usually occurs when data from one modality is translated to another, e.g. language learners producing mispronunciations (errors in speech) when reading a sentence (target text) aloud. However, most existing alignment algorithms assume that the content involved in the two modalities is perfectly matched, thus leading to difficulty in locating such mismatch between speech and text. In this work, we develop an unsupervised learning algorithm that can infer the relationship between content-mismatched cross-modal sequential data, especially for speech-text sequences. More specifically, we propose a hierarchical Bayesian deep learning model, dubbed mismatch localization variational autoencoder (ML-VAE), which decomposes the generative process of the speech into hierarchically structured latent variables, indicating the relationship between the two modalities. Training such a model is very challenging due to the discrete latent variables with complex dependencies involved. To address this challenge, we propose a novel and effective training procedure that alternates between estimating the hard assignments of the discrete latent variables over a specifically designed mismatch localization finite-state acceptor (ML-FSA) and updating the parameters of neural networks. Our experimental results show that ML-VAE successfully locates the mismatch between text and speech, without the need for human annotations for model training.

URL: https://openreview.net/forum?id=29V0xo7jKp

---

Title: Improved Differentially Private Riemannian Optimization

Abstract: A common step in differentially private ({DP}) Riemannian optimization is sampling from the (tangent) Gaussian distribution as noise needs to be generated in the tangent space to perturb the gradient before taking the step. In this regard, existing works either use the Markov chain Monte Carlo ({MCMC}) sampling or explicit basis construction based sampling methods on the tangent space. This becomes a computational bottleneck in the practical use of {DP} Riemannian optimization, especially when performing stochastic optimization. In this paper, we discuss different sampling strategies and develop efficient sampling procedures by exploiting linear isometry between tangent spaces and show them to be orders of magnitude faster than standard sampling strategies like {MCMC}. We also improve utility bounds by showing them to be metric-tensor independent. Furthermore, we develop the {DP} Riemannian stochastic variance reduced gradient algorithm and compare it with DP Riemannian gradient descent and stochastic gradient descent algorithms on various problems.

URL: https://openreview.net/forum?id=paguBNtqiO

---

Title: Spectral Regularization Allows Data-frugal Learning over Combinatorial Spaces

Abstract: Data-driven machine learning models are being increasingly employed in several important inference problems in biology, chemistry, and physics which require learning over combinatorial spaces. Recent empirical evidence (see, e.g., Tseng et al. (2020); Aghazadeh et al. (2021); Ha et al. (2021)) suggests that regularizing the spectral representation of such models improves their generalization power when labeled data is scarce. However, despite these empirical studies, the theoretical underpinning of when and how spectral regularization enables improved generalization is poorly understood. In this paper, we focus on learning pseudo-Boolean functions and demonstrate that regularizing the empirical mean squared error by the L1 norm of the spectral transform of the learned function reshapes the loss landscape and allows for data-frugal learning, under a restricted secant condition on the learner's empirical error measured against the ground truth function. Under a weaker quadratic growth condition, we show that stationary points which also approximately interpolate the training data points achieve statistically optimal generalization performance. Complementing our theory, we empirically demonstrate that running gradient descent on the regularized loss results in a better generalization performance compared to baseline algorithms in several data-scarce real-world problems.
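
A minimal sketch of the regularizer described above for a model of a pseudo-Boolean function on {0, 1}^n (only feasible for small n, since it enumerates all 2^n inputs; the Sylvester-Hadamard construction and the weight lambda_reg are illustrative):

    import itertools
    import torch

    def spectral_l1(model, n):
        """L1 norm of the Walsh-Hadamard (Fourier) spectrum of the learned
        function, evaluated on the full n-dimensional Boolean hypercube."""
        X = torch.tensor(list(itertools.product([0.0, 1.0], repeat=n)))
        f = model(X).squeeze(-1)                    # (2^n,) function values
        H = torch.tensor([[1.0]])
        for _ in range(n):                          # build the 2^n x 2^n Hadamard matrix
            H = torch.cat([torch.cat([H, H], dim=1),
                           torch.cat([H, -H], dim=1)], dim=0)
        spectrum = (H @ f) / (2 ** n)
        return spectrum.abs().sum()

    # Regularized objective (lambda_reg is an illustrative hyperparameter):
    # loss = mse(model(X_train), y_train) + lambda_reg * spectral_l1(model, n)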

URL: https://openreview.net/forum?id=mySiFHCeAl

---

Title: Tight Accounting in the Shuffle Model of Differential Privacy

Abstract: The shuffle model of differential privacy is a novel distributed privacy model based on a combination of local privacy mechanisms and a secure shuffler. It has been shown that the additional randomisation provided by the shuffler improves privacy bounds compared to the purely local mechanisms. Computing tight bounds, however, is complicated by the complexity brought by the shuffler. The recently proposed numerical techniques for evaluating $(\varepsilon,\delta)$-differential privacy guarantees have been shown to give tighter bounds than commonly used methods for compositions of various complex mechanisms. In this paper, we show how to utilise these numerical accountants for adaptive compositions of general $\varepsilon$-LDP shufflers and for shufflers of $k$-randomised response mechanisms, including their subsampled variants. This is enabled by an approximation that speeds up the evaluation of the corresponding privacy loss distribution from $\mathcal{O}(n^2)$ to $\mathcal{O}(n)$, where $n$ is the number of users, without a noticeable change in the resulting $\delta(\varepsilon)$ upper bounds. We also demonstrate the looseness of existing bounds and methods found in the literature, improving previous composition results for shufflers significantly.

URL: https://openreview.net/forum?id=11osftjEbF

---

Title: Accelerated Quality-Diversity through Massive Parallelism

Abstract: Quality-Diversity (QD) optimization algorithms are a well-known approach to generating large collections of diverse and high-quality solutions. However, derived from evolutionary computation, QD algorithms are population-based methods which are known to be data-inefficient and to require large amounts of computational resources. This makes QD algorithms slow when used in applications where solution evaluations are computationally costly. A common approach to speed up QD algorithms is to evaluate solutions in parallel, for instance by using physical simulators in robotics. Yet, this approach is limited to several dozen parallel evaluations, as most physics simulators can only be further parallelized with a greater number of CPUs. With recent advances in simulators that run on accelerators, thousands of evaluations can now be performed in parallel on a single GPU/TPU. In this paper, we present QDax, an accelerated implementation of MAP-Elites which leverages massive parallelism on accelerators to make QD algorithms more accessible. We show that QD algorithms are ideal candidates to take advantage of progress in hardware acceleration. We demonstrate that QD algorithms can scale with massive parallelism to be run at interactive timescales without any significant effect on performance. Results across standard optimization functions and four neuroevolution benchmark environments show that experiment runtimes are reduced by two orders of magnitude, turning days of computation into minutes. More surprisingly, we observe that reducing the number of generations by two orders of magnitude, and thus having significantly shorter lineages, does not impact the performance of QD algorithms. These results show that QD can now benefit from hardware acceleration, which contributed significantly to the bloom of deep learning.
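
A rough sketch of a MAP-Elites loop built around batched evaluation, the pattern that accelerators make cheap (the mutation scale, a one-dimensional descriptor in [0, 1], and the eval_batch interface are illustrative assumptions; this is not QDax itself):

    import numpy as np

    def map_elites(eval_batch, dim, n_cells, n_iters, batch_size, rng):
        """eval_batch(X) -> (fitnesses, descriptors in [0, 1]) for a batch X."""
        archive_x = np.full((n_cells, dim), np.nan)
        archive_f = np.full(n_cells, -np.inf)
        for _ in range(n_iters):
            filled = np.where(np.isfinite(archive_f))[0]
            if len(filled) == 0:
                X = rng.normal(size=(batch_size, dim))                   # random init
            else:
                parents = archive_x[rng.choice(filled, batch_size)]
                X = parents + 0.1 * rng.normal(size=(batch_size, dim))   # mutation
            fit, desc = eval_batch(X)                # the batch is evaluated in parallel
            cells = np.clip((np.ravel(desc) * n_cells).astype(int), 0, n_cells - 1)
            for x, f, c in zip(X, fit, cells):
                if f > archive_f[c]:                 # keep the best solution per cell
                    archive_f[c], archive_x[c] = f, x
        return archive_x, archive_f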

URL: https://openreview.net/forum?id=znNITCJyTI

---
