Daily TMLR digest for Mar 18, 2023

TMLR

Mar 17, 2023, 8:00:10 PM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Quantum Policy Iteration via Amplitude Estimation and Grover Search – Towards Quantum Advantage for Reinforcement Learning

Authors: Simon Wiedemann, Daniel Hein, Steffen Udluft, Christian B. Mendl

Abstract: We present a full implementation and simulation of a novel quantum reinforcement learning method. Our work is a detailed and formal proof of concept for how quantum algorithms can be used to solve reinforcement learning problems and shows that, given access to error-free, efficient quantum realizations of the agent and environment, quantum methods can yield provable improvements over classical Monte-Carlo based methods in terms of sample complexity. Our approach shows in detail how to combine amplitude estimation and Grover search into a policy evaluation and improvement scheme. We first develop quantum policy evaluation (QPE) which is quadratically more efficient compared to an analogous classical Monte Carlo estimation and is based on a quantum mechanical realization of a finite Markov decision process (MDP). Building on QPE, we derive a quantum policy iteration that repeatedly improves an initial policy using Grover search until the optimum is reached. Finally, we present an implementation of our algorithm for a two-armed bandit MDP which we then simulate.
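
A rough classical illustration of the Grover-based improvement step (an editor's sketch with invented arm values, not the authors' implementation): prepare a uniform superposition over the actions of a hypothetical four-armed bandit, let an oracle flip the sign of actions whose value exceeds the incumbent policy's, and apply one diffusion step so that measurement concentrates on an improving action.

    import numpy as np

    # Hypothetical 4-armed bandit; the arm values are assumed already
    # estimated, e.g. by quantum policy evaluation (QPE).
    values = np.array([0.2, 0.3, 0.9, 0.4])
    threshold = 0.5                      # value of the incumbent policy
    n = len(values)

    state = np.ones(n) / np.sqrt(n)      # uniform superposition over actions

    # Oracle: flip the sign of amplitudes of actions beating the threshold.
    oracle = np.diag(np.where(values > threshold, -1.0, 1.0))

    # Diffusion operator: inversion about the uniform state.
    uniform = np.ones(n) / np.sqrt(n)
    diffusion = 2.0 * np.outer(uniform, uniform) - np.eye(n)

    # One Grover iteration (about pi/4 * sqrt(n/m) for m marked actions).
    state = diffusion @ (oracle @ state)

    print(np.round(state ** 2, 3))       # all measurement mass on arm 2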

URL: https://openreview.net/forum?id=HG11PAmwQ6

---


New submissions
===============


Title: Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale

Abstract: In this paper, we address the following problem: given an offline demonstration dataset from an imperfect expert, what is the best way to leverage it to bootstrap online learning performance in MDPs? We first propose an Informed Posterior Sampling-based RL (iPSRL) algorithm that uses both the offline dataset and information about the expert's behavioral policy used to generate it. Its cumulative Bayesian regret goes down to zero exponentially fast in $N$, the offline dataset size, provided the expert is competent enough. Since this algorithm is computationally impractical, we then propose the iRLSVI algorithm, which can be seen as a combination of the RLSVI algorithm for online RL and imitation learning. Our empirical results show that the proposed iRLSVI algorithm achieves a significant reduction in regret compared to two baselines: using no offline data, and using the offline dataset without information about the generative policy. Our algorithm bridges online RL and imitation learning for the first time.
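
As a loose illustration of the informed-prior idea (a Bernoulli-bandit sketch with invented numbers; the paper's iPSRL additionally exploits knowledge of the expert's behavioral policy, which this sketch only uses implicitly through the data): the offline demonstrations warm-start the posterior, and the online phase is standard posterior (Thompson) sampling.

    import numpy as np

    rng = np.random.default_rng(0)
    true_means = np.array([0.4, 0.6])           # hypothetical 2-armed bandit

    # Offline demonstrations from an imperfect expert biased toward arm 1.
    offline_a = rng.choice(2, size=50, p=[0.2, 0.8])
    offline_r = (rng.random(50) < true_means[offline_a]).astype(int)

    # Informed prior: fold the offline data into Beta posterior counts.
    alpha, beta = np.ones(2), np.ones(2)
    np.add.at(alpha, offline_a, offline_r)
    np.add.at(beta, offline_a, 1 - offline_r)

    # Online phase: posterior sampling seeded by the informed prior.
    for t in range(200):
        a = int(np.argmax(rng.beta(alpha, beta)))   # sample, act greedily
        r = int(rng.random() < true_means[a])
        alpha[a] += r
        beta[a] += 1 - r

    print(alpha / (alpha + beta))   # posterior means concentrate on arm 1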

URL: https://openreview.net/forum?id=lanGfX0M6C

---

Title: Distributed SGD in Overparameterized Linear Regression

Abstract: We consider distributed learning using constant-stepsize SGD over several devices, each of which sends a final model update to a central server where the local estimates are aggregated. In the setting of overparameterized linear regression, we prove general upper bounds with matching lower bounds and derive learning rates for specific data-generating distributions. We show that the excess risk is of the order of the variance provided the number of local nodes does not grow too large with the global sample size.

We further compare distributed SGD with distributed ridge regression and provide an upper bound on the excess SGD-risk in terms of the excess RR-risk for a certain range of the sample size.
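
A minimal numpy sketch of the aggregation scheme described above (dimensions, stepsize, and noise level are invented for illustration): each node runs one pass of constant-stepsize SGD on its local overparameterized regression sample, and the server averages the final iterates.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_nodes, n_local = 100, 20, 40        # d > n_local: overparameterized
    w_star = rng.normal(size=d)              # ground-truth regressor

    def local_sgd(step=0.05):
        """One pass of constant-stepsize SGD on one node's local sample."""
        X = rng.normal(size=(n_local, d)) / np.sqrt(d)
        y = X @ w_star + 0.1 * rng.normal(size=n_local)
        w = np.zeros(d)
        for x_i, y_i in zip(X, y):
            w -= step * (x_i @ w - y_i) * x_i    # stochastic gradient step
        return w

    # Each device sends its final iterate; the server averages them.
    w_avg = np.mean([local_sgd() for _ in range(n_nodes)], axis=0)
    print(float(np.linalg.norm(w_avg - w_star)))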

URL: https://openreview.net/forum?id=sfrcfYMOnZ

---

Title: On the Convergence and Calibration of Deep Learning with Differential Privacy

Abstract: Differentially private (DP) training preserves data privacy, usually at the cost of slower convergence (and thus lower accuracy) and more severe mis-calibration than its non-private counterpart. To analyze the convergence of DP training, we formulate a continuous-time analysis through the lens of the neural tangent kernel (NTK), which characterizes the per-sample gradient clipping and the noise addition in DP training, for arbitrary network architectures and loss functions. Interestingly, we show that the noise addition only affects the privacy risk but not the convergence or calibration, whereas the per-sample gradient clipping (under both flat and layerwise clipping styles) only affects the convergence and calibration.

Furthermore, we observe that while DP models trained with a small clipping norm usually achieve the best accuracy, they are poorly calibrated and thus unreliable. In sharp contrast, DP models trained with a large clipping norm enjoy the same privacy guarantee and similar accuracy, but are significantly better calibrated.
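
For concreteness, a minimal numpy sketch of the flat-clipping DP-SGD update analyzed above (the function name and parameter values are illustrative assumptions, not the paper's code):

    import numpy as np

    def dp_sgd_step(w, per_sample_grads, clip_norm=1.0, noise_mult=1.0,
                    lr=0.1, rng=None):
        """Clip each per-sample gradient to clip_norm, average,
        add calibrated Gaussian noise, and take one gradient step."""
        rng = rng if rng is not None else np.random.default_rng()
        clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
                   for g in per_sample_grads]
        g_bar = np.mean(clipped, axis=0)
        noise = rng.normal(scale=noise_mult * clip_norm / len(clipped),
                           size=w.shape)
        return w - lr * (g_bar + noise)

    # Toy usage: 8 per-sample gradients in R^5.
    rng = np.random.default_rng(0)
    w = dp_sgd_step(np.zeros(5), rng.normal(size=(8, 5)), rng=rng)

Note how the clipping norm enters both the clipping factor and the noise scale, which is why varying it trades off convergence and calibration while leaving the privacy guarantee fixed.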

URL: https://openreview.net/forum?id=K0CAGgjYS1

---

Title: On Intriguing Layer-Wise Properties of Robust Overfitting in Adversarial Training

Abstract: Adversarial training has proven to be one of the most effective methods to defend against adversarial attacks. Nevertheless, robust overfitting is a common obstacle in adversarial training of deep networks. There is a common belief that the features learned by different network layers have different properties; however, existing works generally investigate robust overfitting by treating a DNN as a single unit, and hence the impact of different network layers on robust overfitting remains unclear. In this work, we divide a DNN into a series of layers and investigate the effect of different network layers on robust overfitting. We find that different layers exhibit distinct properties with respect to robust overfitting, and in particular, robust overfitting is mostly related to the optimization of the latter parts of the network. Based upon the observed effect, we propose a robust adversarial training (RAT) prototype: in a minibatch, we optimize the front parts of the network as usual and adopt additional measures to regularize the optimization of the latter parts. Based on the prototype, we design two realizations of RAT, and extensive experiments demonstrate that RAT can eliminate robust overfitting and boost adversarial robustness over standard adversarial training.
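
One plausible reading of the prototype in code (a PyTorch sketch; the split point and the choice of extra regularizer are invented here, and the abstract does not spell out the paper's two concrete realizations): the front layers are optimized as usual, while the latter layers receive stronger regularization.

    import torch
    from torch import nn

    # Hypothetical small network; splitting it after the second conv block
    # into "front" and "latter" parts is an assumption for illustration.
    model = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, 10),
    )
    front_params = list(model[:4].parameters())
    latter_params = list(model[4:].parameters())

    # Front parts train as usual; latter parts get heavier weight decay
    # and a smaller learning rate as the additional regularizing measure.
    optimizer = torch.optim.SGD(
        [{"params": front_params},
         {"params": latter_params, "weight_decay": 5e-2, "lr": 1e-2}],
        lr=0.1, momentum=0.9, weight_decay=5e-4,
    )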

URL: https://openreview.net/forum?id=BaoCnmosJz

---
