Daily TMLR digest for Mar 01, 2023


TMLR

Feb 28, 2023, 7:00:15 PM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: ViViT: Curvature Access Through The Generalized Gauss-Newton’s Low-Rank Structure

Authors: Felix Dangel, Lukas Tatzel, Philipp Hennig

Abstract: Curvature in the form of the Hessian or its generalized Gauss-Newton (GGN) approximation is valuable for algorithms that rely on a local model for the loss to train, compress, or explain deep networks. Existing methods based on implicit multiplication via automatic differentiation or Kronecker-factored block diagonal approximations do not consider noise in the mini-batch. We present ViViT, a curvature model that leverages the GGN’s low-rank structure without further approximations. It allows for efficient computation of eigenvalues, eigenvectors, as well as per-sample first- and second-order directional derivatives. The representation is computed in parallel with gradients in one backward pass and offers a fine-grained cost-accuracy trade-off, which allows it to scale. We demonstrate this by conducting performance benchmarks and substantiate ViViT’s usefulness by studying the impact of noise on the GGN’s structural properties during neural network training.

URL: https://openreview.net/forum?id=DzJ7JfPXkE
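
The computational trick the abstract alludes to is that the mini-batch GGN can be written as an outer product G = V V^T of a tall-and-skinny matrix V (one backpropagated vector per sample and output class), so its nonzero spectrum can be read off the small Gram matrix V^T V. A minimal NumPy sketch of that idea with a synthetic V (an editorial illustration only, not the authors' implementation, which builds V during the backward pass):

    import numpy as np

    # Synthetic low-rank "square root" of the GGN: one column per sample and
    # output class, so that G = V @ V.T with P parameters but only N*C columns.
    rng = np.random.default_rng(0)
    P, N, C = 10_000, 32, 10
    V = rng.standard_normal((P, N * C)) / np.sqrt(N)

    # Gram-matrix trick: the nonzero eigenvalues of V V^T (P x P) equal the
    # eigenvalues of V^T V, which is only (N*C) x (N*C) and cheap to decompose.
    gram = V.T @ V
    evals, evecs_small = np.linalg.eigh(gram)

    # Lift the top eigenvector back to parameter space: e = V u / sqrt(lambda).
    lam, u = evals[-1], evecs_small[:, -1]
    e = V @ u / np.sqrt(lam)

    # Per-sample second-order directional derivatives along e (up to scaling)
    # are ||V_n^T e||^2, where V_n holds the columns belonging to sample n.
    per_sample = (V.reshape(P, N, C).transpose(1, 2, 0) @ e) ** 2
    per_sample = per_sample.sum(axis=1)            # shape (N,)
    print(np.isclose(per_sample.sum(), lam))       # they sum back to the eigenvalue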

---

Title: Online Learning for Prediction via Covariance Fitting: Computation, Performance and Robustness

Authors: Muhammad Osama, Dave Zachariah, Peter Stoica, Thomas B. Schön

Abstract: We consider the problem of online prediction using linear smoothers that are functions of a nominal covariance model with unknown parameters. The model parameters are often learned using cross-validation or maximum-likelihood techniques. But when training data arrives in a streaming fashion, such techniques can only be implemented approximately. Even if this limitation could be overcome, there appear to be no clear-cut results on the statistical properties of the resulting predictor.

Here we consider a covariance-fitting method to learn the model parameters, which was initially developed for spectral estimation. We first show that the use of this approach results in a computationally efficient online learning method in which the resulting predictor can be updated sequentially. We then prove that, with high probability, its out-of-sample error approaches the optimal level at a root-$n$ rate, where $n$ is the number of data samples. This is so even if the nominal covariance model is misspecified. Moreover, we show that the resulting predictor enjoys two robustness properties. First, it corresponds to a predictor that minimizes the out-of-sample error with respect to the least favourable distribution within a given Wasserstein distance from the empirical distribution. Second, it is robust against errors in the covariate training data. We illustrate the performance of the proposed method in a numerical experiment.

URL: https://openreview.net/forum?id=nAr9PhyEbQ
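
The "updated sequentially" claim has a familiar concrete shape: a linear smoother whose sufficient statistics are accumulated one sample at a time, so past data never needs to be revisited. The sketch below shows only that generic structure, with a hypothetical polynomial feature map and a plain ridge-type regularizer; the covariance-fitting criterion the paper actually uses to set the model parameters is not reproduced here.

    import numpy as np

    class OnlineLinearSmoother:
        """Generic ridge-type linear smoother updated sequentially.

        Editorial sketch only: it illustrates the per-sample update of
        sufficient statistics, not the paper's covariance-fitting criterion.
        """

        def __init__(self, dim, reg=1.0):
            self.A = reg * np.eye(dim)     # accumulates Phi^T Phi + reg * I
            self.b = np.zeros(dim)         # accumulates Phi^T y

        def update(self, phi, y):
            # O(d^2) per-sample update; no need to revisit past data.
            self.A += np.outer(phi, phi)
            self.b += y * phi

        def predict(self, phi):
            return phi @ np.linalg.solve(self.A, self.b)

    # Streaming usage with a hypothetical feature map phi(x) = [1, x, x**2].
    rng = np.random.default_rng(1)
    model = OnlineLinearSmoother(dim=3, reg=0.1)
    for _ in range(500):
        x = rng.uniform(-1, 1)
        phi = np.array([1.0, x, x**2])
        y = np.sin(2 * x) + 0.1 * rng.standard_normal()
        model.update(phi, y)
    print(model.predict(np.array([1.0, 0.5, 0.25])))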

---

Title: Calibrate and Debias Layer-wise Sampling for Graph Convolutional Networks

Authors: Yifan Chen, Tianning Xu, Dilek Hakkani-Tur, Di Jin, Yun Yang, Ruoqing Zhu

Abstract: Multiple sampling-based methods have been developed for approximating and accelerating node embedding aggregation in graph convolutional networks (GCNs) training. Among them, a layer-wise approach recursively performs importance sampling to select neighbors jointly for existing nodes in each layer. This paper revisits the approach from a matrix approximation perspective, and identifies two issues in the existing layer-wise sampling methods: suboptimal sampling probabilities and estimation biases induced by sampling without replacement. To address these issues, we accordingly propose two remedies: a new principle for constructing sampling probabilities and an efficient debiasing algorithm. The improvements are demonstrated by extensive analyses of estimation variance and experiments on common benchmarks. Code and algorithm implementations are publicly available at https://github.com/ychen-stat-ml/GCN-layer-wise-sampling.

URL: https://openreview.net/forum?id=JyKNuoZGux
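
For readers new to layer-wise sampling, the quantity being approximated is the aggregation A @ H of the previous layer's embeddings. Below is a small sketch of importance sampling of neighbor columns with replacement, where the usual importance weights keep the estimator unbiased. The probabilities follow a generic norm-based rule; the paper's calibrated probabilities and its debiasing of sampling without replacement are the actual contributions and are not reproduced here.

    import numpy as np

    def sampled_aggregation(A, H, k, rng):
        """Approximate A @ H by sampling k columns of A (nodes of the previous
        layer) with replacement, importance-weighted so the estimate is unbiased.

        Editorial sketch: probabilities proportional to ||A[:, j]|| * ||H[j]||
        are one common choice for matrix-product sketching, not the paper's.
        """
        n = A.shape[1]
        p = np.linalg.norm(A, axis=0) * np.linalg.norm(H, axis=1)
        p = p / p.sum()
        idx = rng.choice(n, size=k, replace=True, p=p)
        # Importance weights 1 / (k * p_j) make the estimator unbiased.
        return (A[:, idx] / (k * p[idx])) @ H[idx]

    rng = np.random.default_rng(2)
    A = (rng.random((200, 200)) < 0.05).astype(float)   # toy adjacency matrix
    H = rng.standard_normal((200, 16))                   # node embeddings
    approx = sampled_aggregation(A, H, k=40, rng=rng)
    print(np.linalg.norm(approx - A @ H) / np.linalg.norm(A @ H))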

---

Title: Revisiting adversarial training for the worst-performing class

Authors: Thomas Pethick, Grigorios Chrysos, Volkan Cevher

Abstract: Despite progress in adversarial training (AT), there is a substantial gap between the top-performing and worst-performing classes in many datasets. For example, on CIFAR10, the accuracies for the best and worst classes are 74% and 23%, respectively. We argue that this gap can be reduced by explicitly optimizing for the worst-performing class, resulting in a min-max-max optimization formulation. Our method, called class focused online learning (CFOL), includes high probability convergence guarantees for the worst class loss and can be easily integrated into existing training setups with minimal computational overhead. We demonstrate an improvement to 32% in the worst class accuracy on CIFAR10, and we observe consistent behavior across CIFAR100 and STL10. Our study highlights the importance of moving beyond average accuracy, which is especially relevant in safety-critical applications.

URL: https://openreview.net/forum?id=wkecshlYxI
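
One way to read the min-max-max formulation: the outer maximization over classes can be approached by adaptively concentrating the class-sampling distribution on classes with high adversarial loss, bandit-style. A toy sketch of that reweighting idea with synthetic per-class losses (not the authors' CFOL algorithm or its guarantees):

    import numpy as np

    class WorstClassSampler:
        """Exponential-weights distribution over classes, shifted toward
        classes with high (adversarial) loss.

        Editorial sketch of the 'focus on the worst class' idea only.
        """

        def __init__(self, num_classes, lr=0.1, mix=0.1):
            self.w = np.zeros(num_classes)
            self.lr, self.mix, self.C = lr, mix, num_classes

        def probs(self):
            p = np.exp(self.w - self.w.max())
            p /= p.sum()
            # Mix with uniform so every class keeps being explored.
            return (1 - self.mix) * p + self.mix / self.C

        def sample(self, rng):
            return rng.choice(self.C, p=self.probs())

        def update(self, cls, loss):
            # Importance-weighted update: higher loss -> sample the class more.
            self.w[cls] += self.lr * loss / self.probs()[cls]

    rng = np.random.default_rng(3)
    sampler = WorstClassSampler(num_classes=10)
    true_difficulty = np.linspace(0.2, 0.9, 10)   # stand-in for per-class adv. loss
    for _ in range(2000):
        c = sampler.sample(rng)
        sampler.update(c, true_difficulty[c] + 0.05 * rng.standard_normal())
    print(np.round(sampler.probs(), 3))           # mass concentrates on hard classes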

---


New submissions
===============


Title: Off-Policy Evaluation with Out-of-Sample Guarantees

Abstract: We consider the problem of evaluating the performance of a decision policy using past observational data. The outcome of a policy is measured in terms of a loss (a.k.a. disutility or negative reward), and the main problem is making valid inferences about its out-of-sample loss when the past data were observed under a different and possibly unknown policy. Using a sample-splitting method, we show that it is possible to draw such inferences with finite-sample coverage guarantees about the entire loss distribution, rather than just its mean. Importantly, the method takes into account model misspecifications of the past policy, including unmeasured confounding. The evaluation method can be used to certify the performance of a policy using observational data under a specified range of credible model assumptions.

URL: https://openreview.net/forum?id=XnYtGPgG9p
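
The kind of guarantee the abstract describes, coverage for the whole out-of-sample loss distribution rather than just its mean, is easiest to see in the simplest i.i.d. setting, where an order-statistic bound already gives finite-sample coverage for a loss quantile. A sketch of that baseline follows (using SciPy; it assumes losses evaluated under the target policy itself, so it does not capture the paper's handling of a different logging policy or unmeasured confounding):

    import numpy as np
    from scipy.stats import binom

    def loss_quantile_ucb(losses, alpha=0.9, delta=0.05):
        """Distribution-free upper confidence bound on the alpha-quantile of
        the out-of-sample loss, from i.i.d. evaluation losses.

        Editorial sketch of the 'coverage for the loss distribution' flavour
        of guarantee; not the paper's off-policy method.
        """
        losses = np.sort(np.asarray(losses))
        n = len(losses)
        # Smallest order statistic k with P(Bin(n, alpha) <= k-1) >= 1 - delta.
        for k in range(1, n + 1):
            if binom.cdf(k - 1, n, alpha) >= 1 - delta:
                return losses[k - 1]
        return np.inf   # not enough data for the requested coverage

    rng = np.random.default_rng(4)
    eval_losses = rng.exponential(scale=1.0, size=400)
    print(loss_quantile_ucb(eval_losses, alpha=0.9, delta=0.05))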

---
