Daily TMLR digest for Oct 30, 2022

TMLR

Oct 29, 2022, 8:00:07 PM
to tmlr-anno...@googlegroups.com

New submissions
===============


Title: ViViT: Curvature Access Through The Generalized Gauss-Newton’s Low-Rank Structure

Abstract: Curvature in the form of the Hessian or its generalized Gauss-Newton (GGN) approximation is valuable for algorithms that rely on a local model of the loss to train, compress, or explain deep networks. Existing methods based on implicit multiplication via automatic differentiation or Kronecker-factored block diagonal approximations do not consider noise in the mini-batch. We present ViViT, a curvature model that leverages the GGN’s low-rank structure without further approximations. It allows for efficient computation of eigenvalues and eigenvectors, as well as per-sample first- and second-order directional derivatives. The representation is computed in parallel with gradients in one backward pass and offers a fine-grained cost-accuracy trade-off, which allows it to scale. We demonstrate this by conducting performance benchmarks and substantiate ViViT’s usefulness by studying the impact of noise on the GGN’s structural properties during neural network training.
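
For readers who want the linear-algebra core of the abstract, here is a minimal NumPy sketch of the low-rank identity ViViT exploits. It assumes a toy setting where the per-sample output Jacobians J_n and square-root factors S_n of the per-sample loss Hessians are available as dense arrays; in a real network these would come from one backward pass, and all dimensions below are illustrative assumptions.

    import numpy as np

    # Toy dimensions: N samples, C outputs, D parameters, with D >> N*C.
    # The GGN is G = (1/N) * sum_n J_n^T H_n J_n, where J_n is the Jacobian of the C
    # model outputs w.r.t. the D parameters for sample n and H_n is the PSD Hessian of
    # the loss w.r.t. the outputs. Writing H_n = S_n S_n^T gives G = V V^T with
    # V = [J_1^T S_1, ..., J_N^T S_N] / sqrt(N), so G has rank at most N*C and its
    # nonzero spectrum equals that of the small (N*C x N*C) Gram matrix V^T V.
    rng = np.random.default_rng(0)
    N, C, D = 4, 3, 50

    Js = rng.normal(size=(N, C, D))          # per-sample output Jacobians J_n
    Ss = rng.normal(size=(N, C, C))          # square-root factors S_n of the loss Hessians
    Hs = np.einsum("ncd,ned->nce", Ss, Ss)   # H_n = S_n S_n^T (PSD by construction)

    # Dense GGN for reference (infeasible for real networks, fine for the toy).
    G = sum(Js[n].T @ Hs[n] @ Js[n] for n in range(N)) / N

    # Low-rank factor V (D x N*C), assembled without ever forming the D x D matrix G.
    V = np.concatenate([Js[n].T @ Ss[n] for n in range(N)], axis=1) / np.sqrt(N)

    # Nonzero eigenvalues of G recovered from the small Gram matrix V^T V.
    gram_evals = np.sort(np.linalg.eigvalsh(V.T @ V))
    top_dense = np.sort(np.linalg.eigvalsh(G))[-N * C:]
    print(np.allclose(gram_evals, top_dense))  # True: the spectra agree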

URL: https://openreview.net/forum?id=DzJ7JfPXkE

---

Title: Stacking Diverse Architectures to Improve Machine Translation

Abstract: Repeated applications of the same neural block, primarily based on self-attention, characterize the current state of the art in neural architectures for machine translation. In such architectures the decoder adopts a masked version of the same encoding block. Although simple, this strategy does not encode the varied inductive biases, such as locality, that arise from alternative architectures and that are central to modelling translation. We propose Lasagna, an encoder-decoder model that aims to combine the inductive benefits of different architectures by layering multiple instances of different blocks. Lasagna’s encoder first grows the representation from local to mid-sized context using convolutional blocks and only then applies a pair of final self-attention blocks. Lasagna’s decoder uses only convolutional blocks that attend to the encoder representation. On a large suite of machine translation tasks, we find that Lasagna not only matches or outperforms the Transformer baseline, but does so more efficiently thanks to its widespread use of efficient convolutional blocks. These findings suggest that the widespread use of uniform architectures may be suboptimal in certain scenarios and that exploiting the diversity of architectural inductive biases can lead to substantial gains.
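
The abstract describes the block ordering but not the block internals, so the PyTorch snippet below is only a hedged sketch of the stated layering (convolutional blocks first, a pair of self-attention blocks last). Kernel sizes, normalization, and block counts are assumptions, and the decoder is omitted.

    import torch
    from torch import nn

    class ConvBlock(nn.Module):
        """Local mixing via a residual 1D convolution (internals are placeholders)."""
        def __init__(self, d_model: int, kernel_size: int):
            super().__init__()
            self.conv = nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2)
            self.norm = nn.LayerNorm(d_model)

        def forward(self, x):                        # x: (batch, seq, d_model)
            y = self.conv(x.transpose(1, 2)).transpose(1, 2)
            return self.norm(x + torch.relu(y))

    class SelfAttentionBlock(nn.Module):
        """Global mixing via residual multi-head self-attention."""
        def __init__(self, d_model: int, n_heads: int = 8):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm = nn.LayerNorm(d_model)

        def forward(self, x):
            y, _ = self.attn(x, x, x)
            return self.norm(x + y)

    class MixedEncoder(nn.Module):
        """Convolutional blocks first (growing receptive field), attention blocks last."""
        def __init__(self, d_model: int = 512):
            super().__init__()
            self.blocks = nn.Sequential(
                ConvBlock(d_model, kernel_size=3),
                ConvBlock(d_model, kernel_size=7),
                ConvBlock(d_model, kernel_size=15),
                ConvBlock(d_model, kernel_size=31),
                SelfAttentionBlock(d_model),
                SelfAttentionBlock(d_model),
            )

        def forward(self, x):                        # x: (batch, seq, d_model)
            return self.blocks(x)

    print(MixedEncoder()(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])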


URL: https://openreview.net/forum?id=mNEqiC924B

---

Title: The Low-Rank Simplicity Bias in Deep Networks

Abstract: Modern deep neural networks are highly over-parameterized compared to the data on which they are trained, yet they often generalize remarkably well. A flurry of recent work has asked: why do deep networks not overfit to their training data? In this work, we make a series of empirical observations that investigate and extend the hypothesis that deeper networks are inductively biased to find solutions with lower effective rank embeddings. We conjecture that this bias exists because the volume of functions that map to low effective rank embeddings increases with depth. We show empirically that our claim holds for finite-width linear and non-linear models in practical learning paradigms, and that on natural data these are often the solutions that generalize well. We then show that the simplicity bias exists both at initialization and after training, and is resilient to changes in hyper-parameters and learning methods. We further demonstrate how linear over-parameterization of deep non-linear models can be used to induce a low-rank bias, improving generalization performance on CIFAR and ImageNet without changing the modeling capacity.
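
As a hedged, self-contained illustration of the depth-versus-rank intuition (not the paper's experiments), the NumPy snippet below measures the effective rank, via the exponential of the entropy of the normalized singular values, of products of random square matrices: composing more factors of the same overall shape tends to lower the effective rank of the end-to-end map, mirroring the linear over-parameterization argument at initialization.

    import numpy as np

    def effective_rank(W: np.ndarray) -> float:
        """Exponential of the entropy of the normalized singular values of W."""
        s = np.linalg.svd(W, compute_uv=False)
        p = s / s.sum()
        return float(np.exp(-(p * np.log(p)).sum()))

    rng = np.random.default_rng(0)
    dim, trials = 64, 20
    for depth in (1, 2, 4, 8):
        ranks = []
        for _ in range(trials):
            W = np.eye(dim)
            for _ in range(depth):                   # compose `depth` random square factors
                W = (rng.normal(size=(dim, dim)) / np.sqrt(dim)) @ W
            ranks.append(effective_rank(W))
        print(f"depth {depth}: mean effective rank ~ {np.mean(ranks):.1f}")  # decreases with depth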

URL: https://openreview.net/forum?id=bCiNWDmlY2

---

Title: Evaluating the Evaluators: Which UDA validation methods are most effective? Can they be improved?

Abstract: This paper compares and ranks 8 unsupervised domain adaptation (UDA) validation methods. Validators estimate model accuracy, which makes them an essential component of any UDA train-test pipeline. We rank these validators to indicate which of them are most useful for the purpose of selecting optimal model checkpoints and hyperparameters. To the best of our knowledge, this large-scale benchmark study is the first of its kind in the UDA field. In addition, we propose 3 new validators that outperform existing validators. When paired with one particular UDA algorithm, one of our new validators achieves state-of-the-art performance.
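
The abstract does not name the individual validators, so the snippet below is only a hedged example of what a UDA validator computes: a scalar score for a checkpoint derived from unlabeled target-domain predictions, used as a stand-in for the unknown target accuracy when selecting checkpoints and hyperparameters. The entropy-based score is a common baseline of this kind and is an assumption here, not necessarily one of the 8 ranked methods.

    import numpy as np

    def entropy_validator_score(target_logits: np.ndarray) -> float:
        """Negative mean prediction entropy on unlabeled target data (higher = more confident)."""
        z = target_logits - target_logits.max(axis=1, keepdims=True)
        p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)   # row-wise softmax
        entropy = -(p * np.log(p + 1e-12)).sum(axis=1)         # per-sample entropy
        return float(-entropy.mean())

    # Checkpoint selection: keep the checkpoint whose validator score is highest.
    rng = np.random.default_rng(0)
    checkpoints = {f"epoch_{e}": rng.normal(scale=1.0 + e, size=(256, 10)) for e in range(5)}
    best = max(checkpoints, key=lambda name: entropy_validator_score(checkpoints[name]))
    print("selected:", best)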

URL: https://openreview.net/forum?id=1oJp1R9PSJ

---
