Daily TMLR digest for Jul 20, 2022

TMLR

Jul 19, 2022, 8:00:14 PM
to tmlr-anno...@googlegroups.com


New submissions
===============


Title: Faking Interpolation Until You Make It

Abstract: Deep over-parameterized neural networks exhibit the interpolation property on many data sets. Specifically, these models can achieve approximately zero loss on all training samples simultaneously. This property has been exploited to develop optimisation algorithms for this setting. Because the optimal loss value is known in the interpolating regime, these algorithms employ a variant of the Polyak step size computed on each stochastic batch of data. We introduce a novel extension of this idea to tasks where the interpolation property does not hold. As we no longer have access to the optimal loss values a priori, we instead estimate them for each sample online. To realise this, we introduce a simple but highly effective heuristic for approximating the optimal value based on previous loss evaluations. We provide rigorous experiments on a range of problems. Our empirical analysis demonstrates the effectiveness of our approach, which outperforms other single-hyperparameter optimisation methods.

URL: https://openreview.net/forum?id=OslAMMF4ZP
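
For intuition, here is a minimal sketch (not the authors' code) of a Polyak-style SGD step in which the per-sample optimal loss values are unknown and are estimated online, as the abstract describes. The running-minimum estimate, the shrink factor, the step-size cap and all other choices below are illustrative assumptions rather than details taken from the paper.

    # Hedged sketch: SGD with a Polyak-style step size where the per-sample optimal
    # loss f_i* is estimated online (here: a shrunken running minimum of past losses).
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 10
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)  # noisy targets: no interpolation

    w = np.zeros(d)
    fstar_hat = np.full(n, np.inf)  # online estimates of the per-sample optimal losses
    shrink = 0.9                    # assumed shrinkage of the running minimum (illustrative)
    max_step = 1.0                  # assumed cap on the step size (illustrative)

    for epoch in range(50):
        for i in rng.permutation(n):
            resid = X[i] @ w - y[i]
            loss_i = 0.5 * resid ** 2
            grad_i = resid * X[i]
            # update the running estimate of this sample's optimal loss
            fstar_hat[i] = shrink * min(fstar_hat[i], loss_i)
            gap = max(loss_i - fstar_hat[i], 0.0)
            # Polyak-style step computed with the estimated f_i*
            step = min(gap / (grad_i @ grad_i + 1e-12), max_step)
            w -= step * grad_i

    print("final mean loss:", np.mean(0.5 * (X @ w - y) ** 2))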

---

Title: Target Propagation via Regularized Inversion

Abstract: Target Propagation (TP) algorithms compute targets instead of gradients along neural networks and propagate them backward in a way that is similar to, yet distinct from, gradient back-propagation (BP). The idea was first presented as a perturbative alternative to BP that may improve gradient evaluation accuracy when training multi-layer neural networks (LeCun, 1985) and has gained popularity as a biologically plausible counterpart of BP. However, TP may have remained more of a template algorithm with many variations than a well-identified algorithm. Revisiting the insights of LeCun (1985) and Lee et al. (2015), we present a simple version of TP based on regularized inversions of network layers that sheds light on the relevance of TP from an optimization viewpoint and is easily implementable in a differentiable programming framework. We show how TP can be used to train recurrent neural networks with long sequences on various sequence modeling problems, and we delineate theoretically and empirically the regimes in which the computational complexity of TP can be attractive compared to BP.

URL: https://openreview.net/forum?id=vxyjTUPV24
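
For intuition, the following is a minimal numpy sketch of the general target-propagation idea on a toy two-layer network: a hidden-layer target is obtained by approximately solving a regularized inversion of the layer above, and each layer is then updated locally toward its target. The inversion solver, the local update rule, and all hyperparameters here are illustrative assumptions, not the formulation or values from the paper.

    # Hedged sketch: target propagation with regularized layer inversion on a tiny network.
    import numpy as np

    rng = np.random.default_rng(0)

    def layer(W, h):
        return np.tanh(W @ h)

    def regularized_inverse(W, t_next, h_prev, reg=0.1, steps=20, lr=0.5):
        """Approximately solve argmin_h ||tanh(W h) - t_next||^2 + reg * ||h - h_prev||^2."""
        h = h_prev.copy()
        for _ in range(steps):
            out = np.tanh(W @ h)
            # gradient of the inversion objective with respect to h
            grad = 2 * W.T @ ((out - t_next) * (1 - out ** 2)) + 2 * reg * (h - h_prev)
            h -= lr * grad
        return h

    d_in, d_hidden, d_out = 8, 16, 4
    W1 = rng.normal(scale=0.5, size=(d_hidden, d_in))
    W2 = rng.normal(scale=0.5, size=(d_out, d_hidden))
    x = rng.normal(size=d_in)
    y = 0.5 * rng.normal(size=d_out)  # toy regression target

    for it in range(200):
        h1 = layer(W1, x)                     # forward pass
        h2 = layer(W2, h1)
        t2 = h2 - 0.2 * (h2 - y)              # output target: a small step downhill on the loss
        t1 = regularized_inverse(W2, t2, h1)  # hidden target via regularized inversion of layer 2
        # local updates: move each layer's output toward its target
        for W, inp, tgt in ((W2, h1, t2), (W1, x, t1)):
            out = layer(W, inp)
            grad_W = np.outer(2 * (out - tgt) * (1 - out ** 2), inp)
            W -= 0.05 * grad_W                # in-place update of W2 / W1

    print("final loss:", float(np.sum((layer(W2, layer(W1, x)) - y) ** 2)))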

---