Daily TMLR digest for Aug 11, 2022

TMLR

Aug 10, 2022, 8:00:10 PM
to tmlr-anno...@googlegroups.com

New submissions
===============


Title: On a continuous time model of gradient descent dynamics and instability in deep learning

Abstract: The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent, however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available for studying gradient descent, we propose the principal flow (PF), a continuous-time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian, the PF sheds light on the recently observed edge-of-stability phenomenon in deep learning. Using our new understanding of instability, we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.
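
For intuition, a minimal sketch (assuming a quadratic loss; this is the classical stability bound, not the paper's principal flow itself) of how Hessian eigenvalues govern gradient descent stability and suggest a learning rate cap:

    # Classical stability bound (a sketch, not the paper's method):
    # for the quadratic loss L(w) = 0.5 * w^T H w, gradient descent
    # w <- w - eta * H w is stable along an eigendirection with eigenvalue
    # lam iff |1 - eta * lam| < 1, i.e. eta < 2 / lam_max.
    import numpy as np

    rng = np.random.default_rng(0)
    H = rng.normal(size=(5, 5))
    H = H @ H.T  # a symmetric PSD "Hessian" for the toy quadratic

    lam_max = np.linalg.eigvalsh(H)[-1]  # sharpness: largest eigenvalue
    eta = min(0.5, 1.9 / lam_max)        # cap the step size below 2/lam_max

    w = rng.normal(size=5)
    for _ in range(100):
        w = w - eta * (H @ w)            # gradient step on the quadratic

    print("final loss:", 0.5 * w @ H @ w)  # decays because eta < 2/lam_max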

URL: https://openreview.net/forum?id=EYrRzKPinA

---

Title: Cheap and Deterministic Inference for Deep State-Space Models of Interacting Dynamical Systems

Abstract: Graph neural networks are often used to model interacting dynamical systems, since they gracefully scale to systems with a varying and high number of agents. While much progress has been made for deterministic interacting systems, modeling is much more challenging for stochastic systems in which one is interested in obtaining a predictive distribution over future trajectories. Existing methods are either computationally slow, since they rely on Monte Carlo sampling, or make simplifying assumptions such that the predictive distribution is unimodal. In this work, we present a deep state-space model which employs graph neural networks in order to model the underlying interacting dynamical system. The predictive distribution is multimodal and has the form of a Gaussian mixture model, where the moments of the Gaussian components can be computed via deterministic moment-matching rules. Our moment-matching scheme can be exploited for sample-free inference, leading to more efficient and stable training compared to Monte Carlo alternatives. Furthermore, we propose structured approximations to the covariance matrices of the Gaussian components in order to scale up to systems with many agents. We benchmark our novel framework on two challenging autonomous driving datasets; both confirm the benefits of our method compared to state-of-the-art methods. We further demonstrate the usefulness of our individual contributions in a carefully designed ablation study and provide a detailed empirical runtime analysis of our proposed covariance approximations.
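
For reference, a minimal sketch of the standard Gaussian mixture moment-matching identities the abstract alludes to (the building block only; the paper's full deterministic inference scheme is more involved):

    # Moment matching for a Gaussian mixture with weights w_k:
    #   mu = sum_k w_k mu_k
    #   S  = sum_k w_k (S_k + mu_k mu_k^T) - mu mu^T
    import numpy as np

    w = np.array([0.3, 0.7])                     # mixture weights
    mus = np.array([[0.0, 1.0], [2.0, -1.0]])    # component means, (K, D)
    covs = np.stack([np.eye(2), 2 * np.eye(2)])  # component covariances, (K, D, D)

    mu = np.einsum("k,kd->d", w, mus)
    second_moment = np.einsum("k,kde->de", w,
                              covs + np.einsum("kd,ke->kde", mus, mus))
    cov = second_moment - np.outer(mu, mu)

    print("matched mean:", mu)
    print("matched covariance:\n", cov)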

URL: https://openreview.net/forum?id=dqgdBy4Uv5

---

Title: Incorporating Sum Constraints into Multitask Gaussian Processes

Abstract: Machine learning models can be improved by adapting them to respect existing background knowledge. In this paper we consider multitask Gaussian processes, with background knowledge in the form of constraints that require a specific sum of the outputs to be constant. This is achieved by conditioning the prior distribution on fulfillment of the constraint. The approach allows for both linear and nonlinear constraints. We demonstrate that the constraints are fulfilled with high precision and that the construction can improve the overall prediction accuracy compared to a standard Gaussian process.
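
A minimal sketch of the underlying idea for the linear case, assuming a zero-mean joint Gaussian prior over the task outputs and a sum constraint (this is generic Gaussian conditioning, not necessarily the paper's exact construction):

    # Condition a Gaussian prior f ~ N(0, K) on the linear constraint A f = c.
    import numpy as np

    rng = np.random.default_rng(1)
    T = 4                                  # number of task outputs at one input
    K = rng.normal(size=(T, T))
    K = K @ K.T + 1e-6 * np.eye(T)         # prior covariance over the T outputs

    A = np.ones((1, T))                    # constraint: the outputs sum ...
    c = np.array([1.0])                    # ... to the constant 1

    S = A @ K @ A.T                        # covariance of the constrained sum
    mean_post = K @ A.T @ np.linalg.solve(S, c)
    cov_post = K - K @ A.T @ np.linalg.solve(S, A @ K)

    f = rng.multivariate_normal(mean_post, cov_post, check_valid="ignore")
    print("sample:", f, "sum:", f.sum())   # the sum is 1 up to numerics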

URL: https://openreview.net/forum?id=gzu4ZbBY7S

---

Title: Unifying Approaches in Data Subset Selection via Fisher Information and Information-Theoretic Quantities

Abstract: The mutual information between predictions and model parameters---also referred to as expected information gain or BALD in machine learning---measures informativeness. It is a popular acquisition function in Bayesian active learning. In data subset selection, that is, active learning and active sampling, several recent works use Fisher information, Hessians, similarity matrices based on the gradients, or simply the gradient lengths to compute the acquisition scores that guide sample selection. Are these different approaches connected, and if so, how? In this paper, we revisit the Fisher information and use it to show how several otherwise disparate methods are connected as approximations of information-theoretic quantities known from earlier works in Bayesian optimal experiment design.
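
For reference, a minimal sketch of the BALD / expected information gain score computed from posterior predictive samples (e.g. an ensemble or MC dropout); the Fisher information approximations the paper studies are not shown:

    # BALD(x) = H[ E_theta p(y|x,theta) ] - E_theta H[ p(y|x,theta) ]
    import numpy as np

    def entropy(p, axis=-1, eps=1e-12):
        return -(p * np.log(p + eps)).sum(axis=axis)

    def bald_scores(probs):
        """probs: (n_theta, n_points, n_classes) predictive probabilities,
        one slice per posterior sample theta."""
        marginal = probs.mean(axis=0)                  # averaged predictive
        return entropy(marginal) - entropy(probs).mean(axis=0)

    rng = np.random.default_rng(2)
    probs = rng.dirichlet(np.ones(3), size=(10, 5))    # 10 samples, 5 pool points
    scores = bald_scores(probs)
    print("acquire pool point:", scores.argmax(), scores)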

URL: https://openreview.net/forum?id=UVDAKQANOW

---

Title: On Connecting Deep Trigonometric Networks with Deep Gaussian Processes: Covariance, Expressivity, and Neural Tangent Kernel

Abstract: The Deep Gaussian Process (DGP), as a model prior in Bayesian learning, intuitively exploits the expressive power of function composition. DGPs also offer diverse modeling capabilities, but inference is challenging because marginalization in latent function space is not tractable. With Bochner's theorem, a DGP with a squared exponential kernel can be viewed as a deep trigonometric network consisting of random feature layers, sine and cosine activation units, and random weight layers. In the wide limit with a bottleneck, we show that the weight space view yields the same effective covariance functions which were obtained previously in function space. Also, varying the prior distributions over network parameters is equivalent to employing different kernels. As such, DGPs can be translated into deep bottlenecked trigonometric networks, with which the exact maximum a posteriori estimate can be obtained. Interestingly, the network representation enables the study of the DGP's neural tangent kernel, which may also reveal the mean of the intractable predictive distribution. Statistically, unlike shallow networks, deep networks of finite width have covariance deviating from the limiting kernel, and the inner and outer widths may play different roles in feature learning. Numerical simulations are presented to support our findings.
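
For intuition, a sketch of a single random trigonometric layer for the squared exponential kernel via Bochner's theorem (a one-layer random Fourier feature illustration; the paper stacks such layers into deep bottlenecked networks):

    # With rows of W drawn from N(0, I / ell^2), the features
    # phi(x) = [cos(Wx); sin(Wx)] / sqrt(D) satisfy
    # phi(x)^T phi(x') ~ k(x, x') = exp(-||x - x'||^2 / (2 ell^2)).
    import numpy as np

    rng = np.random.default_rng(3)
    D, d, ell = 2000, 3, 1.0                 # features, input dim, lengthscale
    W = rng.normal(scale=1.0 / ell, size=(D, d))

    def phi(x):
        z = W @ x
        return np.concatenate([np.cos(z), np.sin(z)]) / np.sqrt(D)

    x, y = rng.normal(size=d), rng.normal(size=d)
    approx = phi(x) @ phi(y)
    exact = np.exp(-np.sum((x - y) ** 2) / (2 * ell**2))
    print(f"approx {approx:.4f} vs exact {exact:.4f}")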

URL: https://openreview.net/forum?id=DmjBJtCIKu

---

Title: Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning

Abstract: Multi-Task Learning (MTL) has achieved success in various fields. However, how to balance different tasks to achieve good overall performance remains a key problem. To achieve task balancing, many works carefully design dynamic loss/gradient weighting strategies, but the basic random baselines needed to examine their effectiveness have been ignored. In this paper, we propose Random Weighting (RW) methods, including Random Loss Weighting (RLW) and Random Gradient Weighting (RGW), in which an MTL model is trained with random loss/gradient weights sampled from a distribution. To show the effectiveness and necessity of RW methods, we theoretically analyze the convergence of RW and reveal that RW has a higher probability of escaping local minima, resulting in better generalization ability. Empirically, we extensively compare the proposed RW methods with twelve state-of-the-art methods on five image datasets and two multilingual problems from the XTREME benchmark, and show that RW methods can achieve performance comparable to state-of-the-art baselines. We therefore believe that RW methods are important baselines for MTL and should attract more attention.
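
A minimal sketch of the RLW idea for one training step (softmax of standard normals is one common weight distribution; it may differ from the paper's exact sampling scheme):

    # Random Loss Weighting: draw fresh task weights from a fixed
    # distribution at every step instead of tuning them.
    import numpy as np

    rng = np.random.default_rng(4)

    def random_loss_weights(n_tasks):
        z = rng.normal(size=n_tasks)
        w = np.exp(z - z.max())
        return w / w.sum()                  # softmax: weights on the simplex

    # stand-in per-task losses for one step; a real model would produce these
    task_losses = np.array([0.8, 1.3, 0.5])
    w = random_loss_weights(len(task_losses))
    total_loss = w @ task_losses            # backpropagate this in practice
    print("weights:", w, "total loss:", total_loss)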

URL: https://openreview.net/forum?id=jjtFD8A1Wx

---

Title: Active Learning of Ordinal Embeddings: A User Study on Football Data

Abstract: Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function. Distance metrics can only serve as a proxy for similarity in information retrieval of similar instances. Learning a good similarity function from human annotations improves the quality of retrievals. This work uses deep metric learning to learn these user-defined similarity functions from a few annotations for a large football trajectory dataset.
We adapt an entropy-based active learning method with recent work on triplet mining to collect easy-to-answer but still informative annotations from human participants and use them to train a deep convolutional network that generalizes to unseen samples.
Our user study shows that our approach improves the quality of information retrieval compared to a previous deep metric learning approach that relies on a Siamese network. Specifically, we shed light on the strengths and weaknesses of passive sampling heuristics and active learners alike by analyzing the participants' response efficacy. To this end, we collect accuracy, algorithmic time complexity, the participants' fatigue and time-to-response, qualitative self-assessments and statements, as well as the effects of mixed-expertise annotators and their consistency on model performance and transfer learning.
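
A hypothetical sketch (not the authors' exact pipeline) of entropy-based triplet selection: score each candidate triplet by the entropy of the model's predicted answer and query the annotator on the most uncertain one:

    # For anchor a and candidates i, j, ask "which of i, j is closer to a?"
    # only when the current model is uncertain about the answer.
    import numpy as np

    rng = np.random.default_rng(5)
    emb = rng.normal(size=(20, 8))          # current learned embeddings

    def triplet_entropy(a, i, j):
        di = np.linalg.norm(emb[a] - emb[i])
        dj = np.linalg.norm(emb[a] - emb[j])
        p = 1.0 / (1.0 + np.exp(di - dj))   # P("i is closer to a")
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))

    candidates = [(0, 1, 2), (0, 3, 4), (5, 6, 7)]
    scores = [triplet_entropy(*t) for t in candidates]
    print("query the most uncertain triplet:", candidates[int(np.argmax(scores))])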


URL: https://openreview.net/forum?id=oq3tx5kinu

---
