Daily TMLR digest for Jun 23, 2024


TMLR

Jun 23, 2024, 12:00:33 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Koopman Spectrum Nonlinear Regulators and Efficient Online Learning

Authors: Motoya Ohnishi, Isao Ishikawa, Kendall Lowrey, Masahiro Ikeda, Sham M. Kakade, Yoshinobu Kawahara

Abstract: Most modern reinforcement learning algorithms optimize a cumulative single-step cost along a trajectory. The optimized motions are often ‘unnatural’, representing, for example, behaviors with sudden accelerations that waste energy and lack predictability. In this work, we present a novel paradigm for controlling nonlinear systems via the minimization of the Koopman spectrum cost: a cost over the Koopman operator of the controlled dynamics. This induces a broader class of dynamical behaviors that evolve over stable manifolds, such as nonlinear oscillators, closed loops, and smooth movements. We demonstrate that some dynamics characterizations that are not possible with a cumulative cost become feasible in this paradigm, which generalizes the classical eigenstructure and pole assignments to nonlinear decision making. Moreover, we present a sample-efficient online learning algorithm for our problem that enjoys a sub-linear regret bound under some structural assumptions.

URL: https://openreview.net/forum?id=thfoUZugvS
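
To make the idea of a Koopman spectrum cost concrete, here is a minimal, hypothetical sketch (not the authors' formulation): it fits a finite-dimensional Koopman approximation to lifted trajectory features by least squares, in the style of EDMD, and penalizes eigenvalues outside the unit circle. The observable dictionary and the penalty form are illustrative assumptions.

import numpy as np

def lift(x):
    # Illustrative observable dictionary: the state plus simple nonlinear features.
    return np.concatenate([x, np.sin(x), np.cos(x)])

def koopman_spectrum_cost(trajectory, radius_weight=1.0):
    """trajectory: array of shape (T, state_dim) generated by the controlled system."""
    Phi = np.stack([lift(x) for x in trajectory])
    X, Y = Phi[:-1], Phi[1:]
    # Least-squares (EDMD-style) estimate of a finite-dimensional Koopman
    # approximation K with Y ~= X @ K.
    K, *_ = np.linalg.lstsq(X, Y, rcond=None)
    spectral_radius = np.max(np.abs(np.linalg.eigvals(K)))
    # Penalize expanding modes; eigenvalues inside the unit circle cost nothing.
    return radius_weight * max(spectral_radius - 1.0, 0.0)

# Example usage on a damped-oscillator trajectory.
t = np.linspace(0.0, 10.0, 200)
traj = np.stack([np.exp(-0.1 * t) * np.cos(t),
                 np.exp(-0.1 * t) * np.sin(t)], axis=1)
print(koopman_spectrum_cost(traj))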

---

Title: Accurate Neural Network Pruning Requires Rethinking Sparse Optimization

Authors: Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh

Abstract: Obtaining versions of deep neural networks that are both highly accurate and highly sparse is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the community. Yet, much less is known about the interaction between sparsity and the standard stochastic optimization techniques used for training sparse networks, and most existing work uses standard dense schedules and hyperparameters for training sparse networks. In this work, we examine the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks. We begin by showing that using standard dense training recipes for sparse training is suboptimal, and provide evidence that this results in *under-training*, loosely defined as using a suboptimal number of passes over the training data. We present training recipes for mitigating this issue for both sparse pre-training of vision models (e.g. ResNet50/ImageNet) and sparse fine-tuning of language models (e.g. BERT/GLUE), achieving state-of-the-art results in both settings in the high-sparsity regime, and providing detailed analyses of the difficulty of sparse training in both scenarios. Our work sets a new benchmark in terms of the accuracies that can be achieved under high sparsity, and should inspire further research into improving sparse model training, not only to reach higher accuracies under high sparsity, but also to do so efficiently.

URL: https://openreview.net/forum?id=vgthYeRBAF
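
As a rough sketch of the under-training issue the abstract describes (not the authors' recipe), the snippet below applies global magnitude pruning in plain PyTorch and then fine-tunes on a longer-than-dense schedule while re-applying the sparsity mask after every step. The model, sparsity level, and extended step budget are illustrative assumptions.

import torch
import torch.nn as nn

def global_magnitude_prune(model, sparsity=0.9):
    """Zero out the smallest-magnitude weights across all Linear/Conv2d layers."""
    weights = [m.weight for m in model.modules()
               if isinstance(m, (nn.Linear, nn.Conv2d))]
    scores = torch.cat([w.detach().abs().flatten() for w in weights])
    threshold = torch.quantile(scores, sparsity)
    masks = []
    for w in weights:
        mask = (w.detach().abs() > threshold).float()
        w.data.mul_(mask)               # apply the sparsity mask in place
        masks.append((w, mask))
    return masks                        # masks are re-applied after each step

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
masks = global_magnitude_prune(model, sparsity=0.9)

# Sparse fine-tuning on dummy data; the longer-than-dense step budget is an
# assumption reflecting the under-training finding, not a reported number.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
for step in range(300):
    x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    for w, mask in masks:
        w.data.mul_(mask)               # keep pruned weights at zero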

---


New submissions
===============


Title: Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models

Abstract: Fine-tuning pre-trained models with active learning holds promise for reducing annotation costs. However, this combination introduces significant computational costs, particularly with the growing scale of pre-trained models. Recent research has proposed proxy-based active learning, which pre-computes features to reduce computational costs. Yet, this approach often incurs a significant loss in active learning performance, sometimes outweighing the computational savings. This paper demonstrates that not all sample selection differences result in performance degradation. Furthermore, we show that suitable training methods can mitigate the decline in active learning performance caused by certain selection discrepancies. Building on a detailed analysis, we propose a novel method, aligned selection via proxy, which improves proxy-based active learning performance by updating pre-computed features and selecting a proper training method. Extensive experiments validate that our method reduces the total cost of efficient active learning while maintaining computational efficiency.

URL: https://openreview.net/forum?id=PNcgJMJcdl
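
For intuition, a minimal, hypothetical sketch of proxy-based selection over pre-computed features is shown below; the uncertainty criterion and the periodic feature refresh only loosely mirror the paper's "aligned selection via proxy" and are assumptions, not the proposed method. All names in the usage comments are placeholders.

import numpy as np

def entropy(probs, eps=1e-12):
    # Predictive entropy per sample; probs has shape (num_samples, num_classes).
    return -(probs * np.log(probs + eps)).sum(axis=1)

def select_by_proxy(features, proxy_predict_proba, budget):
    """Pick the `budget` most uncertain unlabeled samples under the proxy model."""
    probs = proxy_predict_proba(features)
    return np.argsort(-entropy(probs))[:budget]

# Hypothetical usage (backbone, proxy, annotate, etc. are placeholders):
# feats = backbone.encode(unlabeled_pool)               # pre-computed once
# for r in range(num_rounds):
#     idx = select_by_proxy(feats, proxy.predict_proba, budget=100)
#     labeled.add(annotate(idx))
#     proxy.fit(feats[labeled.indices], labeled.targets)
#     if r % refresh_every == 0:                        # periodically fine-tune the
#         backbone.finetune(labeled)                    # full model and refresh the
#         feats = backbone.encode(unlabeled_pool)       # pre-computed features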

---

Title: Incorporating Interventional Independence Improves Robustness against Interventional Distribution Shift

Abstract: We consider learning discriminative representations of variables related to each other via a causal graph. To learn representations that are robust against interventional distribution shifts, the training dataset is augmented with interventional data in addition to existing observational data. However, even when the underlying causal model is known, existing approaches treat interventional data like observational data, ignoring the independence relations that result from these interventions. This leads to representations that exhibit large disparities in predictive performance on observational and interventional data, and the disparity worsens when the quantity of interventional data available for training is limited. In this paper, (1) we first identify a strong correlation between this performance disparity and how well the representations adhere to the statistical independence conditions induced by the underlying causal model during interventions. (2) For linear models, we derive sufficient conditions on the proportion of interventional data during training under which enforcing statistical independence between representations corresponding to the intervened node and its non-descendants during interventions lowers the test-time error on interventional data. Following these insights, we propose RepLIn, an algorithm that explicitly enforces this statistical independence during interventions. We demonstrate the utility of RepLIn on synthetic and real face image datasets. Our experiments show that RepLIn scales with the number of nodes in the causal graph and improves the robustness of representations against interventional distribution shifts of both continuous and discrete latent variables compared to ERM baselines.

URL: https://openreview.net/forum?id=pZRanZlab4
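
As an illustration of enforcing statistical independence on interventional samples (the abstract does not specify RepLIn's independence measure), the sketch below uses a simple cross-covariance penalty between the representation of the intervened node and those of its non-descendants; the penalty form and the combined objective are assumptions, not the authors' exact method.

import torch

def cross_covariance_penalty(z_intervened, z_nondescendants):
    """Penalize linear dependence between two batches of representations."""
    zi = z_intervened - z_intervened.mean(dim=0, keepdim=True)
    zn = z_nondescendants - z_nondescendants.mean(dim=0, keepdim=True)
    cov = zi.T @ zn / (zi.shape[0] - 1)
    return (cov ** 2).sum()

# Hypothetical combined objective on a mixed batch (lam and the losses are placeholders):
# loss = task_loss(obs_batch) + task_loss(int_batch) \
#        + lam * cross_covariance_penalty(z_intervened_node, z_nondescendant_nodes)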

---
