Daily TMLR digest for Jun 01, 2024


TMLR

Jun 1, 2024, 12:00:08 AM
to tmlr-anno...@googlegroups.com


New certifications
==================

Reproducibility Certification: 'Explaining RL Decisions with Trajectories': A Reproducibility Study

Karim Ahmed Abdel Sadek, Matteo Nulli, Joan Velja, Jort Vincenti

https://openreview.net/forum?id=QdeBbK5CSh

---


Accepted papers
===============


Title: 'Explaining RL Decisions with Trajectories': A Reproducibility Study

Authors: Karim Ahmed Abdel Sadek, Matteo Nulli, Joan Velja, Jort Vincenti

Abstract: This work investigates the reproducibility of the paper "Explaining RL decisions with trajectories" by Deshmukh et al. (2023). The original paper introduces a novel approach to explainable reinforcement learning based on attributing the decisions of an agent to specific clusters of trajectories encountered during training. We verify the main claims of the paper, which state that (i) training on fewer trajectories induces a lower initial state value, (ii) trajectories within a cluster present similar high-level patterns, (iii) distant trajectories influence the decision of an agent, and (iv) humans correctly identify the trajectories attributed to the decision of the agent. We recover the environments used by the authors from the partial original code provided for one of them (Grid-World) and implement the remaining ones (Seaquest, HalfCheetah, Breakout, Q*Bert) from scratch.
While we confirm that (i), (ii), and (iii) partially hold, we extend the largely qualitative experiments of the authors by introducing a quantitative metric to further support (iii), and new experiments and visual results for (i). Moreover, we investigate the use of different clustering algorithms and encoder architectures to further support (ii). We could not support (iv), given the limited extent of the original experiments. We conclude that, while some of the claims can be supported, further investigation and experiments would be of interest. We recognize the novelty of the authors' work and hope that ours paves the way for clearer and more transparent approaches.
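
A minimal sketch of the clustering step behind claim (ii), under assumptions of ours rather than from the paper: trajectory embeddings are taken as given (in the original work they come from a trajectory encoder), and two clustering algorithms are compared with a silhouette score, mirroring the reproduction's comparison of clustering choices.

    # Illustrative only: random stand-ins for trajectory embeddings.
    import numpy as np
    from sklearn.cluster import KMeans, AgglomerativeClustering
    from sklearn.metrics import silhouette_score

    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(200, 64))   # 200 trajectories, 64-dim encoder output

    for name, algo in [("kmeans", KMeans(n_clusters=8, n_init=10, random_state=0)),
                       ("agglomerative", AgglomerativeClustering(n_clusters=8))]:
        labels = algo.fit_predict(embeddings)
        print(name, "silhouette:", round(silhouette_score(embeddings, labels), 3))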

URL: https://openreview.net/forum?id=QdeBbK5CSh

---

Title: Online Tensor Max-Norm Regularization via Stochastic Optimization

Authors: Tong Wu

Abstract: The advent of ubiquitous multidimensional arrays poses unique challenges for low-rank modeling of tensor data due to higher-order relationships, gross noise, and the large dimensions of the tensor. In this paper, we consider online low-rank estimation of tensor data where the multidimensional data are revealed sequentially. Building on the recently proposed tensor-tensor product (t-product), we rigorously deduce the tensor max-norm and formulate it as an equivalent tensor factorization, where the factors consist of a tensor basis component and a coefficient one. With this formulation, we develop an online max-norm regularized tensor decomposition (OMRTD) method that alternately optimizes over the basis component and the coefficient tensor. The algorithm is scalable to the large-scale setting, and the sequence of solutions produced by OMRTD converges asymptotically to a stationary point of the expected loss function. Further, we extend OMRTD to tensor completion. Numerical experiments demonstrate the effectiveness and robustness of our algorithm. The code is available at https://github.com/twugithub/2024-TMLR-OMRTD.
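
For readers unfamiliar with the t-product that the max-norm formulation builds on, the following is a generic textbook sketch (not the authors' code): the product is computed by an FFT along the third mode, frontal-slice matrix products, and an inverse FFT, and the basis/coefficient split below is the kind of factorization OMRTD alternates over.

    import numpy as np

    def t_product(A, B):
        """t-product of A (n1 x n2 x n3) and B (n2 x n4 x n3)."""
        n3 = A.shape[2]
        Af, Bf = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
        Cf = np.empty((A.shape[0], B.shape[1], n3), dtype=complex)
        for k in range(n3):                      # frontal-slice products in the Fourier domain
            Cf[:, :, k] = Af[:, :, k] @ Bf[:, :, k]
        return np.real(np.fft.ifft(Cf, axis=2))

    L = np.random.randn(30, 5, 8)    # basis component
    R = np.random.randn(5, 20, 8)    # coefficient component
    X = t_product(L, R)              # factorized tensor, shape (30, 20, 8)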

URL: https://openreview.net/forum?id=1iDpP3GWmS

---

Title: A Study of the Effects of Transfer Learning on Adversarial Robustness

Authors: Pratik Vaishnavi, Kevin Eykholt, Amir Rahmati

Abstract: The security and robustness of AI systems are paramount in real-world applications. Previous research has focused on developing methods to train robust networks, assuming the availability of sufficient labeled training data. However, in deployment scenarios with limited training data, existing techniques for training robust networks become impractical. In such low-data scenarios, non-robust training methods often resort to transfer learning: pre-training a network on a large, possibly labeled dataset and fine-tuning it for a new task with a limited set of training samples. The efficacy of transfer learning in enhancing adversarial robustness has not been comprehensively explored. Specifically, it remains uncertain whether transfer learning can improve adversarial performance in low-data scenarios, and its potential benefits for certified robustness are unexplored. In this paper, we conduct an extensive analysis of the impact of transfer learning on both empirical and certified adversarial robustness. Employing supervised and self-supervised pre-training methods and fine-tuning across 12 downstream tasks representing diverse data-availability scenarios, we identify the conditions conducive to training adversarially robust models through transfer learning. Our study reveals that, contrary to previous beliefs, the effectiveness of transfer learning in improving adversarial robustness is attributable to an increase in standard accuracy and not to a direct "transfer" of robustness from the source to the target task. Our findings provide valuable insights for practitioners aiming to deploy robust ML models in their applications.
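
As a rough sketch of the empirical-robustness side of such an evaluation (hyper-parameters, model, and data are placeholders, not the paper's setup), a fine-tuned network can be attacked with standard PGD and robust accuracy measured on the perturbed inputs:

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
        """L-infinity PGD on image inputs in [0, 1]."""
        x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv

    # robust_acc = (model(pgd_attack(model, x, y)).argmax(1) == y).float().mean()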

URL: https://openreview.net/forum?id=T6RygOFZ6B

---

Title: End-to-End Training Induces Information Bottleneck through Layer-Role Differentiation: A Comparative Analysis with Layer-wise Training

Authors: Keitaro Sakamoto, Issei Sato

Abstract: End-to-end (E2E) training, which optimizes the entire model through error backpropagation, fundamentally underpins the advances of deep learning. Despite its high performance, E2E training faces problems of memory consumption, parallel computing, and discrepancy with the functioning of the actual brain. Various alternative methods have been proposed to overcome these difficulties; however, none yet matches the performance of E2E training, so they fall short in practicality. Furthermore, there is little understanding of how the properties of the trained models differ beyond the performance gap.
In this paper, we reconsider why E2E training demonstrates superior performance by comparing it with layer-wise training, which shares its fundamental learning principles and architectures, with the granularity of loss evaluation being the only difference. On the basis of the observation that E2E training has an advantage in propagating input information, we analyze the information-plane dynamics of intermediate representations using the Hilbert-Schmidt independence criterion (HSIC). Our normalized HSIC analysis reveals the ability of E2E training to exhibit different information dynamics across layers, in addition to efficient information propagation. Furthermore, we show that this layer-role differentiation leads to a final representation that follows the information bottleneck principle. Our work not only establishes the advantages of E2E training in terms of information propagation and the information bottleneck, but also suggests the need to consider cooperative interactions between layers, not just the final layer, when analyzing the information bottleneck of deep learning.
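
A minimal sketch of a normalized HSIC score between two layers' activations, in the spirit of the analysis described above; the linear kernel and this particular normalization (the CKA form) are assumptions of the sketch, not necessarily the paper's exact estimator.

    import numpy as np

    def hsic(K, L):
        n = K.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
        return np.trace(K @ H @ L @ H) / (n - 1) ** 2

    def normalized_hsic(X, Y):
        K, L = X @ X.T, Y @ Y.T                  # linear kernels on activations
        return hsic(K, L) / np.sqrt(hsic(K, K) * hsic(L, L))

    X = np.random.randn(100, 32)                 # activations of one layer (100 samples)
    Y = np.random.randn(100, 64)                 # activations of another layer
    print(normalized_hsic(X, Y))                 # near 0 for independent random data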

URL: https://openreview.net/forum?id=O3wmRh2SfT

---


New submissions
===============


Title: Analyzing the Impact of Learnable Softmax Temperature in Contrastive Visual-Textual Alignment Systems: Benefits, Drawbacks, and Alternative Approaches

Abstract: This work does NOT read like "fabricate motivation - propose something - obtain SOTA results". Instead, we analyze the learnable softmax temperature parameter in the practical training of contrastive visual-textual alignment models (commonly referred to as "CLIP" models). This parameter is considered imperative for optimal system performance; however, its working mechanism and possible drawbacks have long been neglected. This study addresses this problem and offers a novel solution by leveraging the structure of ViTs. Our argument centers on the pivotal role of the softmax temperature in handling noisy training data. We show that there exists an equilibrium in the gradient of the contrastive loss, with the temperature parameter serving as a distance scaling factor; without it, the model has trouble aligning positive pairs due to a numerical problem in the loss term. Conversely, we also show that a large temperature can result in unstable learning dynamics. Subsequently, we identify alternative approaches that could mitigate the problem from a topological view of the contrastive loss. Finally, we capitalize on multiple class tokens embedded within the transformer architecture to offer a concise solution. This configuration significantly boosts zero-shot classification performance, improving baseline CLIP models pretrained on large-scale datasets by an average of 6.1%. The code and learned weights are provided at https://github.com/{Anonymous_authors}.
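
For context, a minimal sketch of where the learnable temperature sits in a CLIP-style contrastive loss; the parameterization as a learnable log-scale initialized to log(1/0.07) follows common CLIP practice and is not taken from this submission.

    import torch
    import torch.nn.functional as F

    log_temp = torch.nn.Parameter(torch.log(torch.tensor(1 / 0.07)))  # learnable temperature

    def clip_loss(img_emb, txt_emb, log_temp):
        img_emb = F.normalize(img_emb, dim=-1)
        txt_emb = F.normalize(txt_emb, dim=-1)
        logits = log_temp.exp() * img_emb @ txt_emb.t()   # temperature scales the similarities
        targets = torch.arange(len(logits), device=logits.device)
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))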

URL: https://openreview.net/forum?id=rx1QNhsNsK

---

Title: Neural incomplete factorization: learning preconditioners for the conjugate gradient method

Abstract: The convergence of the conjugate gradient method to solve large-scale and sparse linear equation systems depends on the conditioning of the system matrix, which can be improved by preconditioning. In this paper, we develop a computationally efficient data-driven approach to accelerate the generation of effective preconditioners. We, therefore, replace the typically hand-engineered preconditioners by the output of graph neural networks. Optimizing the condition number of the linear system directly is computationally infeasible. Instead, our method generates an incomplete factorization of the matrix and is, therefore, referred to as neural incomplete factorization (NeuralIF). For efficient training, we utilize a stochastic approximation of the Frobenius loss which only requires matrix-vector multiplications. At the core of our method is a novel message-passing block, inspired by sparse matrix theory, that aligns with the objective of finding a sparse factorization of the matrix. We evaluate our proposed method on both synthetic problem instances and on problems arising from the discretization of the Poisson equation on varying domains. Our experiments show that by utilizing data-driven preconditioners within the conjugate gradient method we are able to speed up the convergence of the iterative procedure.
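
A minimal sketch of the kind of matrix-vector-only Frobenius loss mentioned above, under our own assumptions (a Hutchinson-style probe estimator; this is not the NeuralIF code): the squared residual ||A - L L^T||_F^2 is estimated from matrix-vector products with random probes, so L L^T is never formed explicitly.

    import torch

    def stochastic_frobenius_loss(A, L, num_probes=8):
        """Estimate ||A - L L^T||_F^2 via E_z ||(A - L L^T) z||^2 with z ~ N(0, I)."""
        n = A.shape[0]
        z = torch.randn(n, num_probes, device=A.device)
        residual = A @ z - L @ (L.t() @ z)       # only matrix-vector products
        return residual.pow(2).sum() / num_probes

    # L here would be the sparse lower-triangular output of the graph neural
    # network; the estimated loss is then backpropagated into its parameters.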

URL: https://openreview.net/forum?id=FozLrZ3CI5

---
