Daily TMLR digest for May 31, 2024

TMLR

May 31, 2024, 12:00:07 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Where Did the Gap Go? Reassessing the Long-Range Graph Benchmark

Authors: Jan Tönshoff, Martin Ritzert, Eran Rosenbluth, Martin Grohe

Abstract: The recent Long-Range Graph Benchmark (LRGB, Dwivedi et al. 2022) introduced a set of graph learning tasks strongly dependent on long-range interaction between vertices. Empirical evidence suggests that on these tasks Graph Transformers significantly outperform Message Passing GNNs (MPGNNs). In this paper, we carefully reevaluate multiple MPGNN baselines as well as the Graph Transformer GPS (Rampášek et al. 2022) on LRGB. Through a rigorous empirical analysis, we demonstrate that the reported performance gap is overestimated due to suboptimal hyperparameter choices. Notably, across multiple datasets the performance gap vanishes completely after basic hyperparameter optimization. In addition, we discuss the impact of missing feature normalization for LRGB's vision datasets and highlight a spurious implementation of LRGB's link prediction metric. The principal aim of our paper is to establish a higher standard of empirical rigor within the graph machine learning community.

URL: https://openreview.net/forum?id=Nm0WX86sKv
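
Since one of the points above is the impact of missing feature normalization on LRGB's vision datasets, a minimal sketch of the standard remedy may be useful. This is an illustration only, not the authors' code, and the array shapes are placeholders: standardize node features with statistics computed on the training split and apply the same transform to validation and test.

    import numpy as np

    def standardize_node_features(train_feats, other_splits):
        """Standardize node features using statistics from the training split only.

        train_feats: (num_train_nodes, feat_dim) array of raw node features.
        other_splits: list of arrays for validation/test node features.
        """
        mean = train_feats.mean(axis=0, keepdims=True)
        std = train_feats.std(axis=0, keepdims=True) + 1e-8  # avoid division by zero
        normalize = lambda feats: (feats - mean) / std
        return normalize(train_feats), [normalize(x) for x in other_splits]

    # Toy usage with random data standing in for superpixel node features.
    rng = np.random.default_rng(0)
    train = rng.normal(loc=5.0, scale=3.0, size=(1000, 14))
    val, test = rng.normal(5.0, 3.0, size=(200, 14)), rng.normal(5.0, 3.0, size=(200, 14))
    train_n, (val_n, test_n) = standardize_node_features(train, [val, test])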

---


New submissions
===============


Title: GANDALF: Gated Adaptive Network for Deep Automated Learning of Features for Tabular Data

Abstract: We propose a novel high-performance, interpretable, and parameter- and computationally efficient deep learning architecture for tabular data, the Gated Adaptive Network for Deep Automated Learning of Features (GANDALF). GANDALF relies on a new tabular processing unit with a gating mechanism and built-in feature selection, the Gated Feature Learning Unit (GFLU), as its feature representation learning unit. Through experiments on multiple established public benchmarks, we demonstrate that GANDALF outperforms or stays at par with SOTA approaches such as XGBoost, SAINT, and FT-Transformers. The code is available under the MIT License.

URL: https://openreview.net/forum?id=syMYVrQcbd
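
For a rough sense of what a gating unit with built-in feature selection can look like, here is a minimal PyTorch sketch. It is a generic illustration under assumed design choices (a learnable soft feature mask plus a gated transform), not the actual GFLU architecture from the paper.

    import torch
    import torch.nn as nn

    class GatedFeatureUnit(nn.Module):
        """Toy gated unit with learnable soft feature selection (illustrative only)."""

        def __init__(self, num_features: int, hidden_dim: int):
            super().__init__()
            # Learnable per-feature logits acting as a soft feature-selection mask.
            self.feature_logits = nn.Parameter(torch.zeros(num_features))
            self.transform = nn.Linear(num_features, hidden_dim)
            self.gate = nn.Linear(num_features, hidden_dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            mask = torch.sigmoid(self.feature_logits)  # soft selection of input features
            x = x * mask                               # down-weight uninformative columns
            return torch.tanh(self.transform(x)) * torch.sigmoid(self.gate(x))  # gated output

    # Usage: one unit followed by a linear head, as a tiny tabular classifier.
    model = nn.Sequential(GatedFeatureUnit(20, 64), nn.Linear(64, 2))
    logits = model(torch.randn(8, 20))  # batch of 8 rows with 20 tabular features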

---

Title: Counterfactual Learning of Stochastic Policies with Continuous Actions

Abstract: Counterfactual reasoning from logged data has become increasingly important for many applications such as web advertising or healthcare. In this paper, we address the problem of learning stochastic policies with continuous actions from the viewpoint of counterfactual risk minimization (CRM). While the CRM framework is appealing and well studied for discrete actions, the continuous-action case raises new challenges regarding modelling, optimization, and offline model selection with real data, the last of which turns out to be particularly challenging. Our paper contributes to these three aspects of the CRM estimation pipeline. First, we introduce a modelling strategy based on a joint kernel embedding of contexts and actions, which overcomes the shortcomings of previous discretization approaches. Second, we empirically show that the optimization aspect of counterfactual learning is important, and we demonstrate the benefits of proximal point algorithms and smooth estimators. Finally, we propose an evaluation protocol for offline policies in real-world logged systems, which is challenging since policies cannot be replayed on test data, and we release a new large-scale dataset along with multiple synthetic, yet realistic, evaluation setups.

URL: https://openreview.net/forum?id=fC4bh1PmZr
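
To make the CRM setting with continuous actions concrete, the sketch below shows a clipped importance-sampling off-policy risk estimate in which a density ratio between Gaussian target and logging policies replaces the discrete propensity ratio. The data-generating process, the Gaussian policy class, and the clipping constant are all assumptions for illustration, not the paper's modelling strategy (which uses a joint kernel embedding of contexts and actions).

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    # Logged data: contexts x, actions a sampled from a Gaussian logging policy,
    # and observed losses (lower is better).
    n = 10_000
    x = rng.normal(size=n)
    logging_mean, logging_std = 0.5 * x, 1.0
    a = rng.normal(logging_mean, logging_std)
    loss = (a - x) ** 2 + 0.1 * rng.normal(size=n)  # synthetic loss; optimum at a = x

    def crm_estimate(theta, clip=10.0):
        """Clipped importance-sampling estimate of the risk of a Gaussian target
        policy pi_theta(a | x) = N(theta * x, 1), using the logged data above."""
        target_logpdf = norm.logpdf(a, loc=theta * x, scale=1.0)
        logging_logpdf = norm.logpdf(a, loc=logging_mean, scale=logging_std)
        w = np.exp(target_logpdf - logging_logpdf)  # density ratio (continuous "propensity")
        w = np.minimum(w, clip)                     # clipping controls variance
        return np.mean(w * loss)

    # Coarse search over the policy parameter; theta close to 1 should look best.
    thetas = np.linspace(0.0, 1.5, 16)
    best = min(thetas, key=crm_estimate)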

---

Title: Exploring Neural Network Landscapes: Star-Shaped and Geodesic Connectivity

Abstract: One of the most intriguing findings in the structure of neural network landscapes is the phenomenon of mode connectivity: for two typical global minima, there exists a path connecting them without a barrier. This concept of mode connectivity has played a crucial role in understanding important phenomena in deep learning.

In this paper, we conduct a fine-grained analysis of this connectivity phenomenon. First, we demonstrate that in the overparameterized case, the connecting path can be as simple as a two-piece linear path, and the path length can be nearly equal to the Euclidean distance. This finding suggests that the landscape should be nearly convex in a certain sense. Second, we uncover a surprising star-shaped connectivity: For a finite number of typical minima, there exists a center on the minima manifold that connects all of them simultaneously via linear paths. These results are provably valid for linear networks and two-layer ReLU networks under a teacher-student setup, and are empirically supported by models trained on MNIST and CIFAR-10.

URL: https://openreview.net/forum?id=V8lv3u5UAX
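
The two-piece linear connectivity described above can be probed empirically with a short, generic sketch (not the authors' code; the model, data, and choice of bend point here are placeholders): evaluate the loss while interpolating linearly from minimum A to a bend point and then from the bend point to minimum B.

    import copy
    import torch
    import torch.nn as nn

    def interpolate_state(sd_a, sd_b, t):
        """Linear interpolation (1 - t) * A + t * B between two state dicts."""
        return {k: (1 - t) * sd_a[k] + t * sd_b[k] for k in sd_a}

    @torch.no_grad()
    def loss_along_two_piece_path(model, sd_a, sd_mid, sd_b, data, target, steps=11):
        """Evaluate cross-entropy along the piecewise-linear path A -> mid -> B."""
        criterion = nn.CrossEntropyLoss()
        losses = []
        for start, end in [(sd_a, sd_mid), (sd_mid, sd_b)]:
            for t in torch.linspace(0, 1, steps):
                model.load_state_dict(interpolate_state(start, end, t.item()))
                losses.append(criterion(model(data), target).item())
        return losses

    # Placeholder minima: in practice sd_a and sd_b come from two independently
    # trained networks, and sd_mid is the learned bend point of the path.
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
    sd_a = copy.deepcopy(model.state_dict())
    sd_b = {k: v + 0.1 * torch.randn_like(v) for k, v in sd_a.items()}
    sd_mid = interpolate_state(sd_a, sd_b, 0.5)
    data, target = torch.randn(64, 10), torch.randint(0, 3, (64,))
    curve = loss_along_two_piece_path(model, sd_a, sd_mid, sd_b, data, target)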

---

Title: Considerations for Distribution Shift Robustness of Diagnostic Models in Healthcare

Abstract: We consider robustness to distribution shifts in the context of diagnostic models in healthcare, where the prediction target Y, e.g., the presence of a disease, is causally upstream of the observations X, e.g., a biomarker. Distribution shifts may occur, for instance, when the training data is collected in a domain with patients having particular demographic characteristics while the model is deployed on patients from a different demographic group. In the domain of applied ML for health, it is common to predict Y from X without considering further information about the patient. However, beyond the direct influence of the disease Y on the biomarker X, a predictive model may learn to exploit confounding dependencies (or shortcuts) between X and Y that are unstable under certain distribution shifts. In this work, we highlight a data-generating mechanism common to healthcare settings and discuss how recent theoretical results from the causality literature can be applied to build robust predictive models. We theoretically show why ignoring covariates, as well as using common invariant learning approaches, will in general not yield robust predictors in the studied setting, whereas including certain covariates in the prediction model will. In an extensive simulation study, we showcase the robustness (or lack thereof) of different predictors under various data-generating processes. Lastly, we analyze the performance of the different approaches using PTB-XL, a public dataset of annotated ECG recordings.

URL: https://openreview.net/forum?id=B7TLNCBnlA
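
The described data-generating mechanism can be illustrated with a small simulation sketch under an assumed toy process (this is not the paper's simulation study): the disease Y causes the biomarker X, a demographic covariate Z influences both Y and X, and only the marginal distribution of Z shifts between training and deployment. A logistic model using X alone is expected to degrade under the shift, while one that also conditions on Z stays approximately stable, since P(Y | X, Z) is determined by the stable mechanisms P(X | Y, Z) and P(Y | Z).

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import log_loss

    rng = np.random.default_rng(0)

    def sample(n, p_z):
        """Anti-causal toy DGP: covariate Z -> disease Y -> biomarker X (plus Z -> X).
        Only the marginal of Z shifts across domains; P(Y|Z) and P(X|Y,Z) are stable."""
        z = rng.binomial(1, p_z, size=n)
        y = rng.binomial(1, 1 / (1 + np.exp(-(1.5 * z - 0.5))))
        x = 2.0 * y + 1.0 * z + rng.normal(size=n)
        return x.reshape(-1, 1), z.reshape(-1, 1), y

    x_tr, z_tr, y_tr = sample(20_000, p_z=0.2)  # training domain
    x_te, z_te, y_te = sample(20_000, p_z=0.8)  # shifted deployment domain

    clf_x = LogisticRegression().fit(x_tr, y_tr)                      # ignores the covariate
    clf_xz = LogisticRegression().fit(np.hstack([x_tr, z_tr]), y_tr)  # includes the covariate

    print("log loss, X only :", log_loss(y_te, clf_x.predict_proba(x_te)[:, 1]))
    print("log loss, X and Z:", log_loss(y_te, clf_xz.predict_proba(np.hstack([x_te, z_te]))[:, 1]))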

---
