Accepted papers
===============
Title: GraphFM: A generalist graph transformer that learns transferable representations across diverse domains
Authors: Divyansha Lachi, Mehdi Azabou, Vinam Arora, Eva L Dyer
Abstract: Graph neural networks (GNNs) are often trained on individual datasets, requiring specialized models and significant hyperparameter tuning due to the unique structures and features of each dataset. This approach limits the scalability and generalizability of GNNs, as models must be tailored for each specific graph type. To address these challenges, we introduce GraphFM, a scalable multi-graph pretraining approach designed for learning across diverse graph datasets. GraphFM uses a Perceiver-based encoder with learned latent tokens to compress domain-specific features into a shared latent space, enabling generalization across graph domains. We propose new techniques for scaling up graph training on datasets of different sizes, allowing us to train GraphFM on 152 distinct graph datasets, containing a total of 7.4 million nodes and 189 million edges. This allows us to study the effect of scale on pretraining across domains such as molecules, citation networks, and product graphs, and show that training on diverse datasets improves performance over single-source pretraining. Additionally, pretraining with a mixture of synthetic and real graphs enhances adaptability and stability, leading to competitive performance with state-of-the-art models across various node classification tasks. This approach reduces the burden of dataset-specific training and provides a single generalist model capable of performing across multiple diverse graph structures and tasks. Code is available at https://github.com/nerdslab/GraphFM.
URL: https://openreview.net/forum?id=sZTpRfRUtR
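The core mechanism the abstract describes is a fixed set of learned latent tokens that cross-attend over a graph's node features, so graphs of any size map into a shared latent space. Below is a minimal PyTorch sketch of that compression step; the dimensions, single attention layer, and class name are illustrative assumptions, not the released GraphFM code.

```python
import torch
import torch.nn as nn

class LatentCompressor(nn.Module):
    """Perceiver-style cross-attention: learned latents attend over node features."""
    def __init__(self, node_dim=128, latent_dim=256, num_latents=64, num_heads=8):
        super().__init__()
        # Learned latent tokens shared across graphs and domains.
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim) * 0.02)
        self.proj = nn.Linear(node_dim, latent_dim)  # map domain-specific features to the shared space
        self.attn = nn.MultiheadAttention(latent_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(latent_dim)

    def forward(self, node_feats):                   # node_feats: (batch, num_nodes, node_dim)
        kv = self.proj(node_feats)
        q = self.latents.unsqueeze(0).expand(node_feats.size(0), -1, -1)
        out, _ = self.attn(q, kv, kv)                # latents attend over all nodes
        return self.norm(out)                        # (batch, num_latents, latent_dim)

# Example: a graph with 1,000 nodes is compressed to 64 latent tokens.
z = LatentCompressor()(torch.randn(1, 1000, 128))
print(z.shape)  # torch.Size([1, 64, 256])
```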
---
Title: On the Hardness of Computing Counterfactual and Semi-factual Explanations in XAI
Authors: André Artelt, Martin Olsen, Kevin Tierney
Abstract: Providing clear explanations of the choices made by machine learning models is essential for deploying these models in critical applications. Counterfactual and semi-factual explanations have emerged as two mechanisms for giving users insight into the outputs of their models. We provide an overview of the computational complexity results in the literature for generating these explanations, finding that in many cases generating explanations is computationally hard. We strengthen this argument considerably by contributing our own inapproximability results, showing that not only are explanations often hard to generate, but under certain assumptions they are also hard to approximate. We discuss the implications of these complexity results for the XAI community and for policymakers seeking to regulate explanations in AI.
URL: https://openreview.net/forum?id=aELzBw0q1O
---
Title: Variational Online Mirror Descent for Robust Learning in Schrödinger Bridge
Authors: Dong-Sig Han, Jaein Kim, HEE BIN YOO, Byoung-Tak Zhang
Abstract: The Schrödinger bridge (SB) has evolved into a universal class of probabilistic generative models. In practice, however, estimated learning signals are innately uncertain, and the reliability promised by existing methods often rests on speculative best-case scenarios. Recent studies of the Sinkhorn algorithm through the lens of mirror descent (MD) have gained attention, revealing geometric insights into how SB problems are solved. In this paper, we propose a variational online MD (OMD) framework for SB problems, which provides further stability to SB solvers. We formally prove convergence and a regret bound for the novel OMD formulation of SB acquisition. As a result, we propose a simulation-free SB algorithm called Variational Mirrored Schrödinger Bridge (VMSB), which utilizes the Wasserstein-Fisher-Rao geometry of the Gaussian mixture parameterization for Schrödinger potentials. Based on Wasserstein gradient flow theory, the algorithm offers tractable learning dynamics that precisely approximate each OMD step. In experiments, we validate the performance of the proposed VMSB algorithm across an extensive suite of benchmarks. VMSB consistently outperforms contemporary SB solvers on a wide range of SB problems, demonstrating both the robustness and the generality predicted by our OMD theory.
URL: https://openreview.net/forum?id=g3SsM9FLpm
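For readers less familiar with mirror descent, the toy sketch below shows the standard entropic (exponentiated-gradient) online MD update on a probability simplex. It only illustrates the type of update the abstract builds on; the actual VMSB algorithm works in the Wasserstein-Fisher-Rao geometry over Gaussian mixture parameterizations, which is not reproduced here.

```python
import numpy as np

def omd_step(p, grad, step_size=0.1):
    """One online mirror descent step with the negative-entropy mirror map."""
    logits = np.log(p) - step_size * grad
    q = np.exp(logits - logits.max())   # subtract the max for numerical stability
    return q / q.sum()                  # result stays on the probability simplex

p = np.full(4, 0.25)                    # uniform initial distribution
grad = np.array([1.0, 0.5, 0.2, 0.8])   # fixed toy loss gradient
for _ in range(50):
    p = omd_step(p, grad)
print(p)  # mass concentrates on the coordinate with the smallest gradient
```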
---
Title: Continual Learning on CLIP via Incremental Prompt Tuning with Intrinsic Textual Anchors
Authors: Haodong Lu, Xinyu Zhang, Kristen Moore, Jason Xue, Lina Yao, Anton van den Hengel, Dong Gong
Abstract: Continual learning (CL) enables deep neural networks to acquire new knowledge over time while mitigating catastrophic forgetting of previously learned information. The powerful generalization ability of pre-trained models (PTMs), such as the Contrastive Language-Image Pre-training (CLIP) model, has inspired a range of CL methods targeting new and specialized tasks, further bridging the gap between PTMs and continual adaptation. Leveraging its multi-modal visual and textual representations, CLIP offers a natural paradigm for CL, where new tasks can be accommodated by incrementally learning lightweight parameters, particularly prompts. However, existing prompt-based CL methods for PTMs often rely on complex designs built upon specific assumptions, such as intricate regularization schemes for prompt pools, specialized routing mechanisms, or multi-stage incrementation processes. While these approaches improve performance, they frequently introduce additional, and possibly unnecessary, complexity, underutilizing CLIP's intrinsic capabilities. In this paper, we propose a concise CL approach for CLIP based on incremental prompt tuning that fully exploits its multi-modal structure and the stability of textual representations. Our method, Textual Prototype-guided Prompt Tuning (TPPT), introduces textual prototypes not merely as static classifiers, as in existing methods, but as stable anchors to guide the learning of visual prompts, thereby shaping the embedding space (i.e., TPPT-V). We show that our bidirectional supervision strategy enables more effective learning of new knowledge while reducing forgetting. To further close the vision-language gap during CL, we activate the language branch and extend our approach to jointly optimize both visual and textual prompts (i.e., TPPT-VT). We also introduce a relational diversity regularization on the textual anchors to prevent embedding space collapse and mitigate correlated forgetting. Extensive experiments and analyses demonstrate the effectiveness of our proposed approach, highlighting the benefits of leveraging CLIP's intrinsic guidance for continual adaptation.
URL: https://openreview.net/forum?id=YJnjkzKq5Y
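The anchoring idea can be sketched directly: frozen textual prototypes serve as class anchors, image embeddings produced with learnable visual prompts are pulled toward their class anchor, and a diversity penalty keeps the anchors spread out. The loss below is an illustrative approximation with made-up dimensions and weights, not the authors' TPPT implementation.

```python
import torch
import torch.nn.functional as F

def anchor_guided_loss(img_emb, text_protos, labels, temperature=0.07, div_weight=0.1):
    img_emb = F.normalize(img_emb, dim=-1)
    protos = F.normalize(text_protos, dim=-1)
    logits = img_emb @ protos.t() / temperature       # similarity to every class anchor
    anchor_loss = F.cross_entropy(logits, labels)     # pull images toward their class anchor
    sim = protos @ protos.t()
    off_diag = sim - torch.eye(sim.size(0), device=sim.device)
    diversity_reg = off_diag.pow(2).mean()            # discourage collapsed, correlated anchors
    return anchor_loss + div_weight * diversity_reg

# Toy usage: 8 images, 10 classes, 512-dimensional CLIP-like embeddings.
loss = anchor_guided_loss(torch.randn(8, 512), torch.randn(10, 512),
                          torch.randint(0, 10, (8,)))
print(loss.item())
```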
---
New submissions
===============
Title: PSAG: Projection-based Stabilized Attribution Guidance for Online Continual Learning
Abstract: Online Continual Learning (OCL) aims to incrementally learn from non-stationary data streams in a one-pass setting, facing the dual challenges of catastrophic forgetting and insufficient training. These challenges intensify the stability-plasticity dilemma, where preserving old knowledge conflicts with acquiring new information. In this paper, we propose Projection-based Stabilized Attribution Guidance (PSAG), a modular framework that leverages gradient-based attributions as active guidance signals to selectively preserve task-relevant representations. Our framework consists of three complementary mechanisms: (1) Attribution-Guided Feature Modulation (AGFM) that anchors critical features in the representation space; (2) Importance-Aware Loss Reweighting (IALR) that prioritizes informative samples at the loss level; and (3) Manifold-Consistent Projection (MCP) that emphasizes critical feature dimensions within a Riemannian metric space. To address the issue of attribution instability in online continual learning, we introduce a Reliable Reference Model (R-Model) that maintains consistent knowledge through exponential moving average updates. This design prevents feedback loops during attribution computation and enables reliable feature importance estimation. Extensive experiments on Split CIFAR-10, Split CIFAR-100, and Split Mini-ImageNet demonstrate that PSAG achieves consistent improvements over strong baselines, confirming the efficacy of stabilized attribution guidance in resolving the stability-plasticity dilemma.
URL: https://openreview.net/forum?id=NvXpSvMrXS
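The R-Model amounts to an exponential-moving-average copy of the online network that serves as a stable target for attribution computation. A minimal sketch of that update, with a placeholder network and hypothetical decay rather than the paper's code:

```python
import copy
import torch

@torch.no_grad()
def ema_update(online_model, reference_model, decay=0.999):
    # Blend each reference parameter toward its online counterpart.
    for p_ref, p_online in zip(reference_model.parameters(), online_model.parameters()):
        p_ref.mul_(decay).add_(p_online, alpha=1.0 - decay)

online = torch.nn.Linear(16, 4)                        # placeholder for the stream learner
reference = copy.deepcopy(online).requires_grad_(False)
# ... one optimizer step on `online` over the current stream batch ...
ema_update(online, reference)                          # refresh the R-Model after every step
```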
---
Title: SL-S4Wave: Self-Supervised Learning of Physiological Waveforms with Structured State Space Models
Abstract: Modeling long-sequence medical time series data, such as electrocardiograms (ECG), poses significant challenges due to high sampling rates, multichannel signal complexity, inherent noise, and limited labeled data. While recent self-supervised learning (SSL) methods, based on various encoder architectures such as convolutional neural networks, have been proposed to learn representations from unlabeled data, they often fall short in capturing long-range dependencies and noise-invariant features. Structured state space models (S4) have recently shown promise for efficient long-sequence modeling; however, existing S4-based architectures are not designed to capture the unique characteristics of multichannel physiological waveforms. In this work, we propose SL-S4Wave, a self-supervised learning framework that combines contrastive learning with a tailored encoder built on structured state space models. The encoder incorporates multi-layer global convolution using multiscale subkernels, enabling the capture of both fine-grained local patterns and long-range temporal dependencies in noisy, high-resolution multichannel waveforms. Extensive experiments on three real-world datasets demonstrate that SL-S4Wave (1) consistently outperforms state-of-the-art supervised and self-supervised baselines in a challenging arrhythmia detection task, (2) achieves high performance with significantly fewer labeled examples, showcasing strong label efficiency, (3) maintains robust performance on long waveform segments, highlighting its capacity to model complex temporal dynamics in long sequences that most existing approaches fail to model efficiently, and (4) transfers effectively to unseen arrhythmia types, underscoring its robust cross-domain generalization.
URL: https://openreview.net/forum?id=km0xS3jZeO
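The encoder's distinctive ingredient is a global convolution whose kernel is assembled from subkernels at several scales and applied over the full waveform. The sketch below shows one plausible reading of that idea as a depthwise FFT convolution; the scales, initialization, and surrounding layer structure are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleGlobalConv(nn.Module):
    def __init__(self, channels, seq_len, scales=(64, 256, 1024)):
        super().__init__()
        # One learnable subkernel per scale: short kernels capture beat morphology,
        # long kernels capture slow rhythms spanning the recording.
        self.subkernels = nn.ParameterList(
            [nn.Parameter(torch.randn(channels, s) * 0.02) for s in scales])
        self.seq_len = seq_len

    def forward(self, x):                     # x: (batch, channels, seq_len)
        # Sum the zero-padded subkernels into one sequence-length kernel.
        kernel = sum(F.pad(k, (0, self.seq_len - k.size(1))) for k in self.subkernels)
        # Depthwise global convolution via FFT, O(L log L) in the sequence length.
        X = torch.fft.rfft(x, n=2 * self.seq_len)
        K = torch.fft.rfft(kernel, n=2 * self.seq_len)
        return torch.fft.irfft(X * K, n=2 * self.seq_len)[..., : self.seq_len]

out = MultiScaleGlobalConv(channels=12, seq_len=4096)(torch.randn(2, 12, 4096))
print(out.shape)  # torch.Size([2, 12, 4096])
```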
---
Title: Bridging VMP and CEP: Theoretical Insights for Connecting Different Approximate Bayesian Inference Methods
Abstract: Approximate Bayesian inference (ABI) methods have become indispensable tools in modern machine learning and statistics for approximating intractable posterior distributions. Despite extensive studies and applications across diverse domains, the theoretical connections among these methods have remained relatively unexplored. This paper takes a first step toward uncovering the underlying relationships between two widely employed ABI techniques: variational message passing (VMP) and conditional expectation propagation (CEP). Through rigorous mathematical analysis, we demonstrate a strong connection between these two approaches under mild conditions, from both optimization and graphical-model perspectives. This newly unveiled connection not only enhances our understanding of the performance and convergence properties of VMP and CEP, but also facilitates the cross-fertilization of their respective strengths. For instance, we establish the convergence of CEP under mild conditions and demonstrate how this connection facilitates the construction of streaming VMP. Furthermore, our findings provide insights into the underlying relationships and distinctive characteristics of other ABI methods, shedding new light on the understanding and development of more advanced ABI techniques. To validate our theoretical findings, we derive and analyze various ABI methods within the context of Bayesian tensor decomposition, a fundamental tool in machine learning research. Specifically, we show that the two approaches yield the same updates in this context and illustrate how the established connection can be leveraged to construct a streaming version of the VMP-based Bayesian tensor decomposition algorithm.
URL: https://openreview.net/forum?id=QdO4VrnNfb
---
Title: On the Fundamental Limits of LLMs at Scale
Abstract: Large Language Models (LLMs) have benefited enormously from scaling, yet these gains are bounded by five fundamental limitations: (1) hallucination, (2) context compression, (3) reasoning degradation, (4) retrieval fragility, and (5) multimodal misalignment. While existing surveys describe these phenomena empirically, they lack a rigorous theoretical synthesis connecting them to the foundational limits of computation, information, and learning. This work closes that gap by presenting a unified, proof-informed framework that formalizes the innate theoretical ceilings of LLM scaling. First, computability and uncomputability imply an irreducible residue of error: for any computably enumerable model family, diagonalization guarantees inputs on which some model must fail, and undecidable queries (e.g., halting-style tasks) induce infinite failure sets for all computable predictors. Second, information-theoretic and statistical constraints bound attainable accuracy even on decidable tasks: finite description length enforces compression error, and long-tail factual knowledge requires prohibitive sample complexity. Third, geometric and computational effects compress long contexts far below their nominal size due to positional under-training, encoding attenuation, and softmax crowding. We further show how likelihood-based training favors pattern completion over inference, how retrieval under token limits suffers from semantic drift and coupling noise, and how multimodal scaling inherits shallow cross-modal alignment. Across sections, we pair theorems with empirical evidence to outline where scaling helps, where it saturates, and where it cannot progress, providing both theoretical foundations and practical mitigation paths such as bounded-oracle retrieval, positional curricula, and sparse or hierarchical attention.
URL: https://openreview.net/forum?id=BIRDGVrom8
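One of the geometric effects, softmax crowding, is easy to illustrate numerically: if attention logits are bounded in [-B, B], the weight any single position can receive decays roughly like exp(2B)/n as the context length n grows. The bound B below is an arbitrary value chosen for illustration, not a figure from the paper.

```python
import numpy as np

B = 5.0  # assumed bound on attention logits
for n in (1_000, 10_000, 100_000, 1_000_000):
    # Best case for one token: its logit at +B, all n-1 others at -B.
    max_weight = np.exp(2 * B) / (np.exp(2 * B) + (n - 1))
    print(f"context {n:>9,d}: max attainable attention weight ~ {max_weight:.4f}")
# Even a maximally salient token becomes hard to isolate once the context is long enough.
```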
---
Title: Recursive Reasoning for Sample-Efficient Multi-Agent Reinforcement Learning
Abstract: Policy gradient algorithms for deep multi-agent reinforcement learning (MARL) typically employ an update that responds to the current strategies of other agents. While straightforward, this approach does not account for the updates of other agents within the same update step, resulting in miscoordination and reduced sample efficiency. In this paper, we introduce methods that recursively refine the policy gradient by updating each agent against the updated policies of the other agents within the same update step, speeding up the discovery of effective coordinated policies. We provide principled implementations of recursive reasoning in MARL by applying it to competitive multi-agent algorithms in both on-policy and off-policy regimes. Empirically, we demonstrate superior performance and sample efficiency over existing deep MARL algorithms in StarCraft II and multi-agent MuJoCo. We theoretically prove that, under certain conditions, higher levels of recursive reasoning in gradient-based methods with finite iterates achieve monotonic convergence to a local Nash equilibrium.
URL: https://openreview.net/forum?id=k5zVPe32VX
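The recursion can be pictured as a lookahead step: before updating agent 1, anticipate the gradient step agent 2 would take and differentiate agent 1's objective at those anticipated parameters. The two-agent sketch below substitutes a smooth toy objective for actual policy-gradient estimates and is not the paper's algorithm.

```python
import torch

def recursive_pg_step(theta1, theta2, objective, lr=0.01):
    # Level 0: the update agent 2 would take against the current agent 1.
    g2 = torch.autograd.grad(objective(theta1, theta2), theta2)[0]
    theta2_lookahead = (theta2 + lr * g2).detach()
    # Level 1: update agent 1 against agent 2's anticipated (not current) policy.
    g1 = torch.autograd.grad(objective(theta1, theta2_lookahead), theta1)[0]
    return (theta1 + lr * g1).detach()

theta1 = torch.randn(4, requires_grad=True)
theta2 = torch.randn(4, requires_grad=True)
objective = lambda a, b: -(a * b).sum() - 0.5 * (a * a).sum()  # toy surrogate return
theta1_new = recursive_pg_step(theta1, theta2, objective)
```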
---
Title: Transitioning Heads Conundrum: The Hidden Bottleneck in Long-Tailed Class-Incremental Learning
Abstract: Long-Tailed Class-Incremental Learning (LTCIL) faces a fundamental tension: models must sequentially learn new classes while contending with extreme class imbalance, which amplifies catastrophic forgetting. A particularly overlooked phenomenon is the Transitioning Heads Conundrum: as replay buffers constrain memory, initially well-represented head classes shrink over time and effectively become tail classes, undermining knowledge retention. Existing approaches fail to address this because they apply knowledge distillation too late, after these transitions have already eroded head-class representations. To overcome this, we introduce DEcoupling Representations for Early Knowledge distillation (DEREK), which strategically employs Early Knowledge Distillation to safeguard head-class knowledge before data constraints manifest. Comprehensive evaluation across 2 LTCIL benchmarks, 12 experimental settings, and 24 baselines, including Long-Tail, Class-Incremental, Few-Shot CIL, and LTCIL methods, shows that DEREK maintains competitive performance across categories, establishing new state-of-the-art results.
URL: https://openreview.net/forum?id=Hb2Jvi5M7X
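The key move is to apply knowledge distillation from the very first incremental steps, before replay-buffer limits shrink head classes into effective tails. Below is a generic early-distillation objective of the kind the abstract describes; the temperature, weighting, and function name are assumptions rather than the paper's DEREK settings.

```python
import torch
import torch.nn.functional as F

def early_kd_loss(student_logits, teacher_logits, labels, temperature=2.0, kd_weight=1.0):
    ce = F.cross_entropy(student_logits, labels)                 # learn the current task
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                  F.log_softmax(teacher_logits / temperature, dim=1),
                  reduction="batchmean", log_target=True) * temperature ** 2
    return ce + kd_weight * kd                                   # distill before head classes shrink

# Toy usage: batch of 16 samples over 100 classes.
loss = early_kd_loss(torch.randn(16, 100), torch.randn(16, 100),
                     torch.randint(0, 100, (16,)))
print(loss.item())
```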
---