Daily TMLR digest for Nov 20, 2025

TMLR

Nov 20, 2025, 12:30:07 AM
to tmlr-anno...@googlegroups.com


New certifications
==================

J2C Certification: Preserving Expert-Level Privacy in Offline Reinforcement Learning

Navodita Sharma, Vishnu Vinod, Abhradeep Guha Thakurta, Alekh Agarwal, Borja Balle, Christoph Dann, Aravindan Raghuveer

https://openreview.net/forum?id=2bj0eVgCdO

---


Survey Certification: Scaling Laws of Distributed Random Forests

Katharina Flügel, Charlotte Debus, Markus Götz, Achim Streit, Marie Weiel

https://openreview.net/forum?id=ICHxTlgnSy

---


Accepted papers
===============


Title: LBMamba: Locally Bi-directional Mamba

Authors: Jingwei Zhang, Xi Han, Hong Qin, Mahdi S. Hosseini, Dimitris Samaras

Abstract: Mamba, a State Space Model (SSM) that accelerates training by recasting recurrence as a parallel selective scan, has recently emerged as a linearly scaling, efficient alternative to self-attention. Because of its unidirectional nature, each state in Mamba only carries information from its previous states and is blind to the states that follow. Current Mamba-based computer-vision methods typically overcome this limitation by augmenting Mamba's global forward scan with a global backward scan, forming a bi-directional scan that restores a full receptive field. However, this operation doubles the computational load, eroding much of the efficiency advantage that Mamba originally offers. To eliminate these extra scans, we introduce LBMamba, a locally bi-directional SSM block that embeds a lightweight locally backward scan inside the forward selective scan and executes it entirely in per-thread registers. Building on LBMamba, we present LBVim, a scalable vision backbone that alternates scan directions every two layers to recover a global receptive field without extra backward sweeps. We validate the versatility of our approach on both natural images and whole slide images (WSIs). We show that LBVim consistently offers a superior performance–throughput trade-off: at the same throughput, LBVim achieves 0.8% to 1.6% higher top-1 accuracy on the ImageNet-1K classification dataset, 0.6% to 2.7% higher mIoU on the ADE20K semantic segmentation dataset, and 0.9% higher AP$^b$ and 1.1% higher AP$^m$ on the COCO detection dataset. Our method also serves as a general-purpose enhancement, boosting the accuracy of four SOTA Mamba models, namely VMamba, LocalVim, PlainMamba and Adventurer, by 0.5% to 3.4%. We further integrate LBMamba into the SOTA pathology multiple instance learning (MIL) approach, MambaMIL, which uses a unidirectional scan. Experiments on 3 public WSI classification datasets show that our method achieves relative improvements of up to 3.06% in AUC, 3.39% in F1, and 1.67% in accuracy. Our code is available at https://github.com/cvlab-stonybrook/LBMamba.
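
As an illustration of the scan pattern described above, here is a minimal NumPy sketch (an editorial illustration, not the authors' register-level CUDA kernel): a global forward selective scan combined with a lightweight backward scan restricted to local windows, so only the forward pass sweeps the full sequence. The scalar recurrence and the window size are simplifying assumptions.

    import numpy as np

    def forward_scan(a, b, x):
        # global forward selective scan: h[t] = a[t]*h[t-1] + b[t]*x[t]
        h = np.zeros_like(x)
        state = 0.0
        for t in range(len(x)):
            state = a[t] * state + b[t] * x[t]
            h[t] = state
        return h

    def local_backward_scan(a, b, x, window):
        # backward scan that never crosses window boundaries (in the actual
        # CUDA kernel each window stays in per-thread registers)
        h = np.zeros_like(x)
        for start in range(0, len(x), window):
            chunk = slice(start, min(start + window, len(x)))
            state = 0.0
            for t in reversed(range(chunk.start, chunk.stop)):
                state = a[t] * state + b[t] * x[t]
                h[t] = state
        return h

    def lb_scan(a, b, x, window=8):
        # locally bi-directional scan: global forward pass + local backward pass
        return forward_scan(a, b, x) + local_backward_scan(a, b, x, window)

    rng = np.random.default_rng(0)
    T = 32
    x = rng.normal(size=T)
    a = np.full(T, 0.9)   # input-dependent in a real selective scan
    b = np.ones(T)
    print(lb_scan(a, b, x, window=8)[:5])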

URL: https://openreview.net/forum?id=e1aXaIXblQ

---

Title: PersonalizedRouter: Personalized LLM Routing via Graph-based User Preference Modeling

Authors: Zhongjie Dai, Tao Feng, Jiaxuan You

Abstract: The growing number of Large Language Models (LLMs) with diverse capabilities and response styles provides users with a wider range of choices, which presents challenges in selecting appropriate LLMs, as user preferences vary in terms of performance, cost, and response style. Current LLM selection methods typically optimize for a single fixed objective, such as performance, cost, or a trade-off between them, and fail to learn individual user preferences from interaction data. To address these limitations, we propose PersonalizedRouter, a graph-based framework that models diverse user profiles and performs personalized LLM selection by leveraging interaction data that includes task context, queries, candidate LLMs, and user decisions. To capture contextual information between user queries and optimal LLMs, PersonalizedRouter converts the interaction data into a heterogeneous graph, where the relationships between different types of nodes are represented by edges. To evaluate adaptability across users, we design two strategies: the multi-cost-efficiency simulation strategy and the LLM-as-a-Judge strategy. In addition, we construct PersonaRoute-Bench, a large-scale benchmark with 1,000 simulated users and 10 LLMs. Experimental results show that PersonalizedRouter significantly outperforms existing LLM selection methods and surpasses the strongest methods by a large margin of 15.38% and 9.83% under two simulation strategies. On the PersonaRoute-Bench with 1,000 users, it further surpasses the best methods by 16.19% and 59.69% while maintaining higher efficiency. Moreover, PersonalizedRouter demonstrates strong few-shot generalization, achieving 64.81% and 85.80% of the fully trained model’s performance when adapting to new users and new LLMs.
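
To make the graph construction concrete, the following toy sketch (hypothetical interaction log and field names, not the paper's code) builds a heterogeneous graph as typed edge lists linking users, queries and candidate LLMs from interaction records:

    from collections import defaultdict

    # Hypothetical interaction log: (user_id, query_id, llm_id, chosen) tuples.
    interactions = [
        ("u1", "q1", "gpt-x", True),
        ("u1", "q2", "small-llm", True),
        ("u2", "q1", "gpt-x", False),
    ]

    # Heterogeneous graph stored as typed edge lists; node types are users,
    # queries and candidate LLMs, and edge types encode the relations.
    edges = defaultdict(list)
    for user, query, llm, chosen in interactions:
        edges[("user", "issued", "query")].append((user, query))
        edges[("query", "answered_by", "llm")].append((query, llm))
        if chosen:
            edges[("user", "prefers", "llm")].append((user, llm))

    for edge_type, pairs in edges.items():
        print(edge_type, pairs)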

URL: https://openreview.net/forum?id=W80eE3ArAl

---

Title: Preserving Expert-Level Privacy in Offline Reinforcement Learning

Authors: Navodita Sharma, Vishnu Vinod, Abhradeep Guha Thakurta, Alekh Agarwal, Borja Balle, Christoph Dann, Aravindan Raghuveer

Abstract: The offline reinforcement learning (RL) problem aims to learn an optimal policy from historical data collected by one or more behavioural policies (experts) interacting with an environment. However, the individual experts may be privacy-sensitive, in that the learnt policy may retain information about their precise choices. In some domains, such as personalized retrieval, advertising and healthcare, the expert choices are considered sensitive data. To provably protect the privacy of such experts, we propose a novel consensus-based expert-level differentially private offline RL training approach, compatible with any existing offline RL algorithm. We prove rigorous differential privacy guarantees while maintaining strong empirical performance. Unlike existing work in differentially private RL, we supplement the theory with proof-of-concept experiments on classic RL environments featuring large continuous state spaces, demonstrating substantial improvements over a natural baseline across multiple tasks.
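
As a loose illustration of what an expert-level private consensus could look like, the sketch below aggregates per-expert policies by a noisy vote (report-noisy-max with Laplace noise). This is an editorial assumption about the mechanism, not the paper's algorithm; it only shows why adding or removing one expert's data changes each vote count by at most one.

    import numpy as np

    def noisy_consensus_action(state, expert_policies, n_actions, epsilon, rng):
        # Each expert casts one vote, so a single expert's data changes any count
        # by at most 1; report-noisy-max with Laplace(1/epsilon) noise then gives
        # epsilon-DP at the expert level for this single selection.
        votes = np.zeros(n_actions)
        for policy in expert_policies:
            votes[policy(state)] += 1
        noisy = votes + rng.laplace(scale=1.0 / epsilon, size=n_actions)
        return int(np.argmax(noisy))

    # Toy experts over a 1-D state: hypothetical stand-ins for policies obtained
    # by running any offline RL algorithm separately on each expert's data.
    experts = [lambda s: 0 if s < 0.5 else 1,
               lambda s: 0 if s < 0.4 else 1,
               lambda s: 1]
    rng = np.random.default_rng(0)
    print(noisy_consensus_action(0.3, experts, n_actions=2, epsilon=1.0, rng=rng))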

URL: https://openreview.net/forum?id=2bj0eVgCdO

---

Title: ExDBN: Learning Dynamic Bayesian Networks using Extended Mixed-Integer Programming Formulations

Authors: Pavel Rytíř, Aleš Wodecki, Georgios Korpas, Jakub Marecek

Abstract: Causal learning from data has received much attention recently. Bayesian networks can be used to capture causal relationships: one recovers a weighted directed acyclic graph in which random variables are represented by vertices, and the weight associated with each edge represents the strength of the causal relationship between the corresponding pair. This concept is extended to dynamic effects by introducing a dependency on past data, which may be captured by a structural equation model. In the present contribution, this formalism is used to propose a score-based learning algorithm. A mixed-integer quadratic program is formulated and an algorithmic solution proposed, in which the pre-generation of exponentially many acyclicity constraints is avoided by utilizing the so-called branch-and-cut (``lazy constraint'') method. Comparing the novel approach to the state of the art, we show that it produces more accurate results when applied to small and medium-sized synthetic instances containing up to 80 time series. Lastly, two interesting applications in bioscience and finance, to which the method is directly applied, further stress the importance of developing highly accurate, globally convergent solvers that can handle instances of modest size.
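
The structural equation model referenced in the abstract can be written down compactly; the sketch below simulates a two-variable dynamic Bayesian network and evaluates a least-squares score of the kind a score-based learner optimizes. The MIQP formulation and the branch-and-cut acyclicity constraints are omitted, and the weight matrices are illustrative.

    import numpy as np

    def simulate_dbn(W0, W1, T, rng):
        # Dynamic structural equation model: Y_t = W0^T Y_t + W1^T Y_{t-1} + noise,
        # where W0 holds intra-slice (acyclic) edges and W1 lagged edges.
        d = W0.shape[0]
        Y = np.zeros((T, d))
        for t in range(1, T):
            eps = rng.normal(scale=0.1, size=d)
            # solve (I - W0^T) Y_t = W1^T Y_{t-1} + eps
            Y[t] = np.linalg.solve(np.eye(d) - W0.T, W1.T @ Y[t - 1] + eps)
        return Y

    def least_squares_score(Y, W0, W1):
        # residual score minimized (over acyclic W0) by a score-based learner
        resid = Y[1:] - Y[1:] @ W0 - Y[:-1] @ W1
        return float((resid ** 2).sum())

    rng = np.random.default_rng(0)
    W0 = np.array([[0.0, 0.8], [0.0, 0.0]])   # acyclic intra-slice graph
    W1 = np.array([[0.5, 0.0], [0.0, 0.5]])   # lagged dependencies
    Y = simulate_dbn(W0, W1, T=200, rng=rng)
    print(least_squares_score(Y, W0, W1))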

URL: https://openreview.net/forum?id=I64MJzl9Fy

---

Title: Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer

Authors: Yu Yang, Pan Xu

Abstract: Decision Transformer (DT) has emerged as a promising class of algorithms for offline reinforcement learning (RL) tasks, leveraging pre-collected datasets and the Transformer's capability to model long sequences. Recent works have demonstrated that using parts of trajectories from training tasks as prompts in DT enhances its performance on unseen tasks, giving rise to Prompt-DT methods. However, collecting data from specific environments can be both costly and unsafe in many scenarios, leading to suboptimal performance and limited few-shot prompt ability due to the data-hungry nature of Transformer-based models. Additionally, the limited datasets used in pre-training make it challenging for Prompt-DT-type methods to distinguish between various RL tasks through prompts alone. To address these challenges, we introduce the Language model-initialized Prompt Decision Transformer (LPDT) framework, which leverages pre-trained language models to provide rich prior knowledge for RL tasks and fine-tunes the sequence model using Low-rank Adaptation (LoRA) for meta-RL problems. We further incorporate prompt regularization to effectively differentiate between tasks based on prompt feature representations. Comprehensive empirical studies demonstrate that initializing with a pre-trained language model provides useful prior knowledge and achieves performance similar to Prompt-DT with only $10\%$ of the data on some MuJoCo control tasks. We also provide a thorough ablation study to validate the effectiveness of each component, including sequence modeling, language models, prompt regularization, and prompt strategies.
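
Low-rank Adaptation of a pre-trained sequence model, as mentioned in the abstract, boils down to freezing the base weights and learning a low-rank additive update. A minimal PyTorch sketch of a standard LoRA layer (an editorial illustration, not the authors' implementation; rank and scaling are placeholders):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen pre-trained linear layer plus a trainable low-rank update."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():     # keep pre-trained weights frozen
                p.requires_grad = False
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(nn.Linear(768, 768))
    out = layer(torch.randn(2, 10, 768))
    print(out.shape)                             # torch.Size([2, 10, 768])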

URL: https://openreview.net/forum?id=k520i3XEMK

---

Title: Scaling Laws of Distributed Random Forests

Authors: Katharina Flügel, Charlotte Debus, Markus Götz, Achim Streit, Marie Weiel

Abstract: Random forests are a widely used machine learning technique valued for their robust predictive performance and conceptual simplicity. They are applied in many critical applications and often combined with federated learning to collaboratively build machine learning models across multiple distributed sites. The independent decision trees make random forests inherently parallelizable and well-suited for distributed and federated settings. Despite this perfect fit, there is a lack of comprehensive scalability studies, and many existing methods show limited parallel efficiency or are tested only at smaller scales. To address this gap, we present a comprehensive analysis of the scaling capabilities of distributed random forests on up to 64 compute nodes. Using a tree-parallel approach, we demonstrate a strong scaling speedup of up to 31.98 and a weak scaling efficiency of over 0.96 without affecting predictive performance of the global model. Comparing the performance trade-offs of distributed and local inference strategies enables us to simulate various real-life scenarios in terms of distributed computing resources, data availability, and privacy considerations. We further explore how increasing model and data size improves prediction accuracy, scaling up to 51 200 trees and 7.5 million training samples. We find that while distributing the data across nodes leads to super-scalar speedup, it negates the predictive benefit of increased data. Finally, we study the impact of distributed and non-IID data and find that while global imbalance reduces performance, local distribution differences can help mitigate this effect.
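
A tree-parallel distributed random forest of the kind studied here can be sketched in a few lines with mpi4py and scikit-learn: each rank trains its share of the forest locally and inference averages class probabilities across ranks. This is an editorial illustration, not the authors' code; the dataset, tree count and all-gather inference strategy are placeholders.

    # Run with e.g.: mpirun -n 4 python tree_parallel_rf.py
    import numpy as np
    from mpi4py import MPI
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

    # Tree parallelism: each rank trains its share of the global forest.
    trees_total = 1_000
    local_forest = RandomForestClassifier(
        n_estimators=trees_total // size, random_state=rank, n_jobs=-1)
    local_forest.fit(X, y)

    # Global inference: gather class-probability votes from all ranks and average.
    local_proba = local_forest.predict_proba(X[:5])
    global_proba = np.mean(comm.allgather(local_proba), axis=0)
    if rank == 0:
        print(global_proba.argmax(axis=1))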

URL: https://openreview.net/forum?id=ICHxTlgnSy

---


New submissions
===============


Title: Contextual Learning for Anomaly Detection in Tabular Data

Abstract: Anomaly detection is critical in domains such as cybersecurity and finance, especially when working with large-scale tabular data. Yet, unsupervised anomaly detection---where no labeled anomalies are available---remains challenging because traditional deep learning methods model a single global distribution, assuming all samples follow the same behavior. In contrast, real-world data often contain heterogeneous contexts (e.g., different users, accounts, or devices), where globally rare events may be normal within specific conditions. We introduce a \emph{contextual learning framework} that explicitly models how normal behavior varies across contexts by learning conditional data distributions $P(\mathbf{Y} \mid \mathbf{C})$ rather than a global joint distribution $P(\mathbf{X})$. The framework encompasses (1) a probabilistic formulation for context-conditioned learning, (2) a principled bilevel optimization strategy for automatically selecting informative context features using early validation loss, and (3) theoretical grounding through variance decomposition and discriminative learning principles. We instantiate this framework using a novel conditional Wasserstein autoencoder as a simple yet effective model for tabular anomaly detection. Extensive experiments across eight benchmark datasets demonstrate that contextual learning consistently outperforms global approaches---even when the optimal context is not intuitively obvious---establishing a new foundation for anomaly detection in heterogeneous tabular data.
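
A minimal instance of the contextual idea, with a plain conditional autoencoder standing in for the paper's conditional Wasserstein autoencoder (the Wasserstein regularizer is omitted and all dimensions are placeholders), scores a sample as anomalous when its behaviour features Y reconstruct poorly given its context C:

    import torch
    import torch.nn as nn

    class ConditionalAE(nn.Module):
        """Reconstructs behaviour features Y conditioned on context features C."""
        def __init__(self, y_dim, c_dim, z_dim=8):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(y_dim + c_dim, 64), nn.ReLU(),
                                     nn.Linear(64, z_dim))
            self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, 64), nn.ReLU(),
                                     nn.Linear(64, y_dim))

        def forward(self, y, c):
            z = self.enc(torch.cat([y, c], dim=-1))
            return self.dec(torch.cat([z, c], dim=-1))

    def anomaly_score(model, y, c):
        # high conditional reconstruction error => y is unusual *for this context*
        with torch.no_grad():
            return ((model(y, c) - y) ** 2).mean(dim=-1)

    model = ConditionalAE(y_dim=10, c_dim=3)
    y, c = torch.randn(32, 10), torch.randn(32, 3)
    loss = ((model(y, c) - y) ** 2).mean()   # a training step would backprop this
    print(anomaly_score(model, y, c).shape)  # torch.Size([32])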

URL: https://openreview.net/forum?id=PmqZslRENW

---

Title: BiSSL: Enhancing the Alignment Between Self-Supervised Pretraining and Downstream Fine-Tuning via Bilevel Optimization

Abstract: Models initialized from self-supervised pretraining may suffer from poor alignment with downstream tasks, limiting the extent to which subsequent fine-tuning can adapt relevant representations acquired during the pretraining phase. To mitigate this, we introduce BiSSL, a novel bilevel training framework that enhances the alignment of self-supervised pretrained models with downstream tasks by explicitly incorporating both the pretext and downstream tasks into a preparatory training stage prior to fine-tuning. BiSSL solves a bilevel optimization problem in which the lower-level adheres to the self-supervised pretext task, while the upper-level encourages the lower-level backbone to align with the downstream objective. The bilevel structure facilitates enhanced information sharing between the tasks, ultimately yielding a backbone model that is more aligned with the downstream task, providing a better initialization for subsequent fine-tuning. We propose a general training algorithm for BiSSL that is compatible with a broad range of pretext and downstream tasks. We demonstrate that our proposed framework significantly improves accuracy on the vast majority of a broad selection of image-domain downstream tasks, and that these gains are consistently retained across a wide range of experimental settings. In addition, exploratory alignment analyses further underpin that BiSSL enhances downstream alignment of pretrained representations.
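
The alternating structure of the bilevel training stage can be sketched as follows. This is a deliberately simplified loop (the coupling term and its weight are assumptions; BiSSL's actual lower- and upper-level updates are more involved): the lower level optimizes the pretext objective while receiving a downstream alignment signal, and the upper level fits the downstream head.

    import torch
    import torch.nn as nn

    backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
    pretext_head = nn.Linear(64, 32)     # e.g. a reconstruction-style pretext task
    downstream_head = nn.Linear(64, 10)  # downstream classifier

    lower_opt = torch.optim.SGD(list(backbone.parameters())
                                + list(pretext_head.parameters()), lr=1e-2)
    upper_opt = torch.optim.SGD(downstream_head.parameters(), lr=1e-2)

    x_unlab = torch.randn(256, 32)
    x_lab, y_lab = torch.randn(64, 32), torch.randint(0, 10, (64,))

    for step in range(100):
        # lower level: self-supervised pretext objective on unlabelled data,
        # plus an upper-level signal that pulls the backbone toward the
        # downstream task (a simplification of BiSSL's bilevel coupling)
        lower_opt.zero_grad()
        pretext_loss = ((pretext_head(backbone(x_unlab)) - x_unlab) ** 2).mean()
        down_loss = nn.functional.cross_entropy(downstream_head(backbone(x_lab)), y_lab)
        (pretext_loss + 0.1 * down_loss).backward()
        lower_opt.step()

        # upper level: adapt the downstream head on top of the current backbone
        upper_opt.zero_grad()
        nn.functional.cross_entropy(downstream_head(backbone(x_lab).detach()), y_lab).backward()
        upper_opt.step()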

URL: https://openreview.net/forum?id=GQAGlqOpyA

---

Title: Adversarial Attacks in Weight-Space Classifiers

Abstract: Implicit Neural Representations (INRs) have recently garnered increasing interest in various research fields, mainly due to their ability to represent large, complex data in a compact and continuous manner. Past work further showed that numerous popular downstream tasks can be performed directly in the INR parameter space. Doing so can substantially reduce the computational resources required to process the represented data in their native domain. A major difficulty in using modern machine-learning approaches is their high susceptibility to adversarial attacks, which have been shown to greatly limit the reliability and applicability of such methods in a wide range of settings. In this work, we show that parameter-space models trained for classification are inherently robust to adversarial attacks, without the need for any robust training. To support our claims, we develop a novel suite of adversarial attacks targeting parameter-space classifiers, and furthermore analyze practical considerations of such attacks.
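
One concrete example of an attack in this setting, a standard PGD attack whose perturbed input is the flattened INR weight vector rather than an image, is sketched below; the classifier architecture and budgets are hypothetical, and the paper's attack suite is more elaborate.

    import torch
    import torch.nn as nn

    def pgd_on_weights(classifier, inr_weights, label, eps=0.01, alpha=0.002, steps=10):
        """Projected gradient ascent where the attacked input is the flattened
        parameter vector of an INR, not an image."""
        adv = inr_weights.clone().detach()
        for _ in range(steps):
            adv.requires_grad_(True)
            loss = nn.functional.cross_entropy(classifier(adv), label)
            grad, = torch.autograd.grad(loss, adv)
            with torch.no_grad():
                adv = adv + alpha * grad.sign()                           # ascent step
                adv = inr_weights + (adv - inr_weights).clamp(-eps, eps)  # L-inf ball
        return adv.detach()

    # hypothetical parameter-space classifier over 5000-dimensional INR weight vectors
    classifier = nn.Sequential(nn.Linear(5000, 256), nn.ReLU(), nn.Linear(256, 10))
    w = torch.randn(1, 5000)
    y = torch.tensor([3])
    w_adv = pgd_on_weights(classifier, w, y)
    print((w_adv - w).abs().max())   # stays within the eps budget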

URL: https://openreview.net/forum?id=eOLybAlili

---

Title: Achieving Faster than O(1/t) Convergence in Convex Federated Learning

Abstract: This paper aims to achieve faster than O(1/t) convergence in federated learning for general convex loss functions. Under the independent and identically distributed (IID) condition, we show that accurate convergence to an optimal solution can be achieved in convex federated learning even when individual clients select stepsizes locally without any coordination. More importantly, this local stepsize strategy allows exploitation of the local geometry of individual clients' loss functions, and is shown to lead to faster convergence than the case where the same universal stepsize is used for all clients. Then, when the distribution is non-IID, we share gradients in addition to the global model parameter to ensure o(1/t) convergence to an optimal solution in convex federated learning. For both algorithms, we theoretically prove that stepsizes much larger than existing counterparts are allowed, which leads to much faster convergence in empirical evaluations. It is worth noting that, beyond providing a general framework for federated learning with drift correction, our second algorithm's achievement of o(1/t) convergence to the exact optimal solution under general convex loss functions has not been previously reported in the federated learning literature, except in certain restricted convex cases with additional constraints. We believe this is significant because, even after incorporating momentum, existing first-order federated learning algorithms can only ensure O(1/t) convergence for general convex loss functions when no additional assumptions on heterogeneity are imposed.
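
The local-stepsize idea under IID data can be illustrated on a toy quadratic problem where every client shares the same minimizer but has different curvature; each client picks its stepsize from its own smoothness constant with no coordination. This sketch is only illustrative, and the paper's algorithms, rates, and the gradient-sharing non-IID variant are not reproduced here.

    import numpy as np

    # Toy convex problem: client i holds f_i(w) = 0.5 * L_i * ||w - c||^2 with a
    # shared minimizer c (the IID case) but different local curvature L_i.
    L = np.array([1.0, 4.0, 10.0])          # per-client smoothness constants
    c = np.array([2.0, -1.0])               # common optimum under IID data

    def client_update(w, i, local_steps=5):
        # each client picks its stepsize from its *local* geometry (1/L_i),
        # without any cross-client coordination
        step = 1.0 / L[i]
        for _ in range(local_steps):
            w = w - step * L[i] * (w - c)   # gradient step on f_i
        return w

    w_global = np.zeros(2)
    for _ in range(20):                      # FedAvg-style communication rounds
        w_global = np.mean([client_update(w_global.copy(), i) for i in range(3)], axis=0)

    print(w_global)                          # converges to c = [2, -1]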

URL: https://openreview.net/forum?id=Dae3jVdPod

---

Title: TABASCO: A Fast, Simplified Model for Molecular Generation with Improved Physical Quality

Abstract: State-of-the-art models for 3D molecular generation rely on significant inductive biases, namely SE(3) equivariance, permutation invariance and graph message-passing networks to capture local chemistry, yet the generated molecules still struggle with physical plausibility. We introduce TABASCO, which relaxes these assumptions: the model has a standard non-equivariant transformer architecture, treats the atoms in a molecule as a sequence and does not explicitly model bonds. The absence of equivariant layers and message passing allows us to simplify the model architecture and scale data throughput. On the GEOM-Drugs and QM9 benchmarks, TABASCO achieves state-of-the-art PoseBusters validity and delivers inference roughly 10x faster than the strongest baseline, while exhibiting emergent rotational equivariance without hard-coded symmetry. Our work offers a blueprint for training minimalist, high-throughput generative models suited to tasks such as structure- and pharmacophore-based drug design. We provide a link to our implementation at https://anonymous.4open.science/r/tabasco-EBC8/.
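
The "atoms as a sequence" design can be sketched with a standard PyTorch transformer encoder: atom-type embeddings plus a linear map of raw coordinates, with no equivariant layers and no bond graph. The dimensions and the coordinate-denoising head below are assumptions for illustration, not the authors' architecture.

    import torch
    import torch.nn as nn

    class PlainMoleculeTransformer(nn.Module):
        """Atoms as a plain token sequence: no SE(3) equivariance, no bond graph."""
        def __init__(self, n_atom_types=20, d_model=128):
            super().__init__()
            self.type_emb = nn.Embedding(n_atom_types, d_model)
            self.coord_in = nn.Linear(3, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=4)
            self.coord_out = nn.Linear(d_model, 3)   # e.g. predicts denoised positions

        def forward(self, atom_types, coords):
            h = self.type_emb(atom_types) + self.coord_in(coords)
            return self.coord_out(self.encoder(h))

    model = PlainMoleculeTransformer()
    atom_types = torch.randint(0, 20, (2, 30))   # batch of 2 molecules, 30 atoms each
    coords = torch.randn(2, 30, 3)
    print(model(atom_types, coords).shape)       # torch.Size([2, 30, 3])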

URL: https://openreview.net/forum?id=Kg6CSrbXl4

---

Title: A Simple Scaling Model for Bootstrapped DQN

Abstract: We present a large-scale empirical study of Bootstrapped DQN (BDQN) and Randomized-Prior BDQN (RP-BDQN) in the DeepSea environment designed to isolate and parameterize exploration difficulty. Our primary contribution is a simple scaling model that accurately captures the probability of reward discovery as a function of task hardness and ensemble size. This model is parameterized by a method-dependent effectiveness factor, $\psi$. Under this framework, RP-BDQN demonstrates substantially higher effectiveness ($\psi \approx 0.87$) compared to BDQN ($\psi \approx 0.80$), enabling it to solve more challenging tasks. Our analysis reveals that this advantage stems from RP-BDQN's sustained ensemble diversity, which mitigates the posterior collapse observed in BDQN. Interestingly, the model's success, despite assuming member independence, suggests that complex ensemble interactions may be a secondary factor in overall performance. Furthermore, we show how systematic deviations from this simple model can be used to diagnose more subtle dynamics like cooperation and diversity saturation. These results offer practical guidance for ensemble configuration and propose a methodological framework for future studies of deep exploration.

URL: https://openreview.net/forum?id=OpfrMFep8B

---

Title: Don’t Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning

Abstract: Large language models (LLMs) have shown substantial capacity for generating fluent, contextually appropriate responses. However, they can produce hallucinated outputs, especially when a user query includes one or more false premises—claims that contradict established facts. Such premises can mislead LLMs into offering fabricated or misleading details. Existing approaches include pretraining, fine-tuning, and inference-time techniques that often rely on access to logits or address hallucinations after they occur. These methods tend to be computationally expensive, require extensive training data, or lack proactive mechanisms to prevent hallucination before generation, limiting their efficiency in real-time applications. We propose a retrieval-based framework that identifies and addresses false premises before generation. Our method first transforms a user’s query into a logical representation, then applies retrieval-augmented generation (RAG) to assess the validity of each premise using factual sources. Finally, we incorporate the verification results into the LLM’s prompt to maintain factual consistency in the final output. Experiments show that this approach effectively reduces hallucinations, improves factual accuracy, and does not require access to model logits or large-scale fine-tuning.
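
The three-stage pipeline (premise extraction, retrieval-augmented verification, verified-prompt generation) can be sketched as below; llm and retrieve are hypothetical stand-ins for any chat model and retriever, and the prompts are illustrative only.

    def llm(prompt: str) -> str:
        # hypothetical stand-in for a chat-model call; replace with a real client
        return "[model output for prompt starting: " + prompt.splitlines()[0] + "]"

    def retrieve(claim: str, k: int = 3) -> list[str]:
        # hypothetical stand-in for a retriever over factual sources
        return [f"passage {i} about: {claim}" for i in range(k)]

    def answer_with_premise_check(query: str) -> str:
        # 1) extract the factual premises asserted by the query (logical representation)
        premises = llm("List the factual premises asserted by this question, one per line:\n"
                       + query).splitlines()
        # 2) verify each premise against retrieved evidence (RAG)
        verdicts = [llm("Evidence:\n" + "\n".join(retrieve(p)) +
                        f"\n\nIs the claim '{p}' supported? Answer SUPPORTED or REFUTED.")
                    for p in premises]
        # 3) prepend the verification results so generation stays consistent with facts
        return llm("Premise check results:\n" + "\n".join(verdicts)
                   + "\n\nNow answer the original question:\n" + query)

    print(answer_with_premise_check("Why did Einstein win the Nobel Prize for relativity?"))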

URL: https://openreview.net/forum?id=BDxStRGWba

---

Title: An Efficient Subset Selection Strategy Using Text-Guided Data Attribution to Mitigate Simplicity Bias

Abstract: The effectiveness of deep learning models heavily relies on the quality and diversity of their training data. However, datasets collected from different sources often introduce simplicity biases, where a model relies on easily learnable but non-predictive (spurious) features for its predictions. While existing debiasing techniques focus on model robustness, they leave the data untouched. However, as data becomes increasingly valuable, identifying and mitigating bias directly at the data level has become increasingly important. Recently, data attribution has emerged as a promising tool for uncovering issues in training data, yet its vulnerability to simplicity bias has received limited attention. In this work, we propose a novel data deletion framework that combines Neural Tangent Kernel (NTK)-based data attribution with textual descriptions of bias to identify and remove training samples that do not significantly affect model performance. We first demonstrate that NTK-based data attribution methods can themselves be influenced by spurious features. To mitigate this, we use available metadata or, when unavailable, a vision-language model to annotate a small validation set and extract a textual description of the bias. Based on this description and the attribution scores, we identify the subset of training data that is semantically aligned with the spurious feature and affects the generalization of the model. Removing these samples from the training dataset and training the model on the new subset improves the average and worst-group accuracy of the model, outperforming existing attribution-based baselines.

URL: https://openreview.net/forum?id=zZ5YundT95

---

Title: Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper

Abstract: AI Scientist systems are autonomous agents capable of conducting scientific research. Understanding their current capabilities and risks is essential for ensuring trustworthy and sustainable AI-driven scientific progress while preserving the integrity of the academic ecosystem. To this end, we develop Jr. AI Scientist, a state-of-the-art autonomous AI scientist system that mimics the core research workflow of a novice student researcher: Given the baseline paper from the human mentor, it analyzes its limitations, formulates novel hypotheses for improvement, validates them through rigorous experimentation, and writes a paper with the results. Unlike previous approaches that assume full automation or operate on small-scale code, Jr. AI Scientist follows a well-defined research workflow and leverages modern coding agents to handle complex, multi-file implementations, leading to scientifically valuable contributions. Through our experiments, the Jr. AI Scientist successfully generated new research papers that build upon real NeurIPS, IJCV, and ICLR works by proposing and implementing novel algorithms. For evaluation, we conducted automated assessments using AI Reviewers, author-led evaluations, and submissions to Agents4Science, a venue dedicated to AI-driven scientific contributions. The findings demonstrate that Jr. AI Scientist generates papers receiving higher review scores than existing fully automated systems. Nevertheless, we identify important limitations from both the author evaluation and the Agents4Science reviews, indicating the potential risks of directly applying current AI Scientist systems and key challenges for future research. Finally, we comprehensively report various risks identified during development. We believe this study clarifies the current role and limitations of AI Scientist systems, offering insights into the areas that still require human expertise and the risks that may emerge as these systems evolve.

URL: https://openreview.net/forum?id=OeV062d8Sw

---

Title: Survey on Coresets for Deep Learning: Methods and Applications

Abstract: This survey presents a comprehensive review of coreset methods in deep learning, an important tool for improving data efficiency in large-scale neural networks. In general, a ``coreset'' is an algorithmic technique for selecting a small yet representative subset of data to replace the full dataset, which can yield a more efficient training process while preserving model performance. Over the past 20 years, coreset techniques have been widely applied to many classical machine learning problems, such as clustering, regression and classification. In recent years, coreset techniques have also begun to attract considerable attention in modern deep learning. However, designing effective coresets is usually a challenging task, since one needs to account for the trade-off among multiple factors, such as complexity, robustness and accuracy. In this survey, we focus on two common scenarios for using coreset methods in deep learning: (1) reducing the extremely high computational cost of training a deep learning model, and (2) improving data utilization under resource constraints such as a limited label budget or storage capacity. We begin by outlining the fundamental principles, advantages, and design challenges of coresets for these two scenarios. We also discuss the emerging applications of coresets in large language models. Finally, we identify several open problems and promising directions for future research.

URL: https://openreview.net/forum?id=ytYWmZ9haH

---

Title: LoDAdaC: a unified local training-based decentralized framework with Adam-type updates and compressed communication

Abstract: In decentralized distributed learning, achieving fast convergence and low communication cost is essential for scalability and high efficiency. Despite extensive research, existing decentralized methods can either achieve fast convergence or enjoy low communication cost, but cannot achieve both goals simultaneously. This shortcoming causes significant inefficiency (in either computation or communication) when solving large-scale decentralized learning problems, e.g., in large language model training. To address this limitation, we propose LoDAdaC, a unified multiple \textbf{Lo}cal Training (MLT) \textbf{D}ecentralized framework with \textbf{Ada}m-type updates and \textbf{C}ompressed communication (CC). LoDAdaC accommodates a broad class of optimizers for its local adaptive updates, including AMSGrad, Adam, and AdaGrad; it is compatible with standard (possibly biased) compressors such as low-bit quantization and sparsification. MLT and CC enable LoDAdaC to achieve a multiplied reduction in communication cost, while the adaptive updates enable fast convergence. We rigorously prove the combined advantage through complexity analysis. In addition, experiments on image classification and large language model training validate our theoretical findings and show that LoDAdaC significantly outperforms existing decentralized algorithms in terms of convergence speed and communication efficiency.
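
Two of the building blocks named in the abstract, multiple local Adam-type updates (MLT) and a biased top-k sparsifying compressor (CC), can be sketched as follows; the loss, dimensions and k are placeholders, and the gossip averaging of compressed deltas over neighbours is omitted.

    import torch

    def top_k_compress(delta, k):
        # biased sparsifying compressor: keep only the k largest-magnitude entries
        out = torch.zeros_like(delta)
        idx = delta.abs().flatten().topk(k).indices
        out.view(-1)[idx] = delta.view(-1)[idx]
        return out

    def local_round(w, X, y, local_steps=5, k=20):
        # multiple local steps with an Adam-type optimizer (MLT) ...
        w_local = w.clone().detach().requires_grad_(True)
        opt = torch.optim.Adam([w_local], lr=1e-2)
        for _ in range(local_steps):
            opt.zero_grad()
            ((X @ w_local - y) ** 2).mean().backward()   # toy least-squares loss
            opt.step()
        # ... and only a compressed model delta is exchanged with neighbours (CC)
        return top_k_compress((w_local - w).detach(), k)

    torch.manual_seed(0)
    X, y = torch.randn(64, 50), torch.randn(64)
    w = torch.zeros(50)
    delta = local_round(w, X, y)
    print(int((delta != 0).sum()))   # at most k non-zero entries are communicated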

URL: https://openreview.net/forum?id=0qoy9usvnm

---

Title: AutoGeTS: Knowledge-based Automated Generation of Text Synthetics for Improving Text Classification

Abstract: When developing text classification models for real-world applications, one major challenge is the difficulty of collecting sufficient data for all text classes. In this work, we address this challenge by utilizing large language models (LLMs) to generate synthetic data and using such data to improve the performance of the models without waiting for more real data to be collected and labeled. As an LLM generates different synthetic data in response to different input examples, we formulate an automated workflow, which searches for input examples that lead to more ``effective'' synthetic data for improving the model concerned. We study three search strategies with an extensive set of experiments, and use experiment results to inform an ensemble algorithm that selects a search strategy according to the characteristics of a class. Our further experiments demonstrate that this ensemble approach is more effective than each individual strategy in our automated workflow for improving classification models using LLMs. The source code of the main software developed for this work is made available at https://anonymous.4open.science/r/AutoGeTS-2B0D.
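
The automated workflow can be sketched as a search loop: pick a candidate subset of real input examples, ask an LLM to generate synthetic data from it, and keep the synthetic set that most improves a held-out score. The random-search strategy and the stand-in functions below are editorial simplifications of the search strategies and ensemble studied in the paper.

    import random

    def generate_synthetic(llm_call, examples, n=20):
        # hypothetical wrapper: ask an LLM for new samples similar to the inputs
        prompt = "Write new examples of this class, similar to:\n" + "\n".join(examples)
        return [llm_call(prompt) for _ in range(n)]

    def search_effective_inputs(pool, llm_call, train_and_score, trials=10, subset_size=5):
        # simple random-search strategy: try candidate input subsets and keep the
        # synthetic data that most improves validation performance of the classifier
        best_gain, best_synth = 0.0, []
        baseline = train_and_score(extra=[])
        for _ in range(trials):
            subset = random.sample(pool, subset_size)
            synth = generate_synthetic(llm_call, subset)
            gain = train_and_score(extra=synth) - baseline
            if gain > best_gain:
                best_gain, best_synth = gain, synth
        return best_synth, best_gain

    # toy stand-ins for the LLM call and the retrain-and-validate step
    pool = [f"real example {i}" for i in range(50)]
    toy_llm = lambda prompt: "synthetic example"
    toy_score = lambda extra: 0.80 + 0.001 * len(extra)
    synth, gain = search_effective_inputs(pool, toy_llm, toy_score)
    print(len(synth), round(gain, 3))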

URL: https://openreview.net/forum?id=y7B87znyuZ

---
