Weekly TMLR digest for Aug 10, 2025


TMLR

Aug 10, 2025, 12:00:11 AM
to tmlr-annou...@googlegroups.com


New certifications
==================

Featured Certification: Understanding In-Context Learning of Linear Models in Transformers Through an Adversarial Lens

Usman Anwar, Johannes von Oswald, Louis Kirsch, David Krueger, Spencer Frei

https://openreview.net/forum?id=CtMXJxO7SJ

---


Accepted papers
===============


Title: Decoding-based Regression

Authors: Xingyou Song, Dara Bahri

Abstract: Language models have recently been shown capable of performing regression wherein numeric predictions are represented as decoded strings. In this work, we provide theoretical grounds for this capability and furthermore investigate the utility of causal sequence decoding models as numeric regression heads given any feature representation. We find that, despite being trained in the usual way - for next-token prediction via cross-entropy loss - decoder-based heads are as performant as standard pointwise heads when benchmarked over standard regression tasks, while being flexible enough to capture smooth numeric distributions, such as in the task of density estimation.
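
A minimal sketch of the numeric-decoding idea, under a toy fixed-point tokenization of our own choosing (the paper's representations may differ): once the decoder emits a softmax distribution per digit position, a point prediction follows from linearity of expectation.

    import numpy as np

    def expected_value(digit_probs, n_int=2, n_frac=2):
        """E[y] when y is decoded as n_int integer and n_frac fractional digits."""
        places = [10.0 ** (n_int - 1 - i) for i in range(n_int + n_frac)]
        # Linearity of expectation: E[y] = sum_t place_t * E[digit at step t]
        return sum(p * np.dot(probs, np.arange(10))
                   for p, probs in zip(places, digit_probs))

    # Example: per-step distributions peaked on digits 1, 2, 5, 0 ("12.50")
    probs = [np.eye(10)[d] for d in (1, 2, 5, 0)]
    print(expected_value(probs))  # 12.5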

URL: https://openreview.net/forum?id=avUQ8jguxg

---

Title: Explaining Confident Black-Box Predictions

Authors: Evan Yao, Retsef Levi, Assaf Avrahami, Abraham Meidan

Abstract: Interpretability is crucial for leveraging predictive machine learning for decision-making, but the strongest performing models are often black-boxes in that they are difficult to understand. For binary classification models, a growing body of literature seeks to find \textit{model-agnostic} explanations by treating a model as a list of 0/1 predictions and identifying patterns for when a model predicts $1$ over $0$ (or vice versa). While such explanations are useful for understanding when a model predicts 1 over 0, they do not consider the confidence (i.e., the probability) behind predictions, a critical piece of information provided by most classification models. Since the 0/1 predictions of a model depend on the choice of a subjective threshold for discretizing predicted probabilities, as one changes the threshold, the resulting explanations may change despite the underlying model staying the same. In contrast, this work proposes model-agnostic explanations that treat a black-box model as a \textit{ranking} across a dataset from lowest predicted probability of $1$ to highest, rather than a list of 0/1 predictions. Under this ranking, a useful explanation should capture broadly when a model \textit{confidently} predicts $1$ (i.e., highly ranked data points). Since highly confident predictions are often correlated with predictions that are more accurate and actionable, understanding when a model predicts confidently is often quite valuable to a practitioner.

This work builds explanations based on rule lists (i.e., a collection of if-then rules) as well as a novel special case called checklists. A strong rule list or checklist is satisfied by a large number of data points that are ranked highly by the model. This criterion is measured by the traditional metric of support (i.e., the number of data points an explanation applies to), the \textit{average} ranking of those data points, which we call the Average Black-Box Ranking (ABBR), as well as the sparsity of the explanation (e.g., the number of rules in the rule list, among others). Given these metrics, this work develops a local-search-based optimization methodology for finding explanations based on rule lists and checklists that maximize ABBR subject to a user-specified support and sparsity constraint. The methodology leverages a local search approach where an initial rule list is chosen greedily from a pool of candidate rules, then slowly perturbed by swapping rules from the rule list with those in the candidate pool. This approach is evaluated on 6 real-world datasets in application areas ranging from healthcare to criminal justice and finance. Empirical results suggest that this methodology finds rule lists of length at most 5 with ABBR within 7.4\% of the optimal ABBR of any explanation, while checklists provide greater interpretability for a small cost in performance.
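
A hedged sketch of the ABBR computation as we read it (the ranking convention is ours: rank 1 is the most confident point, so a smaller average rank is better):

    import numpy as np

    def abbr(probs, covered):
        """probs: (n,) black-box P(y=1); covered: boolean mask of points satisfying the rule."""
        order = np.argsort(-probs)                 # rank 1 = highest predicted probability
        ranks = np.empty_like(order)
        ranks[order] = np.arange(1, len(probs) + 1)
        return covered.sum(), ranks[covered].mean()   # (support, ABBR)

    probs = np.array([0.9, 0.2, 0.8, 0.1, 0.7])
    covered = np.array([True, False, True, False, True])
    print(abbr(probs, covered))   # support 3, ABBR 2.0: the rule covers the top-3 ranked points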

URL: https://openreview.net/forum?id=SAwZpgKJcc

---

Title: Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation

Authors: Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Nathaniel Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J Zico Kolter

Abstract: Prompt engineering is an effective but labor-intensive way to control text-to-image (T2I) generative models. Its time-intensive nature and complexity have spurred the development of algorithms for automated prompt generation. However, these methods often struggle with transferability across T2I models, require white-box access to the underlying model, or produce non-intuitive prompts. In this work, we introduce PRISM, an algorithm that automatically produces human-interpretable and transferable prompts that can effectively generate desired concepts given only black-box access to T2I models. Inspired by large language model (LLM) jailbreaking, PRISM leverages the in-context learning ability of LLMs to iteratively refine the candidate prompt distribution built upon the reference images. Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, styles, and images across multiple T2I models, including Stable Diffusion, DALL-E, and Midjourney.
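
A schematic of the refine-and-score loop as we read the abstract; `llm`, `t2i`, and `score` are hypothetical callables standing in for the black-box LLM, the black-box text-to-image model, and an image-similarity judge:

    def prism_like(llm, t2i, score, reference_images, n_rounds=5, n_candidates=8):
        history = []                          # (prompt, score) pairs as in-context evidence
        best_prompt, best_score = "", float("-inf")
        for _ in range(n_rounds):
            for _ in range(n_candidates):
                p = llm(reference_images, history)   # propose from the refined distribution
                s = score(t2i(p), reference_images)
                history.append((p, s))
                if s > best_score:
                    best_prompt, best_score = p, s
        return best_prompt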

URL: https://openreview.net/forum?id=IVYVDN6pJ6

---

Title: Emergent Corpus Pre-training Benefits Vision Language Models

Authors: Makanjuola Adekunmi Ogunleye, Chase Vickery, Ismini Lourentzou

Abstract: Vision-Language Pre-trained Models (VL-PTMs) have achieved impressive performance across a wide range of tasks, but their success often hinges on access to large-scale multimodal datasets. While effective in high-resource settings, these models tend to struggle in data-scarce regimes. In this work, we investigate Emergent Communication (EC) as a mechanism to improve sample efficiency in VL-PTMs. We pre-train a Vision-Language Model (VLM) using EC tokens generated through a referential game between two artificial agents. Across three diverse cross-modal matching and reasoning benchmarks, EC pretraining yields substantial gains, improving Visual Referring Expression (VRE) accuracy by 108.6% and Visual Entailment (VE) by 69.6%. To further validate the effectiveness of EC pretraining, we introduce LLaVA-1.5-EC, a LLaVA variant trained entirely on EC tokens. LLaVA-1.5-EC outperforms strong LVLM baselines, including BLIP-2 (13B), achieving relative gains of 104.23% on VizWiz, 34.8% on GQA, and 10.8% on VQAv2, and top performance on MMBench, a challenging instruction-following benchmark. These results highlight the transferability and generalization capacity of EC pretraining and underscore the potential of leveraging grounded EC tokens to enhance vision-language reasoning in low-resource settings, especially in settings with limited natural language data. We discuss implications and propose avenues for future research to explore the connections between EC and VL for multimodal understanding and effective human-machine communication.

Project Website: https://plan-lab.github.io/ec-vlm/

URL: https://openreview.net/forum?id=bivKGSaXkD

---

Title: From Spikes to Heavy Tails: Unveiling the Spectral Evolution of Neural Networks

Authors: Vignesh Kothapalli, Tianyu Pang, Shenyang Deng, Zongmin Liu, Yaoqing Yang

Abstract: Training strategies for modern deep neural networks (NNs) tend to induce a heavy-tailed (HT) empirical spectral density (ESD) in the layer weights. While previous efforts have shown that the HT phenomenon correlates with good generalization in large NNs, a theoretical explanation of its occurrence is still lacking. Specifically, understanding the conditions that lead to this phenomenon can shed light on the interplay between generalization and weight spectra. Our work aims to bridge this gap by presenting a simple, rich setting to model the emergence of HT ESD. In particular, we present a theory-informed setup for ‘crafting’ heavy tails in the ESD of two-layer NNs and present a systematic analysis of the HT ESD emergence without any gradient noise. This is the first work to analyze a noise-free setting, and we also incorporate optimizer (GD/Adam) dependent (large) learning rates into the HT ESD analysis. Our results highlight the role of learning rates on the Bulk+Spike and HT shape of the ESDs in the early phase of training, which can facilitate generalization in the two-layer NN. These observations shed light on the behavior of large-scale NNs, albeit in a much simpler setting.
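
The quantity under study is easy to compute; a minimal sketch using the usual convention in this literature (the ESD is the set of eigenvalues of W^T W, i.e., squared singular values of the layer weights):

    import numpy as np

    def esd(W):
        """Empirical spectral density: eigenvalues of W^T W."""
        return np.linalg.svd(W, compute_uv=False) ** 2

    rng = np.random.default_rng(0)
    W = rng.normal(size=(512, 256)) / np.sqrt(256)   # i.i.d. init: Marchenko-Pastur bulk
    eigs = esd(W)
    # A Bulk+Spike or heavy-tailed shape shows up as eigenvalues far above the
    # bulk edge; large learning rates are what move spectral mass out there.
    print(eigs.max(), np.median(eigs))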

URL: https://openreview.net/forum?id=DJHB8eBUnt

---

Title: Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

Authors: Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Shaochen Zhong, Na Zou, Hanjie Chen, Xia Hu

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have further improved performance in System-2 reasoning domains like mathematics and programming by harnessing supervised fine-tuning (SFT) and reinforcement learning (RL) to enhance Chain-of-Thought (CoT) reasoning. However, while longer CoT reasoning sequences improve performance, they also introduce significant computational overhead due to lengthy and redundant outputs, known as the ``overthinking phenomenon''.
Efficient Reasoning, which seeks to optimize reasoning length while preserving reasoning capabilities, offers practical benefits such as faster processing times, lower energy consumption, and improved responsiveness, especially valuable for reasoning-intensive applications. Despite its potential, efficient reasoning remains in the early stages of research.
In this paper, we provide the first structured survey to systematically investigate and explore the current progress toward achieving efficient reasoning in LLMs. Overall, relying on the inherent mechanism of LLMs, we categorize existing works into several key directions: (1) model-based efficient reasoning, which considers optimizing full-length reasoning models into more concise reasoning models or directly training efficient reasoning models; (2) reasoning output-based efficient reasoning, which aims to dynamically reduce reasoning steps and length during inference; (3) input prompts-based efficient reasoning, which seeks to enhance reasoning efficiency based on input prompt properties such as difficulty or length control. Additionally, we introduce the use of efficient data for training reasoning models, explore the reasoning capabilities of small language models, and discuss evaluation methods and benchmarking.

URL: https://openreview.net/forum?id=HvoG8SxggZ

---

Title: One-Shot Federated Distillation Using Monoclass Teachers: A Study of Knowledge Fragmentation and Out-of-Distribution Supervision

Authors: Cedric Maron, Virginie Fresse, ORZALESI

Abstract: The performance of machine learning models critically depends on the quality and diversity of training data. However, privacy, legal, and proprietary concerns often limit direct data sharing. Many organizations possess high-quality data for specific classes and may wish to share the knowledge derived from it without revealing the data or engaging in collaborative training. While federated learning (FL) enables distributed model training, it typically assumes mutual benefit, requires repeated communication, and produces a shared global model. Another paradigm, knowledge distillation (KD), allows a student model to learn from teacher predictions. We propose a one-shot federated distillation method in which a single client learns from monoclass teacher models trained independently by multiple providers. Each provider shares its model once, and the client combines these with unlabeled data to distill a multiclass student model—aggregating knowledge from disjoint, class-specific sources. This unidirectional, asymmetric setup poses a key challenge: out-of-distribution (OOD) supervision, where monoclass teachers often mispredict unseen inputs, leading to noisy signals for the student. The main contribution of this work is a systematic study of knowledge fragmentation in one-shot federated distillation with monoclass teachers. We evaluate five configurations with varying class coverage per provider and show that increasing fragmentation intensifies OOD supervision, degrading student performance. Experiments on MNIST, FashionMNIST, and CIFAR-10 confirm that fragmentation consistently reduces student accuracy. To mitigate this, we discuss three strategies: (1) exposing teachers to diverse off-class examples, (2) penalizing overconfidence, and (3) using contrastive learning to sharpen feature boundaries.
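
A hedged sketch of the distillation step as we understand the setup (names, shapes, and temperature are ours): each provider's monoclass teacher emits one "is it my class?" logit, and the client stacks K of them into pseudo-labels on unlabeled data.

    import torch
    import torch.nn.functional as F

    def distill_step(student, teachers, x, T=2.0):
        with torch.no_grad():
            # one scalar logit per monoclass teacher -> (batch, K) pseudo-logits;
            # OOD supervision enters here: teachers can be confidently wrong
            # on inputs far from their own class.
            t_logits = torch.cat([t(x) for t in teachers], dim=1)
            target = F.softmax(t_logits / T, dim=1)
        log_p = F.log_softmax(student(x) / T, dim=1)
        return F.kl_div(log_p, target, reduction="batchmean") * T * T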

URL: https://openreview.net/forum?id=ENdm5BM7aF

---

Title: Defending Against Unforeseen Failure Modes with Latent Adversarial Training

Authors: Stephen Casper, Lennart Schulze, Oam Patel, Dylan Hadfield-Menell

Abstract: Despite extensive diagnostics and debugging by developers, AI systems sometimes exhibit harmful unintended behaviors. Finding and fixing these is challenging because the attack surface is so large – it is not tractable to exhaustively search for inputs that may elicit harmful behaviors. Red-teaming and adversarial training (AT) are commonly used to improve robustness; however, they empirically struggle to fix failure modes that differ from the attacks used during training. In this work, we utilize latent adversarial training (LAT) to defend against vulnerabilities without leveraging knowledge of what they are or using inputs that elicit them. LAT makes use of the compressed, abstract, and structured latent representations of concepts that the network actually uses for prediction. Here, we use it to defend against failure modes without examples that elicit them. Specifically, we use LAT to remove backdoors and defend against held-out classes of adversarial attacks. We show in image classification, text classification, and text generation tasks that LAT usually improves both robustness to novel attacks and performance on clean data relative to AT. This suggests that LAT can be a promising tool for defending against failure modes that are not explicitly identified by developers.
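
A minimal latent adversarial training step, sketched under our own assumptions about the interface (`head` maps hidden activations h to outputs): the adversarial perturbation is searched in latent space rather than input space.

    import torch

    def lat_loss(head, h, y, loss_fn, eps=0.1, steps=5):
        delta = torch.zeros_like(h, requires_grad=True)
        for _ in range(steps):                      # PGD, but on activations
            adv_loss = loss_fn(head(h + delta), y)
            (grad,) = torch.autograd.grad(adv_loss, delta)
            with torch.no_grad():
                delta += (eps / steps) * grad.sign()
                delta.clamp_(-eps, eps)
        return loss_fn(head(h + delta.detach()), y)  # train against worst-case latents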

URL: https://openreview.net/forum?id=mVPPhQ8cAd

---

Title: Compressed Decentralized Momentum Stochastic Gradient Methods for Nonconvex Optimization

Authors: Wei Liu, Anweshit Panda, Ujwal Pandey, Christopher Brissette, Yikang Shen, George Slota, Naigang Wang, Jie Chen, Yangyang Xu

Abstract: In this paper, we design two compressed decentralized algorithms for solving nonconvex stochastic optimization under two different scenarios. Both algorithms adopt a momentum technique to achieve fast convergence and a message-compression technique to save communication costs. Though momentum acceleration and compressed communication have been used in literature, it is highly nontrivial to theoretically prove the effectiveness of their composition in a decentralized algorithm that can maintain the benefits of both sides, because of the need to simultaneously control the consensus error, the compression error, and the bias from the momentum gradient.

For the scenario where gradients are bounded, our proposal is a compressed decentralized adaptive method. To the best of our knowledge, this is the first decentralized adaptive stochastic gradient method with compressed communication. For the scenario of data heterogeneity without bounded gradients, our proposal is a compressed decentralized heavy-ball method, which applies a gradient tracking technique to address the challenge of data heterogeneity. Notably, both methods achieve an optimal convergence rate, and they can achieve linear speed up and adopt topology-independent algorithmic parameters within a certain regime of the user-specified error tolerance. Superior empirical performance is observed over state-of-the-art methods on training deep neural networks (DNNs) and Transformers.
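
Two generic ingredients of such methods, sketched in isolation (this is not the paper's full algorithm, which also controls consensus and compression error): a top-k message compressor and a heavy-ball momentum update.

    import numpy as np

    def top_k(v, k):
        """Keep the k largest-magnitude entries, zeros elsewhere (what gets transmitted)."""
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
        return out

    def heavy_ball_step(x, m, grad, lr=1e-2, beta=0.9):
        m = beta * m + grad          # momentum accumulates past gradients
        return x - lr * m, m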

URL: https://openreview.net/forum?id=RqhMQHHkB4

---

Title: Generative Feature Training of Thin 2-Layer Networks

Authors: Johannes Hertrich, Sebastian Neumayer

Abstract: We consider the approximation of functions by 2-layer neural networks with a small number of hidden weights based on the squared loss and small datasets. Due to the highly non-convex energy landscape, gradient-based training often suffers from local minima. As a remedy, we initialize the hidden weights with samples from a learned proposal distribution, which we parameterize as a deep generative model. To train this model, we exploit the fact that with fixed hidden weights, the optimal output weights solve a linear equation. After learning the generative model, we refine the sampled weights with a gradient-based post-processing in the latent space. Here, we also include a regularization scheme to counteract potential noise. Finally, we demonstrate the effectiveness of our approach by numerical examples.
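
The fact the method exploits is a two-liner: with hidden weights V fixed, the optimal output weights of a 2-layer network under squared loss solve a linear least-squares problem (the activation and shapes here are our toy choices).

    import numpy as np

    def output_weights(V, X, y, act=np.tanh):
        Phi = act(X @ V.T)                        # (n, m) hidden features, V fixed
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return w                                  # argmin_w ||Phi w - y||^2

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(200, 3)), rng.normal(size=200)
    V = rng.normal(size=(16, 3))                  # e.g. drawn from the learned proposal
    print(np.mean((np.tanh(X @ V.T) @ output_weights(V, X, y) - y) ** 2))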

URL: https://openreview.net/forum?id=6oXNpKuBDK

---

Title: Efficient Knowledge Injection in LLMs via Self-Distillation

Authors: Kalle Kujanpää, Pekka Marttinen, Harri Valpola, Alexander Ilin

Abstract: In many practical applications, large language models (LLMs) need to acquire new knowledge not present in their pre-training data. Efficiently leveraging this knowledge usually relies on supervised fine-tuning or retrieval-augmented generation (RAG). Although RAG has emerged as the industry standard for knowledge injection, fine-tuning has not yet achieved comparable success. This paper proposes utilizing prompt distillation, a self-distillation-based method previously explored primarily for style alignment and instruction tuning, to internalize new factual knowledge from free-form documents. Unlike prior methods, our approach requires neither larger teacher models nor structured knowledge formats. Across multiple LLM sizes and model families, we show that prompt distillation outperforms standard supervised fine-tuning and can even surpass RAG. We analyze the key factors contributing to prompt distillation's effectiveness and examine how it scales.
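
A schematic of the self-distillation objective as we read it, assuming a Hugging Face-style causal LM interface (the helper names and the single-position loss are our simplifications): the teacher is the same model with the document in its prompt.

    import torch
    import torch.nn.functional as F

    def prompt_distill_loss(model, doc_ids, question_ids, T=1.0):
        with torch.no_grad():                     # teacher: document + question in context
            t_logits = model(torch.cat([doc_ids, question_ids], dim=1)).logits[:, -1]
        s_logits = model(question_ids).logits[:, -1]   # student: question alone
        return F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                        F.softmax(t_logits / T, dim=-1), reduction="batchmean")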

URL: https://openreview.net/forum?id=drYpdSnRJk

---

Title: Physics-Aware Spatiotemporal Causal Graph Network for Forecasting with Limited Data

Authors: Zijun Cui, Sam Griesemer, Sungyong Seo, Joshua Hikida, Yan Liu

Abstract: Spatiotemporal models have drawn significant interest recently due to their widespread applicability across many domains. These models are often made more practically useful by incorporating beneficial inductive biases, such as laws or symmetries from domain-relevant physics equations. This "physics-awareness" provides an interpretable means of grounding otherwise purely data-driven models, improving robustness and boosting performance in settings with limited data. In this work, we view physical dynamics as domain knowledge that captures fundamental causal relationships across space and time, and can be effectively leveraged by our proposed physics-aware spatiotemporal causal graph network (P-STCGN). We first describe a means of deriving causal relationships from spatiotemporal data, serving as physics-aware labels to learn a causal structure via a dedicated neural module. We then formulate a forecasting module that can operate under this causal structure, producing predictions that are guided by physics-aware cause-effect relationships among modeled variables. Extensive experimentation demonstrates that our method is robust to noisy and limited data, outperforming existing models across a variety of challenging synthetic tasks and benchmark datasets. We further evaluate our method on real-world graph signals and observe superior forecasting performance, achieved by effectively utilizing causal signals from prior physics knowledge.

URL: https://openreview.net/forum?id=n3yrVzPcNa

---

Title: ASkDAgger: Active Skill-level Data Aggregation for Interactive Imitation Learning

Authors: Jelle Luijkx, Zlatan Ajanović, Laura Ferranti, Jens Kober

Abstract: Human teaching effort is a significant bottleneck for the broader applicability of interactive imitation learning. To reduce the number of required queries, existing methods employ active learning to query the human teacher only in uncertain, risky, or novel situations. However, during these queries, the novice’s planned actions are not utilized despite containing valuable information, such as the novice’s capabilities, as well as corresponding uncertainty levels. To this end, we allow the novice to say: “I plan to do this, but I am uncertain.” We introduce the Active Skill-level Data Aggregation (ASkDAgger) framework, which leverages teacher feedback on the novice plan in three key ways: (1) S-Aware Gating (SAG), which adjusts the gating threshold to track sensitivity, specificity, or a minimum success rate; (2) Foresight Interactive Experience Replay (FIER), which recasts valid and relabeled novice action plans into demonstrations; and (3) Prioritized Interactive Experience Replay (PIER), which prioritizes replay based on uncertainty, novice success, and demonstration age. Together, these components balance query frequency with failure incidence, reduce the number of required demonstration annotations, improve generalization, and speed up adaptation to changing domains. We validate the effectiveness of ASkDAgger through language-conditioned manipulation tasks in both simulation and real-world environments. Code, data, and videos are available at https://askdagger.github.io.

URL: https://openreview.net/forum?id=987Az9f8fT

---

Title: L2G: Repurposing Language Models for Genomics Tasks

Authors: Wenduo Cheng, Junhong Shen, Mikhail Khodak, Jian Ma, Ameet Talwalkar

Abstract: Pre-trained language models have transformed the field of natural language processing (NLP), and their success has inspired efforts in genomics to develop domain-specific foundation models (FMs). However, creating high-quality genomic FMs from scratch is resource-intensive, requiring significant computational power and high-quality pre-training data. The success of large language models (LLMs) in NLP has largely been driven by industrial-scale efforts leveraging vast, diverse corpora and massive computing infrastructure. In this work, we aim to bypass the data and computational bottlenecks of creating genomic FMs from scratch and instead propose repurposing existing LLMs for genomics tasks. Inspired by the recently observed 'cross-modal transfer' phenomenon -- where transformers pre-trained on natural language can generalize to other modalities -- we introduce L2G, which adapts a pre-trained LLM architecture for genomics using neural architecture search and a novel three-stage training procedure. Remarkably, without requiring extensive pre-training on DNA sequence data, L2G achieves superior performance to fine-tuned genomic FMs and task-specific models on more than half of tasks across multiple genomics benchmarks. In an enhancer activity prediction task, L2G further demonstrates its capacity to identify significant transcription factor motifs. Our work not only highlights the generalizability and efficacy of language models in out-of-domain tasks such as genomics, but also opens new avenues for more efficient and less resource-intensive methodologies in genomic research.

URL: https://openreview.net/forum?id=5NM4guc90N

---

Title: Discovering group dynamics in coordinated time series via hierarchical recurrent switching-state models

Authors: Michael Wojnowicz, Kaitlin Gili, Preetish Rath, Eric Miller, Jeffrey W. Miller, Clifford Lee Hancock, Meghan O'Donovan, Seth Elkin-Frankston, Tad Brunye, Michael C Hughes

Abstract: We seek a computationally efficient model for a collection of time series arising from multiple interacting entities (a.k.a. "agents"). Recent models of temporal patterns across individuals fail to incorporate explicit system-level collective behavior that can influence the trajectories of individual entities. To address this gap in the literature, we present a new hierarchical switching-state model that can be trained in an unsupervised fashion to simultaneously learn both system-level and individual-level dynamics. We employ a latent system-level discrete state Markov chain that provides top-down influence on latent entity-level chains which in turn govern the emission of each observed time series. Recurrent feedback from the observations to the latent chains at both entity and system levels allows recent situational context to inform how dynamics unfold at all levels in bottom-up fashion. We hypothesize that including both top-down and bottom-up influences on group dynamics will improve interpretability of the learned dynamics and reduce error when forecasting. Our hierarchical switching recurrent dynamical model can be learned via closed-form variational coordinate ascent updates to all latent chains that scale linearly in the number of entities. This is asymptotically no more costly than fitting a separate model for each entity. Analysis of both synthetic data and real basketball team movements suggests our lean parametric model can achieve competitive forecasts compared to larger neural network models that require far more computational resources. Further experiments on soldier data as well as a synthetic task with 64 cooperating entities show how our approach can yield interpretable insights about team dynamics over time.

URL: https://openreview.net/forum?id=LHchZthcOf

---

Title: Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation

Authors: Vaibhav Seth, Ayan Sengupta, Arinjay Pathak, Aastha A K Verma, Natraj Raman, Sriram Gopalakrishnan, Niladri Chatterjee, Tanmoy Chakraborty

Abstract: Large Language Models (LLMs) are highly resource-intensive to fine-tune due to their enormous size. While low-rank adaptation is a prominent parameter-efficient fine-tuning approach, it suffers from sensitivity to hyperparameter choices, leading to instability in model performance on fine-tuning downstream tasks. This paper highlights the importance of effective parameterization in low-rank fine-tuning to reduce estimator variance and enhance the stability of final model outputs. We propose MonteCLoRA, an efficient fine-tuning technique that employs Monte Carlo estimation to learn an unbiased posterior estimation of low-rank parameters with low expected variance, stabilizing fine-tuned LLMs with only $\mathcal{O}(r)$ additional parameters, for a given rank $r$. MonteCLoRA shows significant improvements in accuracy and robustness, achieving up to $3.8$% higher accuracy and $8.6$% greater robustness than existing efficient fine-tuning methods on natural language understanding tasks with pre-trained RoBERTa-base. Furthermore, in generative tasks with pre-trained LLaMA-1-7B and LLaMA-3.2-3B-Instruct, MonteCLoRA demonstrates robust performance with $50\%$ and $62\%$ lower spreads, respectively, than the contemporary, efficient fine-tuning methods. The theoretical and empirical results presented in the paper underscore how parameterization and hyperpriors balance exploration-exploitation in the low-rank parametric space, therefore leading to more optimal and robust parameter estimation during efficient fine-tuning.

URL: https://openreview.net/forum?id=2HFmicB8kh

---

Title: Potential Score Matching: Debiasing Molecular Structure Sampling with Potential Energy Guidance

Authors: Liya Guo, Zun Wang, Chang Liu, Junzhe Li, Pipi Hu, Yi Zhu, Tao Qin

Abstract: The ensemble average of physical properties of molecules is closely related to the distribution of molecular conformations, and sampling such distributions is a fundamental challenge in physics and chemistry. Traditional methods like molecular dynamics (MD) simulations and Markov chain Monte Carlo (MCMC) sampling are commonly used but can be time-consuming and costly. Recently, diffusion models have emerged as efficient alternatives by learning the distribution of training data. Obtaining an unbiased target distribution is still an expensive task, primarily because it requires satisfying ergodicity. To tackle these challenges, we propose Potential Score Matching (PSM), an approach that utilizes the potential energy gradient to guide generative models. PSM does not require exact energy functions and can debias sample distributions even when trained on limited and biased data. Our method outperforms existing state-of-the-art (SOTA) models on the Lennard-Jones (LJ) potential, a commonly used toy model. Furthermore, we extend the evaluation of PSM to high-dimensional problems using the MD17 and MD22 datasets. The results demonstrate that molecular distributions generated by PSM more closely approximate the Boltzmann distribution compared to traditional diffusion models.

URL: https://openreview.net/forum?id=tTdzbnvTno

---

Title: Understanding In-Context Learning of Linear Models in Transformers Through an Adversarial Lens

Authors: Usman Anwar, Johannes von Oswald, Louis Kirsch, David Krueger, Spencer Frei

Abstract: In this work, we make two contributions towards the understanding of in-context learning of linear models by transformers. First, we investigate the adversarial robustness of in-context learning in transformers to hijacking attacks — a type of adversarial attack in which the adversary’s goal is to manipulate the prompt to force the transformer to generate a specific output. We show that both linear transformers and transformers with GPT-2 architectures are vulnerable to such hijacking attacks. However, adversarial robustness to such attacks can be significantly improved through adversarial training --- done either at the pretraining or finetuning stage --- and can generalize to stronger attack models. Our second main contribution is a comparative analysis of adversarial vulnerabilities across transformer models and other algorithms for learning linear models. This reveals two novel findings. First, adversarial attacks transfer poorly between larger transformer models trained from different seeds despite achieving similar in-distribution performance. This suggests that transformers of the same architecture trained according to the same recipe may implement different in-context learning algorithms for the same task. Second, we observe that attacks do not transfer well between classical learning algorithms for linear models (single-step gradient descent and ordinary least squares) and transformers. This suggests that there could be qualitative differences between the in-context learning algorithms that transformers implement and these traditional algorithms.
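
A toy sketch of a hijacking attack in the linear-regression ICL setting (the interface, budget, and constraint set are our assumptions): optimize a bounded perturbation of the in-context examples so the prediction at the query is dragged toward an adversarial target.

    import torch

    def hijack(model, prompt, query, y_target, eps=0.5, steps=100, lr=1e-2):
        delta = torch.zeros_like(prompt, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            loss = ((model(prompt + delta, query) - y_target) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)           # keep the attack budget bounded
        return (prompt + delta).detach()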

URL: https://openreview.net/forum?id=CtMXJxO7SJ

---

Title: Set-Based Training for Neural Network Verification

Authors: Lukas Koller, Tobias Ladner, Matthias Althoff

Abstract: Neural networks are vulnerable to adversarial attacks, i.e., small input perturbations can significantly affect the outputs of a neural network. Therefore, to ensure safety of neural networks in safety-critical environments, the robustness of a neural network must be formally verified against input perturbations, e.g., from noisy sensors. To improve the robustness of neural networks and thus simplify the formal verification, we present a novel set-based training procedure in which we compute the set of possible outputs given the set of possible inputs and compute for the first time a gradient set, i.e., each possible output has a different gradient. Therefore, we can directly reduce the size of the output enclosure by choosing gradients toward its center. Small output enclosures increase the robustness of a neural network and, at the same time, simplify its formal verification. The latter benefit is due to the fact that a larger size of propagated sets increases the conservatism of most verification methods. Our extensive evaluation demonstrates that set-based training produces robust neural networks with competitive performance, which can be verified using fast (polynomial-time) verification algorithms due to the reduced output set.
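
An interval-arithmetic sketch of the set-propagation idea for a single linear layer (the authors' set representation and gradient sets are more general than this): propagate the input box [x-eps, x+eps] and penalize the size of the output enclosure.

    import torch

    def linear_bounds(W, b, lo, hi):
        mid, rad = (lo + hi) / 2, (hi - lo) / 2
        mid_out = mid @ W.T + b
        rad_out = rad @ W.T.abs()                 # worst case over the input box
        return mid_out - rad_out, mid_out + rad_out

    def enclosure_penalty(W, b, x, eps=0.1):
        lo, hi = linear_bounds(W, b, x - eps, x + eps)
        return (hi - lo).mean()                   # add to the task loss during training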

URL: https://openreview.net/forum?id=n0lzHrAWIA

---

Title: Probabilities of Chat LLMs Are Miscalibrated but Still Predict Correctness on Multiple-Choice Q&A

Authors: Benjamin Plaut, Khanh Xuan Nguyen, Tu Trinh

Abstract: We study 15 large language models (LLMs) fine-tuned for chat and find that their maximum softmax probabilities (MSPs) are consistently miscalibrated on multiple-choice Q&A. However, those MSPs might still encode useful uncertainty information. Specifically, we hypothesized that wrong answers would be associated with smaller MSPs compared to correct answers. Via rigorous statistical testing, we show that this hypothesis holds for models that perform well on the underlying Q&A task. We also find a strong direct correlation between Q&A accuracy and MSP correctness prediction, while finding no correlation between Q&A accuracy and calibration error. This suggests that within the current fine-tuning paradigm, we can expect correctness prediction but not calibration to improve as LLM capabilities progress. To demonstrate the utility of correctness prediction, we show that when models have the option to abstain, performance can be improved by selectively abstaining based on the MSP of the initial model response, using only a small amount of labeled data to choose the MSP threshold.
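
The abstention recipe at the end is simple enough to sketch directly (the reward scheme is ours: +1 correct, -1 wrong, 0 abstain): pick the MSP threshold on a small labeled set, then abstain below it at test time.

    import numpy as np

    def pick_threshold(msp, correct):
        """msp: (n,) max softmax probs; correct: (n,) bools on a small labeled set."""
        def reward(t):
            keep = msp >= t                       # answer only when confident enough
            return correct[keep].sum() - (~correct[keep]).sum()
        return max(np.unique(msp), key=reward)

    msp = np.array([0.9, 0.4, 0.8, 0.3, 0.95])
    correct = np.array([True, False, True, False, True])
    print(pick_threshold(msp, correct))           # 0.8 here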

URL: https://openreview.net/forum?id=E6LOh5vz5x

---

Title: Formulating Node Labelling as Node Classification or Link Prediction in Different Graph Representations

Authors: Tobias Möller, Borun Shi

Abstract: Message-passing Graph Neural Networks (GNNs) are increasingly used for predictive tasks on graphs. Much work has been done to improve GNN architectures, but how the actual data graph should be designed is not well studied. In this paper, we investigate how two different graph representations impact the performance of GNN models across datasets with varying characteristics grouped by homophily, heterogeneity, and number of labels per node. A unique phenomenon is that the same abstract predictive task of labelling nodes is formulated as a node classification problem on one representation and as a link prediction problem on the other. Our work is the first to blur the line between these two basic and fundamental tasks in graph learning. Our experiments on 12 real-world datasets suggest that different representations (and tasks) are optimal for different datasets, models, and hyperparameters. We derive empirical heuristics of choosing between the two and pave the way towards a criterion of choosing the optimal graph representations and towards formally understanding the interconnection between node classification and link prediction.

URL: https://openreview.net/forum?id=lK7tjysj0s

---

Title: Sortability of Time Series Data

Authors: Christopher Lohse, Jonas Wahl

Abstract: Evaluating the performance of causal discovery algorithms that aim to find causal relationships between time-dependent processes remains a challenging topic. In this paper, we show that certain characteristics of datasets, such as varsortability (Reisach et al. 2021) and R2-sortability (Reisach et al. 2023), also occur in datasets for autocorrelated stationary time series. We illustrate this empirically using four types of data: simulated data based on SVAR models and Erdős-Rényi graphs, the data used in the 2019 causality-for-climate challenge (Runge et al. 2019), real-world river stream datasets, and real-world data generated by the Causal Chamber (Gamella et al. 2024). To do this, we adapt var- and R2-sortability to time series data. We also investigate the extent to which the performance of continuous score-based causal discovery methods goes hand in hand with high sortability. Arguably, our most surprising finding is that the investigated real-world datasets exhibit high varsortability and low R2-sortability, indicating that scales may carry a significant amount of causal information.
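
A simplified sketch of the varsortability computation, restricted here to direct edges rather than all directed paths (full definitions are in Reisach et al. 2021): the fraction of cause-effect pairs along which marginal variance increases.

    import numpy as np

    def varsortability_edges(X, A):
        """X: (n, d) data; A: (d, d) adjacency with A[i, j] = 1 for edge i -> j."""
        v = X.var(axis=0)
        src, dst = np.nonzero(A)
        increasing = (v[dst] > v[src]).sum()
        ties = (v[dst] == v[src]).sum()
        return (increasing + 0.5 * ties) / len(src)   # 1.0 = perfectly var-sortable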

URL: https://openreview.net/forum?id=OGvmCpcHdV

---

Title: Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture

Authors: Thomas F Burns, Tomoki Fukai, Christopher Earls

Abstract: Large language models (LLMs) demonstrate an impressive ability to utilise information within the context of their input sequences to appropriately respond to data unseen by the LLM during its training procedure. This ability is known as in-context learning (ICL). Humans and non-human animals demonstrate similar abilities, however their neural architectures differ substantially from LLMs. Despite this, a critical component within LLMs, the attention mechanism, resembles modern associative memory models, widely used in and influenced by the computational neuroscience community to model biological memory systems. Using this connection, we introduce an associative memory model capable of performing ICL. We use this as inspiration for a novel residual stream architecture which allows information to directly flow between attention heads. We test this architecture during training within a two-layer Transformer and show its ICL abilities manifest more quickly than without this modification. We then apply our architecture in small language models with 8 million and 1 billion parameters, focusing on attention head values, with results also indicating improved performance at these larger and more naturalistic scales.

URL: https://openreview.net/forum?id=lcTFm4LIRR

---

Title: Shared Imagination: LLMs Hallucinate Alike

Authors: Yilun Zhou, Caiming Xiong, Silvio Savarese, Chien-Sheng Wu

Abstract: Despite the recent proliferation of large language models (LLMs), their training recipes -- model architecture, pre-training data and optimization algorithm -- are often very similar. This naturally raises the question of the similarity among the resulting models. In this paper, we propose a novel setting, imaginary question answering (IQA), to better understand model similarity. In IQA, we ask one model to generate purely imaginary questions (e.g., on completely made-up concepts in physics) and prompt another model to answer. Surprisingly, despite the total fictionality of these questions, all models can answer each other's questions with remarkable consistency, suggesting a "shared imagination space" in which these models operate during such hallucinations. We conduct a series of investigations into this phenomenon and discuss the implications of such model homogeneity on hallucination detection and computational creativity.

URL: https://openreview.net/forum?id=NUXpBMtDYs

---

Title: Approximation, Estimation and Optimization Errors for a Deep Neural Network

Authors: Gerrit Welper, Benjamin Keene

Abstract: The error of supervised learning is typically split into three components: approximation, estimation and optimization errors. While all three have been extensively studied in the literature, a unified treatment is less frequent, in part because of conflicting assumptions. Current approximation results rely on carefully hand-crafted weights or practically unavailable information, which are difficult to achieve by gradient descent. Optimization theory is best understood in over-parametrized regimes with more weights than samples, while classical estimation errors require the opposite regime with more samples than weights.

This paper contains two results which bound all three error components simultaneously for (non-convex) training of the second-to-last layer of deep fully connected networks on the unit sphere. The first uses a regular least squares loss and shows convergence in the under-parametrized regime. The second uses a kernel-based loss function and shows convergence in both under- and over-parametrized regimes.

URL: https://openreview.net/forum?id=dzND5haNvA

---

Title: Model Tensor Planning

Authors: An Thai Le, Khai Nguyen, Minh Nhat VU, Joao Carvalho, Jan Peters

Abstract: Sampling-based model predictive control (MPC) offers strong performance in nonlinear and contact-rich robotic tasks, yet often suffers from poor exploration due to locally greedy sampling schemes. We propose \emph{Model Tensor Planning} (MTP), a novel sampling-based MPC framework that introduces high-entropy control trajectory generation through structured tensor sampling. By sampling over randomized multipartite graphs and interpolating control trajectories with B-splines and Akima splines, MTP ensures smooth and globally diverse control candidates. We further propose a simple $\beta$-mixing strategy that blends local exploitative and global exploratory samples within the modified Cross-Entropy Method (CEM) update, balancing control refinement and exploration. Theoretically, we show that MTP achieves asymptotic path coverage and maximum entropy in the control trajectory space in the limit of infinite tensor depth and width.

Our implementation is fully vectorized using JAX and compatible with MuJoCo XLA, supporting \emph{Just-in-time} (JIT) compilation and batched rollouts for real-time control with online domain randomization. Through experiments on various challenging robotic tasks, ranging from dexterous in-hand manipulation to humanoid locomotion, we demonstrate that MTP outperforms standard MPC and evolutionary strategy baselines in task success and control robustness. Design and sensitivity ablations confirm the effectiveness of MTP’s tensor sampling structure, spline interpolation choices, and mixing strategy. Altogether, MTP offers a scalable framework for robust exploration in model-based planning and control.
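
One ingredient is easy to show concretely with SciPy (the tensor sampling over randomized multipartite graphs is not sketched here): a handful of sampled control knots become a smooth, dense candidate trajectory via Akima interpolation.

    import numpy as np
    from scipy.interpolate import Akima1DInterpolator

    rng = np.random.default_rng(0)
    t_knots = np.linspace(0.0, 1.0, 6)
    u_knots = rng.uniform(-1.0, 1.0, size=(6, 2))    # 6 knots, 2 control dims
    spline = Akima1DInterpolator(t_knots, u_knots)   # interpolates along axis 0
    u_traj = spline(np.linspace(0.0, 1.0, 100))      # (100, 2) smooth candidate
    print(u_traj.shape)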

URL: https://openreview.net/forum?id=fk1ZZdXCE3

---

Title: HDCS: Hierarchy Discovery and Critic Shaping for Reinforcement Learning with Automaton Specification

Authors: Duo XU, Faramarz Fekri

Abstract: Training reinforcement learning (RL) agents with scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. A deterministic finite-state automaton (DFA) provides a streamlined method for specifying tasks in RL that surpasses the limitations of traditional discounted return formulations. However, existing RL algorithms designed to address DFA tasks face unresolved challenges, hindering their practical application. One key issue is that subgoals in the DFA may exhibit hidden hierarchical structures, with some macro-subgoals comprising multiple micro-subgoals in certain orders. Without understanding this hierarchy, RL algorithms may struggle to efficiently solve tasks involving such macro-subgoals. Additionally, the sparse reward problem remains inadequately addressed. Previous approaches, such as potential-based reward shaping, often encounter inefficiencies or result in suboptimal solutions.
To address these challenges, we propose a novel RL framework, HDCS, designed to uncover the hierarchical structure of subgoals and accelerate the solving of DFA tasks without changing the original optimal policies. The framework operates in two phases: first, a hierarchical RL method is used to identify the prerequisites of subgoals and build the hierarchy; second, given any task specification (DFA), the subgoal hierarchy is incorporated into the task DFA to form a product DFA, and a simple and novel critic-shaping approach is proposed to accelerate the satisfaction of the product DFA without changing the optimal policies of the original problem. The effectiveness of HDCS is demonstrated through experiments conducted across various domains. In particular, compared with representative baselines, critic shaping can yield 2X or 3X acceleration in task solving.

URL: https://openreview.net/forum?id=BGoRme2MfG

---


New submissions
===============


Title: Inherently Robust Control through Maximum-Entropy Learning-Based Rollout

Abstract: Reinforcement Learning has recently proven extremely successful in the context of robot control. One of the major reasons is massively parallel simulation in conjunction with controlling for the so-called ``sim to real'' gap: training on a distribution of environments, which is assumed to contain the real one, is sufficient for finding neural policies that successfully transfer from computer simulations to real robots. Often, this is accompanied by a layer of system identification during deployment to close the gap further. Still, the efficacy of these approaches hinges on reasonable simulation capabilities with an adequately rich task distribution containing the real environment. This work aims to provide a complementary solution in cases where the aforementioned criteria may prove challenging to satisfy. We combine two approaches, $\textit{maximum-entropy reinforcement learning}$ (MaxEntRL) and $\textit{rollout}$, into an inherently robust control method called $\textbf{Maximum-Entropy Learning-Based Rollout (MELRO)}$. Both promise increased robustness and adaptability on their own. While MaxEntRL has been shown to be an adversarially-robust approach in disguise, rollout greatly improves over parametric models through an implicit Newton step on a model of the environment. We find that our approach works excellently in the vast majority of cases on both the Real World Reinforcement Learning (RWRL) benchmark and on our own environment perturbations of the popular DeepMind Control (DMC) suite, which move beyond simple parametric noise. We also show its success in ``sim to real'' transfer with the Franka Panda robot arm.

URL: https://openreview.net/forum?id=Ho4XUDn21D

---

Title: An Operator Analysis Approach on Stochastic Differential Equations (SDEs)-Based Diffusion Generative Models

Abstract: Score-based generative models, grounded in SDEs, excel in producing high-quality data but suffer from slow sampling due to the extensive nonlinear computations required for iterative score function evaluations. We propose an innovative approach that integrates score-based reverse SDEs with kernel methods, leveraging the derivative reproducing property of reproducing kernel Hilbert spaces (RKHS) to efficiently approximate the eigenfunctions and eigenvalues of the Fokker-Planck operator. This enables data generation through linear combinations of eigenfunctions, transforming computationally intensive nonlinear operations into efficient linear ones, thereby significantly reducing computational overhead. Notably, our experimental results demonstrate remarkable progress: despite a slight reduction in sample diversity, the sampling time for a single image on the CIFAR-10 dataset is reduced to an impressive 0.29 seconds, marking a substantial advancement in efficiency. This work introduces novel theoretical and practical tools for generative modeling, establishing a robust foundation for real-time applications.

URL: https://openreview.net/forum?id=Iocj6fTd6O

---

Title: Harmonic Loss Trains Interpretable AI Models

Abstract: In this paper, we introduce harmonic loss as an alternative supervisory signal for training neural networks and large language models (LLMs). Harmonic loss differs from standard cross-entropy loss by (a) replacing the usual SoftMax normalization with a scale-invariant HarMax function and (b) computing logits via Euclidean distance rather than a dot product. Harmonic loss enables improved interpretability and faster convergence, owing to its scale invariance and finite convergence point by design, which can be interpreted as a class center. We first validate the performance of harmonic models across algorithmic, vision, and language datasets. Through extensive experiments, we demonstrate that models trained with harmonic loss perform better than standard models by: (a) enhancing interpretability (i.e. geometry of representations), (b) requiring less data for generalization, and (c) reducing grokking. Moreover, we compare a GPT-2 model trained with harmonic loss to the standard GPT-2, illustrating that the harmonic model develops more interpretable representations. We hope our work will inspire future research exploring various methods to improve the geometry of representations, paving the way toward building more interpretable AI models.
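
A sketch assembled purely from the abstract's description (the exponent n and the numerical floor are our choices): class scores are Euclidean distances to per-class weight vectors, normalized by a scale-invariant harmonic rule rather than SoftMax.

    import torch

    def harmonic_probs(x, W, n=2):
        d = torch.cdist(x, W)                   # (batch, classes): distance to class centers
        inv = d.clamp_min(1e-12) ** (-n)        # scale invariance: p(c * d) == p(d)
        return inv / inv.sum(dim=1, keepdim=True)

    def harmonic_loss(x, W, y, n=2):
        p = harmonic_probs(x, W, n)[torch.arange(len(y)), y]
        return -torch.log(p).mean()             # minimized when x sits at its class center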

URL: https://openreview.net/forum?id=ZpSZ7pNoCs

---

Title: A Case for Library-Level $k$-Means Binning in Histogram Gradient-Boosted Trees

Abstract: Modern Gradient Boosted Decision Trees (GBDTs) accelerate split finding with histogram-based binning, which reduces complexity from $O(N\log N)$ to $O(N)$ by aggregating gradients into fixed-size bins. However, the predominant quantile binning strategy—designed to distribute data points evenly among bins—may overlook critical boundary values that could enhance predictive performance. In this work, we consider a novel approach that replaces quantile binning with a $k$-means discretizer initialized with quantile bins, and justify the swap with a proof showing how, for any $L$-Lipschitz function, k-means maximizes the worst-case explained variance of Y obtained when treating each bin as an atomic unit. We test this swap against quantile and uniform binning on 33 OpenML datasets plus synthetics that control for modality, skew, and bin budget. Across 18 regression datasets, k-means shows no statistically significant losses at the 5% level and wins in four cases—most strikingly a 55% MSE drop on one particularly skewed dataset—even though k-means' mean reciprocal rank (MRR) is slightly lower (0.65 vs 0.72). On the 15 classification datasets the two methods are statistically tied (MRR 0.70 vs 0.68) with gaps $\leq$0.2 pp. Synthetic experiments confirm consistently large MSE gains—typically $>$20% and rising to 90% as outlier magnitude increases or bin budget drops. We find that k-means keeps error on par with exhaustive (no-binning) splitting when extra cuts add little value, yet still recovers key split points that quantile overlooks. As such, we advocate for a built-in bin_method=$k$-means flag, especially in regression tasks and in tight-budget settings such as the 32–64-bin GPU regime—because it is a "safe default" with large upside, yet adds only a one-off, cacheable overhead ($\approx$ 3.5s to bin 10M rows on one Apple M1 thread).
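
The proposed discretizer is easy to prototype with scikit-learn (hyperparameters are ours): run 1-D k-means initialized at the quantile bins, then cut between sorted centroids.

    import numpy as np
    from sklearn.cluster import KMeans

    def kmeans_bin_edges(x, n_bins=32):
        x = x.reshape(-1, 1)
        qs = np.linspace(0, 1, n_bins + 2)[1:-1]          # interior quantiles
        init = np.quantile(x, qs).reshape(-1, 1)          # quantile-bin initialization
        km = KMeans(n_clusters=n_bins, init=init, n_init=1).fit(x)
        centers = np.sort(km.cluster_centers_.ravel())
        return (centers[:-1] + centers[1:]) / 2           # midpoints as cut points

    x = np.random.default_rng(0).lognormal(size=10_000)   # a skewed feature
    print(kmeans_bin_edges(x, n_bins=8))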

URL: https://openreview.net/forum?id=UaTrLLspJa

---

Title: The Performance Of The Unadjusted Langevin Algorithm Without Smoothness Assumptions

Abstract: In this article, we study the problem of sampling from distributions whose densities are not necessarily smooth nor logconcave. We propose a simple Langevin-based algorithm that does not rely on popular but computationally challenging techniques, such as the Moreau-Yosida envelope or Gaussian smoothing, and show consequently that the performance of samplers like ULA does not necessarily degenerate arbitrarily with low regularity. In particular, we show that the Lipschitz or Hölder continuity assumption can be replaced by a geometric one-sided Lipschitz condition that allows even for discontinuous log-gradients. We derive non-asymptotic guarantees for the convergence of the algorithm to the target distribution in Wasserstein distances. Non-asymptotic bounds are also provided for the performance of the algorithm as an optimizer, specifically for the solution of associated excess risk optimization problems.
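
For reference, the algorithm under study in its standard form (the paper's contribution is the analysis under weak regularity, not a new update rule):

    import numpy as np

    def ula(grad_log_pi, x0, step=1e-2, n_iters=5_000, rng=None):
        rng = rng or np.random.default_rng()
        x, samples = np.array(x0, dtype=float), []
        for _ in range(n_iters):
            noise = rng.normal(size=x.shape)
            x = x + step * grad_log_pi(x) + np.sqrt(2 * step) * noise
            samples.append(x.copy())
        return np.array(samples)

    # Standard Gaussian target: grad log pi(x) = -x
    xs = ula(lambda x: -x, x0=np.zeros(2))
    print(xs.mean(axis=0), xs.var(axis=0))   # roughly 0 mean, unit variance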

URL: https://openreview.net/forum?id=TTNeuyYdhg

---

Title: Convergence Aspects of Hybrid Kernel SVGD

Abstract: Stein variational gradient descent (SVGD) is a particle based approximate inference algorithm. Many variants of SVGD have been proposed in recent years, including the hybrid kernel variant (h-SVGD), which has demonstrated promising results on image classification with deep neural network ensembles. By framing h-SVGD as a kernelised Wasserstein gradient flow on a functional that is not the Kullback-Leibler divergence, we demonstrate that h-SVGD does not converge to the target distribution in the mean field limit. Despite this theoretical result, we provide intuition and experimental support for the ability of h-SVGD to improve variance estimation in high dimensions. Unlike other SVGD variants that also alleviate variance collapse, this is achieved at no additional computational cost and without further assumptions on the posterior.
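
Our reading of the hybrid-kernel update, with RBF kernels and bandwidth choices of our own (which kernel drives and which repels is exactly the design axis h-SVGD exposes):

    import numpy as np

    def rbf(X, h):
        diff = X[:, None, :] - X[None, :, :]             # (n, n, d)
        K = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))
        gradK = diff / h ** 2 * K[..., None]             # grad of k(x_i, x_j) wrt x_i
        return K, gradK

    def h_svgd_step(X, score, h_drive=1.0, h_repulse=0.5, eps=1e-2):
        K1, _ = rbf(X, h_drive)
        _, gK2 = rbf(X, h_repulse)
        drive = K1 @ score(X) / len(X)        # kernel-smoothed score term
        repulse = gK2.sum(axis=1) / len(X)    # different kernel for the repulsion
        return X + eps * (drive + repulse)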

URL: https://openreview.net/forum?id=JZkbMSQDmD

---

Title: Networked Communication for Decentralised Agents in Mean-Field Games

Abstract: We introduce networked communication to the mean-field game framework, in particular to oracle-free settings where $N$ decentralised agents learn along a single, non-episodic run of the empirical system. We prove that our architecture has sample guarantees bounded between those of the centralised- and independent-learning cases. We provide the order of the difference in these bounds in terms of network structure and number of communication rounds, and also contribute a policy-update stability guarantee. We discuss how the sample guarantees of the three theoretical algorithms do not actually result in practical convergence times. We thus contribute practical enhancements to all three algorithms allowing us to present their first empirical demonstrations, where we do not need to enforce several of the theoretically required assumptions. We then show that in practical settings where the theoretical hyperparameters are not observed (leading to poor estimation of the Q-function), our communication scheme considerably accelerates learning over the independent case, which hardly seems to learn at all. Indeed networked agents often perform similarly to the centralised case, while removing the restrictive assumption of the latter. We provide ablations and additional studies showing that our networked approach also has advantages over both alternatives in terms of robustness to update failures and to changes in population size.

URL: https://openreview.net/forum?id=J9WGHU78gb

---

Title: Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations

Abstract: Understanding the locus of semantic representation in large language models (LLMs) is crucial for interpretability and architectural innovation. The dominant paradigm posits that trainable input embeddings serve as foundational "meaning vectors." This paper challenges that view. We construct Transformer models where the embedding layer is entirely frozen, with vectors derived not from data, but from the visual structure of Unicode glyphs. These non-semantic, precomputed visual embeddings are fixed throughout training. Our method is compatible with any tokenizer, including a novel Unicode-centric tokenizer we introduce to ensure universal text coverage. Despite the absence of trainable, semantically initialized embeddings, our models converge, generate coherent text, and, critically, outperform architecturally identical models with trainable embeddings on the MMLU reasoning benchmark. We attribute this to "representational interference" in conventional models, where the embedding layer is burdened with learning both structural and semantic features. Our results indicate that high-level semantics are not inherent to input embeddings but are an emergent property of the Transformer's compositional architecture and data scale. This reframes the role of embeddings from meaning containers to structural primitives. We release all code and models to foster further research.

URL: https://openreview.net/forum?id=Odh8IynO1o

---

Title: Multimodal Deception in Explainable AI: Concept-Level Backdoor Attacks on Concept Bottleneck Models

Abstract: Deep learning has demonstrated transformative potential across domains, yet its inherent opacity has driven the development of Explainable Artificial Intelligence (XAI). Concept Bottleneck Models (CBMs), which enforce interpretability through human-understandable concepts, represent a prominent advancement in XAI. However, despite their semantic transparency, CBMs remain vulnerable to security threats such as backdoor attacks—malicious manipulations that induce controlled misbehaviors during inference. While CBMs leverage multimodal representations (visual inputs and textual concepts) to enhance interpretability, their dual-modality structure introduces new attack surfaces. To address the unexplored risk of concept-level backdoor attacks in multimodal XAI systems, we propose CAT (Concept-level Backdoor ATtacks), a methodology that injects triggers into conceptual representations during training, enabling precise prediction manipulation without compromising clean-data performance. An enhanced variant, CAT+, incorporates a concept correlation function to systematically optimize trigger-concept associations, thereby improving attack effectiveness and stealthiness. Through a comprehensive evaluation framework assessing attack success rate, stealth metrics, and model utility preservation, we demonstrate that CAT and CAT+ maintain high performance on clean data while achieving significant targeted effects on backdoored datasets. This work highlights critical security risks in interpretable AI systems and provides a robust methodology for future security assessments of CBMs.

URL: https://openreview.net/forum?id=N8CTyY5FbR

---

Title: Accumulator-Aware Post-Training Quantization for Large Language Models

Abstract: When quantizing weights and activations to increasingly narrower representations, the cost of additions begins to dominate that of multiplications in multiply-accumulate (MAC) units. Recent studies show that reducing addition costs via low-precision accumulation improves throughput, power, and area across inference platforms, albeit with an increased risk of overflow. Accumulator-aware quantization research has so far only considered the quantization-aware training (QAT) paradigm, in which models are fine-tuned or trained from scratch with quantization in the loop. As models and datasets continue to grow in size, QAT techniques become increasingly more expensive, which has motivated the recent surge in post-training quantization (PTQ) research. To bridge this gap, we introduce AXE—the first accumulator-aware quantization framework explicitly designed to endow overflow avoidance guarantees to PTQ algorithms. We present theoretical motivation for AXE and demonstrate its flexibility by implementing it on top of two existing algorithms: GPFQ and OPTQ. We design AXE to support multi-stage accumulation, opening the door to full datapath optimization for the first time. We evaluate AXE using recent language generation models; when quantizing Llama3 8B for a 16-bit multi-stage accumulation datapath, AXE maintains up to 98% of the FP16 perplexity, surpassing naïve bit width manipulation by up to 15%.
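
As a worked example of why accumulator width matters, the following computes the standard worst-case bit width needed to sum K products of signed w-bit weights and unsigned a-bit activations without overflow (a textbook bound, not AXE's exact criterion):

    # Sketch: overflow-free accumulator width for a K-term dot product.
    import math

    def accumulator_bits(k, weight_bits, act_bits):
        worst = k * (2 ** (weight_bits - 1)) * (2 ** act_bits - 1)
        return math.ceil(math.log2(worst + 1)) + 1   # +1 for the sign bit

    print(accumulator_bits(k=4096, weight_bits=4, act_bits=8))  # -> 24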

URL: https://openreview.net/forum?id=p6l0579yj7

---

Title: MTMT: Tiered Treatment Effect Decomposition for Multi-Task Uplift Modeling

Abstract: As a key component in boosting online user growth, uplift modeling aims to measure individual user responses (e.g., whether to play the game) to various treatments, such as gaming bonuses, thereby enhancing business outcomes. However, previous research typically considers a single-task, single-treatment setting, where only one treatment exists and the overall treatment effect is measured by a single type of user response. In this paper, we propose a Multi-Treatment Multi-Task (MTMT) uplift network to estimate treatment effects in a multi-task scenario. We identify the multi-treatment problem as a causal inference problem with a tiered response, comprising a base effect (from offering a treatment) and an incremental effect (from offering a specific type of treatment), where the base effect can be numerically much larger than the incremental effect. Specifically, MTMT separately encodes user features and treatments. The user feature encoder uses a multi-gate mixture of experts (MMOE) network to encode relevant user features, explicitly learning inter-task relations. The resultant embeddings are used to measure natural responses per task. Furthermore, we introduce a user-treatment feature interaction module to model correlations between each treatment and user feature. Consequently, we separately measure the base and incremental treatment effect for each task based on the produced treatment-aware representations. Experimental results based on an offline public dataset and an online proprietary dataset demonstrate the effectiveness of MTMT in single/multi-treatment and single/multi-task settings. Additionally, MTMT has been deployed in our gaming platform to improve user experience.

URL: https://openreview.net/forum?id=yo9NTZsGMa

---

Title: Holistic Continual Learning under Concept Drift with Adaptive Memory Realignment

Abstract: Traditional continual learning methods prioritize knowledge retention and focus primarily on mitigating catastrophic forgetting, implicitly assuming that the data distribution of previously learned tasks remains static. This overlooks the dynamic nature of real-world data streams, where concept drift permanently alters previously seen data and demands both stability and rapid adaptation. We introduce a holistic framework for continual learning under concept drift that simulates realistic scenarios by evolving task distributions. As a baseline, we consider Full Relearning (FR), in which the model is retrained from scratch on newly labeled samples from the drifted distribution. While effective, this approach incurs substantial annotation and computational overhead. To address these limitations, we propose Adaptive Memory Realignment (AMR), a lightweight alternative that equips rehearsal-based learners with a drift-aware adaptation mechanism. AMR selectively removes outdated samples of drifted classes from the replay buffer and repopulates it with a small number of up-to-date instances, effectively realigning memory with the new distribution. This targeted resampling matches the performance of FR while reducing the need for labeled data and computation by orders of magnitude. To enable reproducible evaluation, we introduce four concept drift variants of standard vision benchmarks: Fashion-MNIST-CD, CIFAR10-CD, CIFAR100-CD, and Tiny-ImageNet-CD, where previously seen classes reappear with shifted representations. Comprehensive experiments on these datasets using several rehearsal-based baselines show that AMR consistently counters concept drift, maintaining high accuracy with minimal overhead. These results position AMR as a scalable solution that reconciles stability and plasticity in non-stationary continual learning environments. Full implementation of our framework and concept drift benchmark data sets are available at https://anonymous.4open.science/r/CL-Under-Concept-Drift-8380/README.md.
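
The realignment step itself is simple to picture; a minimal sketch, assuming the replay buffer holds labelled (x, y) pairs:

    # Sketch of Adaptive Memory Realignment: evict replay samples of
    # drifted classes, then repopulate with a few up-to-date instances.
    def realign_buffer(buffer, drifted_classes, fresh_samples):
        kept = [(x, y) for (x, y) in buffer if y not in drifted_classes]
        return kept + fresh_samples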

URL: https://openreview.net/forum?id=1drDlt0CLM

---

Title: On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

Abstract: Since its introduction, softmax attention has become the backbone of modern transformer architectures due to its expressiveness and scalability across a wide range of tasks. However, the main drawback of softmax attention is the quadratic memory requirement and computational complexity with respect to the sequence length. By replacing the softmax nonlinearity, linear attention and similar methods have been introduced to avoid the quadratic bottleneck of softmax attention. Despite these linear forms of attention being derived from the original softmax formulation, they typically lag in terms of downstream accuracy. While strong intuition of the softmax nonlinearity on the query and key inner product suggests that it has desirable properties compared to other nonlinearities, the question of why this discrepancy exists still remains unanswered. This work demonstrates that linear attention is an approximation of softmax attention by deriving the recurrent form of softmax attention. Using this form, each part of softmax attention can be described in the language of recurrent neural networks (RNNs). Describing softmax attention as an RNN allows for the ablation of the components of softmax attention to understand the importance of each part and how they interact. In this way, our work helps explain why softmax attention is more expressive than its counterparts.
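
For intuition on the approximation the abstract describes, here is a toy sketch contrasting exact softmax attention at one step (whose "recurrent state" is the full set of past keys and values) with linear attention's constant-size running state; the ReLU feature map is an illustrative choice:

    # Sketch: exact softmax attention vs. a linear-attention recurrence.
    import numpy as np

    def softmax_attn_step(q, ks, vs):
        """Exact: requires every past key/value, so state grows with t."""
        w = np.exp(ks @ q)
        return (w[:, None] * vs).sum(0) / w.sum()

    def linear_attn_step(q, k, v, S, z, phi=lambda x: np.maximum(x, 0)):
        """Approximate: constant-size state S = sum phi(k) v^T, z = sum phi(k)."""
        S += np.outer(phi(k), v)
        z += phi(k)
        out = S.T @ phi(q) / (z @ phi(q) + 1e-9)
        return out, S, z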

URL: https://openreview.net/forum?id=PHcITOi3vV

---

Title: Unveiling Transfer Learning Effectiveness Through Latent Feature Distributions

Abstract: Transfer learning leverages large-scale pretraining to adapt models to specific downstream tasks. It has emerged as a powerful and widely adopted training strategy in deep learning frameworks. So, what makes it effective? Prior research has attributed its success to feature reuse, pretrained weights reuse, domain alignment, and the transfer of low-level data statistics. This study goes beyond these perspectives and focuses on a more fundamental factor: the evolution of the logits distribution within the latent feature space of pretrained models. We introduce a novel approach using the Wasserstein distance to track distributional changes in the latent features. We find that pretraining not only learns the input distributions but also transforms them into generalizable internal representations in a consistent manner across all frozen layers. This finding underpins the effectiveness of transfer learning and provides a unifying explanation for these established theoretical perspectives.
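
A minimal sketch of the kind of tracking described, assuming we compare per-dimension activations of the same layer at two training stages with the 1-D Wasserstein distance (a simplification of the paper's analysis):

    # Sketch: distributional drift of latent features across checkpoints.
    import numpy as np
    from scipy.stats import wasserstein_distance

    def layer_drift(feats_before, feats_after):
        """feats_*: (num_samples, hidden_dim) activations of one layer."""
        return float(np.mean([
            wasserstein_distance(feats_before[:, d], feats_after[:, d])
            for d in range(feats_before.shape[1])]))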

URL: https://openreview.net/forum?id=2oqUNlfRmB

---

Title: Whisper Smarter, not Harder: Adversarial Attack on Partial Suppression

Abstract: Automatic Speech Recognition (ASR) models are deployed in an extensive range of applications. However, recent studies have demonstrated the possibility of adversarial attacks on these models which could potentially suppress or disrupt model output. We investigate and verify the robustness of these attacks and explore whether it is possible to increase their imperceptibility. We additionally find that by relaxing the optimisation objective from complete suppression to partial suppression, we can further increase the imperceptibility of the attack. We also explore possible defences against these attacks and show that a low-pass filter could serve as an effective defence.
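
A minimal sketch of the low-pass-filter defence, assuming 16 kHz audio; the 4 kHz cutoff and filter order are illustrative choices:

    # Sketch: low-pass filtering a waveform before it reaches the ASR model.
    from scipy.signal import butter, filtfilt

    def lowpass_defence(audio, sr=16000, cutoff_hz=4000.0, order=5):
        b, a = butter(order, cutoff_hz / (sr / 2), btype="low")
        return filtfilt(b, a, audio)   # zero-phase filtering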

URL: https://openreview.net/forum?id=ND0kU1NQWG

---

Title: Beyond Accuracy: What Matters in Designing Well-Behaved Image Classification Models?

Abstract: Deep learning has become an essential part of computer vision, with deep neural networks (DNNs) excelling in predictive performance. However, they often fall short in other critical quality dimensions, such as robustness, calibration, or fairness. While existing studies have focused on a subset of these quality dimensions, none have explored a more general form of “well-behavedness” of DNNs. With this work, we address this gap by simultaneously studying nine different quality dimensions for image classification. Through a large-scale study, we provide a bird's-eye view by analyzing 326 backbone models and how different training paradigms and model architectures affect these quality dimensions. We reveal several new insights: (i) vision-language models exhibit high class balance on ImageNet-1k classification and strong robustness against domain changes; (ii) self-supervised learning is an effective training paradigm to improve almost all considered quality dimensions; and (iii) the training dataset size is a major driver for most of the quality dimensions. We conclude our study by introducing the QUBA score (Quality Understanding Beyond Accuracy), a novel metric that ranks models across multiple dimensions of quality, enabling tailored recommendations based on specific user needs.

URL: https://openreview.net/forum?id=E7HDtLCoT6

---

Title: Less Can Be More: Rethinking Message-Passing for Algorithmic Alignment on Graphs

Abstract: Most Graph Neural Networks are based on the principle of message-passing, where all neighboring nodes exchange messages with each other simultaneously. We introduce the Flood and Echo Net, a novel architecture that aligns neural computation with the principles of distributed algorithms directly on the level of message-passing. In our method, nodes sparsely activate upon receiving a message, leading to a wave-like activation pattern that traverses the entire graph. Through these sparse but parallel activations, the Flood and Echo Net is provably more efficient in terms of message complexity. Moreover, the mechanism's ability to generalize across graphs of varying sizes positions it as a practical architecture for the task of graph algorithmic reasoning. We empirically validate that the Flood and Echo Net improves generalization to larger graph sizes, including on the SALSA-CLRS benchmark, improving graph accuracy on instances 100 times larger than those seen during training.
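
A toy sketch of the wave-like activation pattern: nodes activate round by round at increasing BFS distance from an origin node (this illustrates the sparse-activation idea only, not the full architecture):

    # Sketch: flood-style activation rounds over an adjacency-list graph.
    from collections import deque

    def flood_rounds(adj, origin):
        """Return the list of nodes activated at each round (BFS layers)."""
        seen, frontier, rounds = {origin}, deque([origin]), []
        while frontier:
            rounds.append(list(frontier))
            nxt = deque()
            for u in frontier:
                for v in adj[u]:
                    if v not in seen:
                        seen.add(v)
                        nxt.append(v)
            frontier = nxt
        return rounds

    # A path graph 0-1-2-3 floods in four rounds from node 0.
    print(flood_rounds({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, 0))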

URL: https://openreview.net/forum?id=Sy8ut7HV6b

---

Title: PersonalizedRouter: Personalized LLM Routing via Graph-based User Preference Modeling

Abstract: The growing number of Large Language Models (LLMs) with diverse capabilities and response styles provides users with a wider range of choices, which presents challenges in selecting appropriate LLMs, as user preferences vary in terms of performance, cost, and response style. Current LLM selection methods typically optimize for a single fixed objective, such as performance, cost, or a trade-off between them, and fail to learn user preferences from interaction data. To address these limitations, we propose PersonalizedRouter, a graph-based framework that models diverse user profiles and performs personalized LLM selection by leveraging interaction data that includes task context, queries, candidate LLMs, and user decisions. To capture contextual information between user queries and optimal LLMs, PersonalizedRouter converts the interaction data into a heterogeneous graph, where the relationships between different types of nodes are represented by edges. To further assess adaptability to multiple users, we design two strategies to simulate different user interaction data: the multi-cost-efficiency simulation strategy and the LLM-as-a-Judge strategy. The experimental results from the two simulation settings demonstrate that PersonalizedRouter outperforms existing LLM selection methods, surpassing the strongest of them by large margins of 16.97% and 9.83%, respectively. In a larger-scale setting with more users and LLMs, it achieves at least a 49.26% reduction in time cost while outperforming all baselines and maintaining superior robustness. Moreover, PersonalizedRouter exhibits few-shot learning capabilities, effectively adapting to new users and new LLMs, achieving 64.81% and 85.80% of the fully trained model’s performance, respectively.

URL: https://openreview.net/forum?id=W80eE3ArAl

---

Title: Encoder-only Next Token Prediction

Abstract: Next-token prediction is conventionally done using decoder-only Transformers with causal attention, as this approach allows for efficient reuse of keys and values. What if we were not compute-limited, should we still use decoder-only Transformers? In this work, we introduce Encoder-only Next Token Prediction (ENTP). We explore the differences between ENTP and decoder-only Transformers in expressive power and complexity, highlighting potential advantages of ENTP in settings with unbounded compute. We introduce the $\operatorname{Count3}$ task and show, both theoretically and experimentally, that while ENTP can perform this task easily, a decoder-only Transformer cannot. Finally, we empirically demonstrate the superior performance of ENTP across representative tasks where next-token prediction based Transformers can be evaluated, including addition, in-context learning, and language modeling.
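
A minimal sketch of the idea: every prefix is re-encoded with full bidirectional attention and the next token is read from the final position, trading compute (nothing is cached) for expressiveness; positional encodings are omitted for brevity:

    # Sketch: encoder-only next-token prediction with a tiny Transformer.
    import torch
    import torch.nn as nn

    class TinyENTP(nn.Module):
        def __init__(self, vocab, d=64):
            super().__init__()
            self.emb = nn.Embedding(vocab, d)
            layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
            self.enc = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d, vocab)

        def next_token_logits(self, ids):
            """ids: (batch, t) prefix; returns logits for position t+1."""
            h = self.enc(self.emb(ids))    # no causal mask, no KV reuse
            return self.head(h[:, -1])

    model = TinyENTP(vocab=1000)
    logits = model.next_token_logits(torch.randint(0, 1000, (2, 7)))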

URL: https://openreview.net/forum?id=CGHi289y8e

---

Title: Sparse Multiple Kernel Learning: Alternating Best Response and Semidefinite Relaxations

Abstract: We study Sparse Multiple Kernel Learning (SMKL), the problem of selecting a sparse convex combination of prespecified kernels for support vector binary classification. Unlike prevailing $\ell_1$‐regularized approaches that approximate a sparsifying penalty, we formulate the problem by imposing an explicit cardinality constraint on the kernel weights and add an $\ell_2$ penalty for robustness. We solve the resulting non-convex minimax problem via an alternating best response algorithm with two subproblems: the $\alpha$‐subproblem is a standard kernel SVM dual solved via LIBSVM, while the $\beta$‐subproblem admits an efficient solution via the Greedy Selector and Simplex Projector algorithm. We reformulate SMKL as a mixed integer semidefinite optimization problem and derive a hierarchy of semidefinite convex relaxations which can be used to certify near-optimality of the solutions returned by our best response algorithm and also to warm start it. On ten UCI benchmarks, our method with random initialization outperforms state-of-the-art MKL approaches in out-of-sample prediction accuracy on average by $3.34$ percentage points (relative to the best performing benchmark) while selecting a small number of candidate kernels in comparable runtime. With warm starting, our method outperforms the best performing benchmark's out-of-sample prediction accuracy on average by $4.05$ percentage points. Our convex relaxations provide a certificate that, in several cases, the solution returned by our best response algorithm is the globally optimal solution.
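
A sketch of the $\beta$-subproblem's Greedy Selector and Simplex Projector: keep the k largest kernel weights, then project them onto the probability simplex (projection as in Duchi et al., 2008); this illustrates the subproblem only, not the full minimax solver:

    # Sketch: greedy top-k selection followed by simplex projection.
    import numpy as np

    def project_simplex(v):
        """Euclidean projection onto {x >= 0, sum(x) = 1}."""
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
        theta = (css[rho] - 1.0) / (rho + 1.0)
        return np.maximum(v - theta, 0.0)

    def greedy_select_project(beta, k):
        out = np.zeros_like(beta)
        top = np.argsort(beta)[-k:]            # greedy selector: k largest
        out[top] = project_simplex(beta[top])  # renormalise on the simplex
        return out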

URL: https://openreview.net/forum?id=Y5icwFwkyh

---

Title: Rethinking Prompt Optimization: Reinforcement, Diversification, and Migration in Blackbox LLMs

Abstract: An increasing number of NLP applications interact with large language models (LLMs) through black-box APIs, making prompt engineering critical for controlling model outputs. While recent Automatic Prompt Optimization (APO) methods iteratively refine prompts using model-generated feedback known as \textit{textual gradients}, they primarily focus on error correction and neglect valuable insights from correct predictions. This limits both their effectiveness and efficiency. In this paper, we propose a novel APO framework centered on enhancing the feedback mechanism. We reinterpret the textual gradient as a form of negative reinforcement and introduce a complementary positive reinforcement that explicitly preserves beneficial prompt components identified through successful predictions. To mitigate the noise inherent in LLM-generated feedback, we introduce a technique called feedback diversification, which aggregates multiple feedback signals, emphasizing consistent, actionable advice while filtering out outliers. Motivated by the rapid evolution and diversity of available LLMs, we also formalize Continual Prompt Optimization (CPO), addressing the practical challenge of efficiently migrating optimized prompts between different model versions or API providers. Our experiments reveal that naive prompt migration often degrades performance due to loss of critical instructions. In contrast, our approach consistently outperforms strong baselines, achieving significant accuracy improvements, faster convergence, and lower computational costs in both standard and migration scenarios.

URL: https://openreview.net/forum?id=1IgBOgImqE

---

Title: Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate

Abstract: The prevailing paradigm for scaling large language models (LLMs) involves monolithic, end-to-end training, a resource-intensive process that lacks flexibility. This paper explores an alternative, constructive approach to model development, built upon the foundation of non-trainable, deterministic input embeddings. Building upon the recent finding that high-level semantic reasoning can emerge in Transformers using frozen embeddings derived from the visual structure of Unicode glyphs, we demonstrate that this fixed representational substrate acts as a universal "docking port," enabling two powerful and efficient scaling paradigms: seamless modular composition and progressive layer-wise growth. First, we show that specialist models trained on disparate datasets (e.g., Russian and Chinese text) can be merged into a single, more capable Mixture-of-Experts (MoE) model, post-training, with zero architectural modification. This is achieved by simply averaging their output logits. The resulting MoE model exhibits immediate performance improvements on reasoning benchmarks like MMLU, surpassing its constituent experts without catastrophic forgetting. Second, we introduce a layer-wise constructive training methodology, where a deep Transformer is "grown" by progressively stacking and training one layer at a time. This method demonstrates stable convergence and a clear correlation between model depth and the emergence of complex reasoning abilities, such as those required for SQuADv2. Our findings suggest a paradigm shift from monolithic optimization towards a more biological or constructive model of AI development, where complexity is built incrementally and modules can be composed freely. This opens new avenues for resource-efficient scaling, continual learning, and a more democratized ecosystem for building powerful AI systems. We release all code and models to facilitate further research.
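
The composition mechanism is strikingly simple; a minimal sketch, assuming the experts share a tokenizer (which the frozen-substrate setup guarantees):

    # Sketch: merge specialist LMs post-training by averaging output logits.
    import torch

    @torch.no_grad()
    def moe_logits(experts, input_ids):
        """experts: models mapping (batch, seq) ids -> (batch, seq, vocab)."""
        return torch.stack([m(input_ids) for m in experts]).mean(dim=0)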

URL: https://openreview.net/forum?id=gSdftmJelp

---

Title: MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis

Abstract: Recent advances in artificial intelligence (AI) have precipitated significant breakthroughs in healthcare, particularly in the refinement of diagnostic procedures. However, previous studies have often been confined to limited functionalities. This study introduces MiniGPT-Med, a vision-language model derived from large-scale language models and tailored for medical applications. MiniGPT-Med demonstrates remarkable versatility across various imaging modalities, including X-rays, CT scans, and MRIs, enhancing its utility. The model is capable of performing tasks such as medical report generation, visual question answering (VQA), and disease identification within medical imagery. Its integrated processing of both image and textual clinical data markedly improves diagnostic accuracy. Our empirical assessments confirm the superior performance of MiniGPT-Med in disease detection, medical report generation, and VQA benchmarks, representing a significant step towards reducing the gap in assisting radiology practice. Furthermore, it achieves state-of-the-art performance on medical report generation, exceeding the previous best model by 19\% in accuracy. MiniGPT-Med promises to become a general interface for radiology diagnoses, enhancing diagnostic efficiency across a wide range of medical imaging applications.

URL: https://openreview.net/forum?id=NenHFEg1Di

---

Title: Program Semantic Inequivalence Game with Large Language Models

Abstract: Large Language Models (LLMs) can achieve strong performance on everyday coding tasks, but they can fail on complex tasks that require non-trivial reasoning about program semantics.
Finding training examples to teach LLMs to solve these tasks can be challenging.

In this work, we explore a method to synthetically generate code reasoning training data based on a semantic inequivalence game (SInQ): a generator agent creates program variants that are semantically distinct, derived from a dataset of real-world programming tasks, while an evaluator agent has to identify input examples for which they behave differently. The agents train each other semi-adversarially, improving their ability to understand the underlying logic of code.
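
A minimal sketch of the evaluator's side of the game, modelling program variants as Python callables and distinguishing them by output (or raised exception) on candidate inputs:

    # Sketch: find an input on which two program variants disagree.
    def run(prog, x):
        try:
            return ("ok", prog(x))
        except Exception as e:
            return ("err", type(e).__name__)

    def find_distinguishing_input(prog_a, prog_b, candidate_inputs):
        for x in candidate_inputs:
            if run(prog_a, x) != run(prog_b, x):
                return x
        return None          # indistinguishable on these candidates

    # Example: the variants agree at n = 0 but diverge at n = 1.
    f = lambda n: sum(range(n + 1))        # n * (n + 1) / 2
    g = lambda n: n * (n - 1) // 2         # off by one
    print(find_distinguishing_input(f, g, range(5)))   # -> 1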

We evaluated our approach on multiple code generation and understanding benchmarks, including cross-language vulnerability detection (Lu et al., 2021), where our method improves vulnerability detection in C/C++ code despite being trained exclusively on Python code, and the challenging Python builtin identifier swap benchmark (Miceli Barone et al., 2023), showing that whereas modern LLMs still struggle with this benchmark, our approach yields substantial improvements.

We release the code needed to replicate the experiments, as well as the generated synthetic data, which can be used to fine-tune LLMs.

URL: https://openreview.net/forum?id=AdvuYWiyaX

---

Title: Federated Multimodal Fusion for Action Recognition Leveraging Vision-Language Embeddings and Spatio-Temporal CNNs

Abstract: Federated learning (FL) for Video Action Recognition (VAR) faces significant challenges in balancing privacy preservation, communication efficiency, and model performance. This paper introduces FLAMeST (Federated Learning for Action Recognition with Multimodal embeddings and Spatio-Temporal Fusion), a FL framework that synergizes Vision-Language Models (VLMs) and spatiotemporal CNNs to address these challenges. Unlike existing works that use BLIP (a VLM) solely for caption generation, FLAMeST leverages BLIP in a dual manner. To enhance temporal modeling, complementary spatiotemporal features are extracted using a pre-trained 3D CNN (Slow network). These semantic (BLIP) and motion (Slow) embeddings are concatenated into a unified representation to train a lightweight Multi-Layer Perceptron (MLP). Within the FL paradigm, only the MLP parameters are shared with the server, ensuring raw video data and generated captions remain local. FLAMeST employs the FedAvg algorithm for model aggregation, achieving 99% lower communication overhead compared to full-model training. Experiments on the UCF101 and HMDB51 datasets demonstrate the framework's robustness, with accuracy improvements of 5.13% and 2.71%, respectively, over the baseline.
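
A minimal sketch of the aggregation step: only the lightweight MLP heads are averaged (size-weighted, as in FedAvg), while the BLIP and Slow backbones stay on-device:

    # Sketch: FedAvg over client MLP state_dicts, weighted by dataset size.
    def fedavg(state_dicts, client_sizes):
        total = float(sum(client_sizes))
        return {key: sum(sd[key] * (n / total)
                         for sd, n in zip(state_dicts, client_sizes))
                for key in state_dicts[0]}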

URL: https://openreview.net/forum?id=AobzdtqiMe

---

Title: Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction

Abstract: Text-video prediction (TVP) is a downstream video generation task that requires a model to produce subsequent video frames given a series of initial video frames and text describing the required motion. In practice, TVP methods focus on a particular category of videos depicting manipulations of objects carried out by human beings or robot arms. Previous methods adapt models pre-trained on text-to-image tasks, and thus tend to generate video that lacks the required continuity. A natural progression would be to leverage more recent pre-trained text-to-video (T2V) models. This approach is rendered more challenging by the fact that the most common fine-tuning technique, low-rank adaptation (LoRA), yields undesirable results. In this work, we propose an adaptation-based strategy we label Frame-wise Conditioning Adaptation (FCA). Within the module, we devise a sub-module that produces frame-wise text embeddings from the input text, which acts as an additional text condition to aid generation. We use FCA to fine-tune the T2V model, which incorporates the initial frame(s) as an extra condition. We compare and discuss the more effective strategy for injecting such embeddings into the T2V model. We conduct extensive ablation studies on our design choices with quantitative and qualitative performance analysis. Our approach establishes a new baseline for the task of TVP.

URL: https://openreview.net/forum?id=HSAjl4LUHK

---

Title: Large Language Models Suffer From Their Own Output: An Analysis of the Self-Consuming Training Loop

Abstract: Large Language Models (LLMs) are already widely used to generate content for a variety of online platforms. Since we are not able to reliably distinguish LLM-generated content from human-produced content, LLM-generated content is used to train the next generation of LLMs, giving rise to a self-consuming training loop. From the image generation domain, we know that such a self-consuming training loop reduces both the quality and diversity of images, ultimately ending in model collapse. However, it is unclear whether this alarming effect can also be observed for LLMs. Therefore, we present the first study investigating the self-consuming training loop for LLMs. Further, we propose a novel method based on logic expressions that allows us to unambiguously verify the correctness of LLM-generated content, which is difficult for natural language text. We find that the self-consuming training loop produces correct outputs; however, output diversity declines depending on the proportion of generated data used. Fresh data can slow down this decline, but not stop it. Further, we observe similar results on a real natural language dataset. Given these concerning results, we encourage researchers to study methods to counteract this process.

URL: https://openreview.net/forum?id=FzIzju42B3

---

Title: Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods

Abstract: Despite their theoretical appeal, totally corrective boosting methods based on linear programming have received limited empirical attention. In this paper, we conduct the first large-scale experimental study of six LP-based boosting formulations, including two novel methods, NM-Boost and QRLP-Boost, across 20 diverse datasets. We evaluate the use of both heuristic and optimal base learners within these formulations, and analyze not only accuracy, but also ensemble sparsity, margin distribution, anytime performance, and hyperparameter sensitivity. We show that totally corrective methods can outperform or match state-of-the-art heuristics like XGBoost and LightGBM when using shallow trees, while producing significantly sparser ensembles. We further show that these methods can thin pre-trained ensembles without sacrificing performance, and we highlight both the strengths and limitations of using optimal decision trees in this context.

URL: https://openreview.net/forum?id=lscC4PZUE4

---

Title: Context-Aware Clustering using Large Language Models

Abstract: Despite the remarkable success of Large Language Models (LLMs) in text understanding and generation, their potential for text clustering tasks remains underexplored. While we observe that powerful closed-source LLMs can generate high-quality text clusterings, their massive size and inference cost make them impractical for repeated online use in real-world applications. Motivated by this limitation, we study the transfer of clustering knowledge from LLMs to smaller and more efficient open-source language models (SLMs), aiming to retain performance while improving scalability. We propose CACTUS (Context-Aware ClusTering with aUgmented triplet losS), a systematic approach that leverages SLMs for efficient and effective supervised clustering of entity subsets, particularly focusing on text-based entities. Existing text clustering methods fail to capture the context provided by the entity subset. In particular, they typically embed each entity independently, ignoring the mutual relationships among entities within the same subset. CACTUS incorporates a scalable inter-entity attention mechanism that efficiently models pairwise interactions to capture this context. Although several language modeling-based approaches exist for clustering, very few are designed for the task of supervised clustering. We propose a new augmented triplet loss function tailored for supervised clustering, which addresses the inherent challenges of directly applying the standard triplet loss to this problem by introducing a neutral similarity anchor. Furthermore, we introduce a self-supervised clustering pretraining task based on text augmentation techniques to improve the generalization of our model. Extensive experiments on various e-commerce query and product clustering datasets demonstrate that our proposed approach significantly outperforms existing unsupervised and supervised baselines across multiple external clustering evaluation metrics. Our results establish CACTUS as a scalable, generalizable solution for real-world clustering scenarios. Our code is publicly available at https://anonymous.4open.science/r/context-aware-clustering-E90C.
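
One plausible form of an augmented triplet objective with a neutral similarity anchor tau (an illustrative assumption; the paper's exact formulation may differ): same-cluster pairs are pushed above the anchor and cross-cluster pairs below it.

    # Sketch: triplet loss anchored around a neutral similarity value tau
    # (tau and margin are hypothetical hyperparameters for illustration).
    import torch.nn.functional as F

    def anchored_triplet_loss(anchor, positive, negative,
                              tau=0.5, margin=0.2):
        sim_pos = F.cosine_similarity(anchor, positive, dim=-1)
        sim_neg = F.cosine_similarity(anchor, negative, dim=-1)
        loss_pos = F.relu(tau + margin - sim_pos)  # positives above tau
        loss_neg = F.relu(sim_neg - tau + margin)  # negatives below tau
        return (loss_pos + loss_neg).mean()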

URL: https://openreview.net/forum?id=b4xx3ylyz9

---

Title: Formal Methods in Robot Policy Learning: A Survey on Current Techniques and Future Directions

Abstract: As hardware and software systems have grown in complexity, formal methods have been indispensable tools for (1) rigorously specifying acceptable behaviors, (2) synthesizing programs to meet these specifications, and (3) validating the correctness of existing programs. In the field of robotics, a similar trend of rising complexity has emerged, driven in large part by the adoption of deep learning. While this shift has enabled the development of highly performant robot policies, their implementation as deep neural networks has posed challenges to traditional formal analysis, leading to models that are inflexible, fragile, and difficult to interpret. In response, the robotics community has introduced new formal and semi-formal methods to support the precise specification of complex objectives, guide the learning process to achieve them, and enable the verification of learned policies against them.
In this survey, we provide a comprehensive overview of how formal methods are integrated into robot policy learning. We organize our discussion around three key pillars: specification, synthesis, and verification of learned policies. For each, we highlight representative techniques, compare their scalability and expressiveness, and summarize how they contribute to meaningfully improving realistic robot safety and correctness. We conclude with a discussion of remaining obstacles for achieving that goal and promising directions for advancing formal methods in robot learning.

URL: https://openreview.net/forum?id=DZkikdg5sl

---

Title: LTLBench: Towards Benchmarks for Evaluating Temporal Logic Reasoning in Large Language Models

Abstract: Temporal reasoning (TR) is a critical component of artificial intelligence, encompassing the understanding and processing of temporal information and relationships between events. To study the TR ability of Large Language Models (LLMs), various datasets have been constructed to evaluate different aspects of this ability. We propose a pipeline for constructing datasets that evaluate the TR ability of LLMs by leveraging random directed graph generation, Linear Temporal Logic (LTL) formulas, and the NuSMV model checker. Based on this pipeline, we construct LTLBench, a benchmark dataset consisting of 2,000 TR challenges, and evaluate six LLMs with it. Furthermore, we conduct additional experiments to study the impact of increasing the number of events and formula operators on the complexity of TR problems and the performance of LLMs. We demonstrate that although LLMs exhibit some promise in handling TR challenges, they still struggle with complex TR. We expect this work to offer insights into the TR ability of LLMs while also providing a valuable tool for future TR evaluations.

URL: https://openreview.net/forum?id=02QRC2Cuu0

---

Title: Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need

Abstract: Language models traditionally utilized for cross-domain generalization in natural language understanding and generation have recently demonstrated task-specific reasoning through inference-time scaling. However, their top-down training approach on general text corpora is insufficient for acquiring domain-specific abstractions required for deep expertise in a particular domain. This may require a bottom-up approach that acquires deep expertise by explicitly learning to compose simple concepts of a domain into more complex ones. A knowledge graph (KG) provides such an abstraction where domain primitives are captured by head-relation-tail triples. A KG path formed by such triples captures a higher-level concept. We present a task generation pipeline that directly synthesizes tasks from the domain-specific primitives, enabling the model to explicitly acquire and compose these primitives for reasoning. We fine-tune language models on the resultant bottom-up KG-grounded curriculum to demonstrate domain-specific superintelligence.

Although our approach is readily applicable to a wide variety of domains, we validate it in the context of medicine where reliable KGs are available. Applying our proposed pipeline to a medical KG, we curate a dataset of 24,000 high-quality reasoning tasks paired with structured thinking traces derived from diverse medical primitives. We fine-tune the QwQ-32B model on this bottom-up curriculum to obtain QwQ-Med-3 that takes a step towards medical superintelligence. We also introduce an evaluation suite, ICD-Bench, to quantify domain-specific capabilities of models on reasoning tasks across 15 medical domains. Our experiments demonstrate that QwQ-Med-3 significantly outperforms state-of-the-art open-source and proprietary reasoning models on all categories of ICD-Bench. Further analysis reveals that QwQ-Med-3 utilizes acquired primitives to especially widen the performance gap on the hardest tasks in ICD-Bench. Finally, evaluation on external medical question-answer benchmarks shows that QwQ-Med-3 is able to transfer acquired expertise to improve the performance of the base model.

The industry's approach to artificial general intelligence (AGI) centers on breadth of acquired expertise. We envision a future in which a compositional model of AGI emerges from interacting superintelligent agents, much as human society hierarchically acquires ever deeper expertise by combining the expertise of a group of individuals in adjacent domains or super-domains. Furthermore, since language models that are fine-tuned for superintelligence can be relatively small (e.g., 32B parameters), this bottom-up approach may also significantly cut down on training/inference energy costs.

URL: https://openreview.net/forum?id=35dgNnb7nz

---

Title: How to Upscale Neural Networks with Scaling Law?

Abstract: Neural scaling laws have revolutionized the design and optimization of large-scale AI models by revealing predictable relationships between model size, dataset volume, and computational resources. Early research established power-law relationships in model performance, leading to compute-optimal scaling strategies. However, recent studies highlighted their limitations across architectures, modalities, and deployment contexts. Sparse models, mixture-of-experts, retrieval-augmented learning, and multimodal models often deviate from traditional scaling patterns. Moreover, scaling behaviors vary across domains such as vision, reinforcement learning, and fine-tuning, underscoring the need for more nuanced approaches. In this survey, we synthesize insights from current studies, examining the theoretical foundations, empirical findings, and practical implications of scaling laws. We also explore key challenges, including data efficiency, inference scaling, and architecture-specific constraints, advocating for adaptive scaling strategies tailored to real-world applications. We suggest that while scaling laws provide a useful guide, they do not always generalize across all architectures and training strategies.

URL: https://openreview.net/forum?id=AL7N0UOfgI

---

Title: AutoAnnotator: A Collaborative Annotation Framework for Large and Small Language Models

Abstract: Although the annotation paradigm based on Large Language Models (LLMs) has made significant breakthroughs in recent years, its actual deployment still faces two core bottlenecks: first, the cost of calling commercial APIs in large-scale annotation is very expensive; second, in scenarios that require fine-grained semantic understanding, such as sentiment classification and toxicity classification, the annotation accuracy of LLMs is even lower than that of Small Language Models (SLMs) dedicated to this field. To address these problems, we propose a new paradigm of multi-model cooperative annotation and design AutoAnnotator, a fully automatic annotation framework based on this paradigm. Specifically, AutoAnnotator consists of two layers. The upper-level meta-controller layer uses the generation and reasoning capabilities of LLMs to select SLMs for annotation, automatically generate annotation code, and verify difficult samples; the lower-level task-specialist layer consists of multiple SLMs that perform annotation through multi-model voting. In addition, we use the difficult samples obtained by the secondary review of the meta-controller layer as the reinforcement learning set and fine-tune the SLMs in stages through a continual learning strategy, thereby improving the generalization of SLMs. Extensive experiments show that AutoAnnotator outperforms existing open-source/API LLMs in zero-shot, one-shot, CoT, and majority voting settings. Notably, AutoAnnotator reduces the annotation cost by 74.15% compared to directly annotating with GPT-3.5-turbo, while still improving the accuracy by 6.21%.
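
A minimal sketch of the task-specialist layer's voting, with hard samples escalated to the meta-controller (the predictor and review callables here are placeholders):

    # Sketch: multi-SLM majority voting; disagreements go to the LLM.
    from collections import Counter

    def annotate(text, slm_predictors, llm_review, agreement=1.0):
        votes = [predict(text) for predict in slm_predictors]
        label, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= agreement:
            return label                 # confident consensus
        return llm_review(text, votes)   # hard sample: escalate to the LLM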

URL: https://openreview.net/forum?id=LauojtjA9F

---

Title: Language Models for Controllable DNA Sequence Design

Abstract: We consider controllable DNA sequence design, where sequences are generated by conditioning on specific biological properties. While language models (LMs) such as GPT and BERT have achieved remarkable success in natural language generation, their application to DNA sequence generation remains largely underexplored. In this work, we introduce ATGC-Gen, an Automated Transformer Generator for Controllable Generation, which leverages cross-modal encoding to integrate diverse biological signals. ATGC-Gen is instantiated with both decoder-only and encoder-only transformer architectures, allowing flexible training and generation under either autoregressive or masked recovery objectives. We evaluate ATGC-Gen on representative tasks including promoter and enhancer sequence design, and further introduce a new dataset based on ChIP-Seq experiments for modeling protein binding specificity. Our experiments demonstrate that ATGC-Gen can generate fluent, diverse, and biologically relevant sequences aligned with the desired properties. Compared to prior methods, our model achieves notable improvements in controllability and functional relevance, highlighting the potential of language models in advancing programmable genomic design.

URL: https://openreview.net/forum?id=itwnoEu60S

---

Title: Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models

Abstract: The rapid growth of visual tokens in multimodal large language models (MLLMs) leads to excessive memory consumption and inference latency, especially when handling high-resolution images and videos. Token pruning is a technique used to mitigate this issue by removing redundancy, but existing methods often ignore relevance to the user query or suffer from the limitations of attention mechanisms, reducing their adaptability and effectiveness. To address these challenges, we propose Script, a plug-and-play pruning method that requires no retraining and generalizes across diverse MLLMs. Script comprises two modules: a graph-structured pruning module that removes visually redundant tokens, and a query-conditioned semantic pruning module that preserves query-relevant visual information. Together, they enhance performance on multimodal tasks. Experiments on fourteen benchmarks across image and video understanding tasks show that Script consistently achieves higher model efficiency and predictive accuracy compared to existing pruning methods. On LLaVA-NeXT-7B, it achieves up to $6.8\times$ prefill speedup and $10\times$ FLOP reduction, while retaining 96.88\% of the original performance. Code will be made publicly available upon acceptance.
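
A minimal sketch of the query-conditioned half of the method, scoring visual tokens by cosine similarity to a query embedding and keeping the top-k (illustrative only; Script pairs this with a graph-structured redundancy module):

    # Sketch: keep the k visual tokens most relevant to the query.
    import torch.nn.functional as F

    def prune_visual_tokens(vis, query, keep):
        """vis: (n_tokens, d); query: (d,)."""
        scores = F.cosine_similarity(vis, query.unsqueeze(0), dim=-1)
        idx = scores.topk(keep).indices.sort().values  # keep token order
        return vis[idx]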

URL: https://openreview.net/forum?id=F6xKzbgcHq

---

Title: DeepSeek-R1 Thoughtology: Let’s think about LLM reasoning

Abstract: Large Reasoning Models like DeepSeek-R1 mark a fundamental shift in how LLMs approach complex problems. Instead of directly producing an answer for a given input, DeepSeek-R1 creates detailed multi-step reasoning chains, seemingly “thinking” about a problem before providing an answer. This reasoning process is publicly available to the user, creating endless opportunities for studying the reasoning behaviour of the model and opening up the field of Thoughtology. Starting from a taxonomy of DeepSeek-R1’s basic building blocks of reasoning, our analyses on DeepSeek-R1 investigate the impact and controllability of thought length, management of long or confusing contexts, cultural and safety concerns, and the status of DeepSeek-R1 vis-à-vis cognitive phenomena, such as human-like language processing and world modelling. Our findings paint a nuanced picture. Notably, we show DeepSeek-R1 has a ‘sweet spot’ of reasoning, where extra inference time can impair model performance. Furthermore, we find a tendency for DeepSeek-R1 to persistently ruminate on previously explored problem formulations, obstructing further exploration. We also note strong safety vulnerabilities of DeepSeek-R1 compared to its non-reasoning counterpart, which can also compromise safety-aligned LLMs.

URL: https://openreview.net/forum?id=BZwKsiRnJI

---

Title: The Vertex-Attribute-Constrained Densest $k$-Subgraph Problem

Abstract: Dense subgraph mining is a fundamental technique in graph mining, commonly applied in fraud detection, community detection, product recommendation, and document summarization. In such applications, we are often interested in identifying communities, recommendations, or summaries that reflect different constituencies, styles or genres, and points of view. For this task, we introduce a new variant of the Densest $k$-Subgraph (D$k$S) problem that incorporates the attribute values of vertices. The proposed {\em Vertex-Attribute-Constrained Densest $k$-Subgraph} (VAC-D$k$S) problem retains the NP-hardness and inapproximability properties of the classical D$k$S. Nevertheless, we prove that a suitable continuous relaxation of VAC-D$k$S is tight and can be efficiently tackled using a projection-free Frank--Wolfe algorithm. We also present an insightful analysis of the optimization landscape of the relaxed problem. Extensive experimental results demonstrate the effectiveness of our proposed formulation and algorithm, and its ability to scale up to large graphs. We further elucidate the properties of VAC-D$k$S versus classical D$k$S in a political network mining application, where VAC-D$k$S identifies a balanced and more meaningful set of politicians representing different ideological camps, in contrast to the classical D$k$S solution which is unbalanced and rather mundane.
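
For intuition on the projection-free approach, here is a sketch of Frank--Wolfe on a plain D$k$S-style relaxation, maximising $x^\top A x$ over $\{0 \le x \le 1, \sum_i x_i = k\}$; the linear oracle simply activates the $k$ largest-gradient coordinates (the VAC-D$k$S oracle additionally enforces the attribute constraints):

    # Sketch: Frank-Wolfe for max x^T A x s.t. 0 <= x <= 1, sum(x) = k.
    import numpy as np

    def frank_wolfe_dks(A, k, iters=200):
        n = A.shape[0]
        x = np.full(n, k / n)                  # feasible start
        for t in range(iters):
            grad = 2.0 * (A @ x)               # A assumed symmetric
            s = np.zeros(n)
            s[np.argsort(grad)[-k:]] = 1.0     # linear maximisation oracle
            x += (2.0 / (t + 2.0)) * (s - x)   # classic diminishing step
        return x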

URL: https://openreview.net/forum?id=ae8hda3atq

---

Title: B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability

Abstract: Post-hoc explanation methods for black-box models often struggle with faithfulness and human interpretability due to the lack of explainability in current neural architectures. Meanwhile, B-cos networks have been introduced to improve model explainability by proposing an architecture that removes bias terms and promotes input-weight alignment. Although B-cos networks have shown success in building explainable systems, their application has so far been limited to computer vision models and their associated training pipelines. In this work, we introduce B-cos LMs, i.e., B-cos language models (LMs) empowered for natural language processing (NLP) tasks. Our approach directly transforms pre-trained language models into B-cos LMs by combining B-cos conversion and task fine-tuning, improving efficiency compared to previous methods. Our automatic and human evaluation results demonstrate that B-cos LMs produce more faithful and human interpretable explanations than post-hoc methods, while maintaining task performance comparable to conventional fine-tuning. Our in-depth analysis explores how B-cos LMs differ from conventionally fine-tuned models in their learning processes and explanation patterns. Finally, we are also the first to explore the transformation of decoder-only models to B-cos LMs for generation tasks.
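
For readers unfamiliar with B-cos layers, a sketch of the transform as introduced for vision models (Böhle et al., 2022): the bias-free linear response is modulated by the input-weight cosine alignment raised to the power B-1.

    # Sketch: a B-cos linear transform (bias-free, alignment-modulated).
    import torch

    def bcos_linear(x, W, B=2.0, eps=1e-9):
        """x: (batch, d_in); W: (d_out, d_in)."""
        W_hat = W / (W.norm(dim=1, keepdim=True) + eps)    # unit-norm rows
        lin = x @ W_hat.t()                                # (batch, d_out)
        cos = lin / (x.norm(dim=1, keepdim=True) + eps)    # cosine alignment
        return cos.abs().pow(B - 1.0) * lin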

URL: https://openreview.net/forum?id=c180UH8Dg8

---

Title: ULTra: Unveiling Latent Token Interpretability in Transformer-Based Understanding and Segmentation

Abstract: Transformers have revolutionized Computer Vision (CV) through self-attention mechanisms. However, their complexity makes latent token representations difficult to interpret. We introduce ULTra, a framework for interpreting Transformer embeddings and uncovering meaningful semantic patterns within them. ULTra enables unsupervised semantic segmentation using pre-trained models without requiring fine-tuning. Additionally, we propose a self-supervised training approach that refines segmentation performance by learning an external transformation matrix without modifying the underlying model. Our method achieves state-of-the-art performance in unsupervised semantic segmentation, outperforming existing segmentation methods. Furthermore, we validate ULTra for model interpretation in both synthetic and real-world scenarios, including Object Selection and interpretable text summarization using LLMs, demonstrating its broad applicability in explaining the semantic structure of latent token representations.

URL: https://openreview.net/forum?id=vL3pmJjGDQ

---

Title: Inverting Gradient Attacks Makes Powerful Data Poisoning

Abstract: Gradient attacks and data poisoning tamper with the training of machine learning algorithms to maliciously alter them and have been proven to be equivalent in convex settings. The extent of harm these attacks can produce in non-convex settings is still to be determined.
Gradient attacks are practical for fewer systems than data poisoning, but have been argued to be more harmful since they can be arbitrary, whereas data poisoning reduces the attacker's power to only being able to inject data points into training sets, e.g. via legitimate participation in a collaborative dataset. This raises the question of whether the harm made by gradient attacks can be matched by data poisoning in non-convex settings. In this work, we provide a positive answer and show how data poisoning can mimic gradient attacks to perform an availability attack on (non-convex) neural networks. Through gradient inversion, commonly used to reconstruct data points from actual gradients, we show how reconstructing data points out of malicious gradients can be sufficient to perform a range of attacks. This allows us to show, for the first time, a worst-case availability attack on neural networks through data poisoning, degrading the model's performance to random level with only a minority (as low as 1%) of poisoned points.

URL: https://openreview.net/forum?id=Lvy5MjyTh3

---

Title: Uncertainty-Aware Transformers: Conformal Prediction for Language Models

Abstract: Transformers have had a profound impact on the field of artificial intelligence, especially on large language models and their variants. Unfortunately, as was the case historically with neural networks, the black-box nature of transformer architectures presents significant challenges to interpretability and trustworthiness. These challenges generally emerge in high-stakes domains, such as healthcare, robotics, and finance, where incorrect predictions can have significant negative consequences, such as misdiagnosis or failed investments. For models to be genuinely useful and trustworthy in critical applications, they must provide more than just predictions: they must supply users with a clear understanding of the reasoning that underpins their decisions. This paper presents an uncertainty quantification framework for transformer-based language models. This framework, called CONFIDE (CONformal prediction for FIne-tuned DEep language models), applies conformal prediction to the internal embeddings of encoder-only architectures like BERT and RoBERTa, with hyperparameters such as the distance metric and principal component analysis. CONFIDE uses either [CLS] token embeddings or flattened hidden states to construct class-conditional nonconformity scores, enabling statistically valid prediction sets with instance-level explanations. Empirically, CONFIDE improves test accuracy by up to $4.09\%$ on BERT-tiny and achieves greater correct efficiency (i.e., the expected size of the prediction set conditioned on it containing the true label) compared to prior methods, including NM2 and VanillaNN. We show that early and intermediate transformer layers often yield better-calibrated and more semantically meaningful representations for conformal prediction. In resource-constrained models and high-stakes tasks with ambiguous labels, CONFIDE offers robustness and interpretability where softmax-based uncertainty fails.
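
A minimal sketch of split conformal prediction on frozen embeddings, with a distance-to-class-centroid nonconformity score standing in for CONFIDE's hyperparameterised scores:

    # Sketch: conformal prediction sets from [CLS]-style embeddings.
    import numpy as np

    def fit_centroids(emb, labels, n_classes):
        return np.stack([emb[labels == c].mean(axis=0)
                         for c in range(n_classes)])

    def conformal_sets(cal_emb, cal_labels, test_emb, centroids, alpha=0.1):
        # Calibration: nonconformity = distance to the true class centroid.
        scores = np.linalg.norm(cal_emb - centroids[cal_labels], axis=1)
        n = len(scores)
        q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
        # Prediction set: every class whose centroid is within the threshold.
        d = np.linalg.norm(test_emb[:, None, :] - centroids[None, :, :], axis=2)
        return [np.nonzero(row <= q)[0].tolist() for row in d]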

URL: https://openreview.net/forum?id=f8CTRCgE9a

---
