Weekly TMLR digest for Nov 09, 2025

TMLR

Nov 9, 2025, 12:00:22 AM

to tmlr-annou...@googlegroups.com

New certifications
==================

Expert Certification: Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization

Abdullah Akgül, Gulcin Baykal, Manuel Haussmann, Melih Kandemir

https://openreview.net/forum?id=KTfTwxsVNE

---

J2C Certification: Testing with Non-identically Distributed Samples

Shivam Garg, Chirag Pabbaraju, Kirankumar Shiragur, Gregory Valiant

https://openreview.net/forum?id=FUzvztzBlW

---

J2C Certification: TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models

Yao Xiao, Qiqian Fu, Heyi Tao, Yuqun Wu, Zhen Zhu, Derek Hoiem

https://openreview.net/forum?id=KZLmkL62M4

---

Featured Certification, J2C Certification: PCF Learned Sort: a Learning Augmented Sort Algorithm with $\mathcal{O}(n \log\log n)$ Expected Complexity

Atsuki Sato, Yusuke Matsui

https://openreview.net/forum?id=wVkb8WHbvR

---

J2C Certification: DNOD: Deformable Neural Operators for Object Detection in SAR Images

GVS Mothish, J Rishi, Shobhit Kumar Shukla, Deepak Subramani

https://openreview.net/forum?id=tjBqPJdQ72

---

Survey Certification: Neural Spatiotemporal Point Processes: Trends and Challenges

Sumantrak Mukherjee, Mouad Elhamdi, George Mohler, David Antony Selby, Yao Xie, Sebastian Josef Vollmer, Gerrit Großmann

https://openreview.net/forum?id=N69lSYWkMw

---

J2C Certification: Encoder-only Next Token Prediction

Ethan Ewer, Daewon Chae, Thomas Zeng, Jinkyu Kim, Kangwook Lee

https://openreview.net/forum?id=CGHi289y8e

---

Accepted papers
===============

Title: Privacy-Aware Time Series Synthesis via Public Knowledge Distillation

Authors: Penghang Liu, Haibei Zhu, Eleonora Kreacic, Svitlana Vyetrenko

Abstract: Sharing sensitive time series data, such as patient records or investment accounts, in domains such as finance, healthcare, and energy consumption is often restricted due to privacy concerns. Privacy-aware synthetic time series generation addresses this challenge by enforcing noise during training, inherently introducing a trade-off between privacy and utility. In many cases, sensitive sequences are correlated with publicly available, non-sensitive contextual metadata (e.g., household electricity consumption may be influenced by weather conditions and electricity prices). However, existing privacy-aware data generation methods often overlook this opportunity, resulting in suboptimal privacy-utility trade-offs. In this paper, we present Pub2Priv, a novel framework for generating private time series data by leveraging heterogeneous public knowledge. Our model employs a self-attention mechanism to encode public data into temporal and feature embeddings, which serve as conditional inputs for a diffusion model to generate synthetic private sequences. Additionally, we introduce a practical metric to assess privacy by evaluating the identifiability of the synthetic data. Experimental results show that Pub2Priv consistently outperforms state-of-the-art benchmarks in improving the privacy-utility trade-off across finance, energy, and commodity trading domains.

URL: https://openreview.net/forum?id=TC6ihoRw0c

---

Title: Batched Nonparametric Bandits via k-Nearest Neighbor UCB

Authors: Sakshi Arya

Abstract: We study sequential decision-making in batched nonparametric contextual bandits, where actions are selected over a finite horizon divided into a small number of batches. Motivated by constraints in domains such as medicine and marketing, where online feedback is limited, we propose a nonparametric algorithm that combines adaptive k-nearest neighbor (k-NN) regression with the upper confidence bound (UCB) principle. Our method, BaNk-UCB, is fully nonparametric, adapts to the context density, and is simple to implement. Unlike prior works relying on parametric or binning-based estimators, BaNk-UCB uses local geometry of the contexts to estimate rewards and adaptively balances exploration and exploitation. We provide near-optimal regret guarantees under standard Lipschitz smoothness and margin assumptions, using a theoretically motivated batch schedule that balances regret across batches and achieves minimax-optimal rates. Empirical evaluations on synthetic and real-world datasets demonstrate that BaNk-UCB consistently outperforms binning-based baselines.

URL: https://openreview.net/forum?id=9gB2Eu0PXb
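
As a rough illustration of how k-NN regression and the UCB principle combine, the sketch below scores each arm by a nearest-neighbor reward estimate plus an optimism bonus. It is a simplification under stated assumptions, not the BaNk-UCB algorithm itself: `k` is fixed rather than adaptive, and the bonus form and the names `knn_ucb_index`, `choose_arm`, and `history` are hypothetical.

```python
import math

def knn_ucb_index(query, contexts, rewards, k, horizon):
    """Optimistic reward estimate for one arm at `query` (fixed-k simplification)."""
    if not rewards:
        return math.inf  # an arm with no data is always tried first
    k = min(k, len(rewards))
    # Keep the k past observations of this arm that are closest to the query.
    nearest = sorted((math.dist(query, c), r) for c, r in zip(contexts, rewards))[:k]
    mean = sum(r for _, r in nearest) / k
    bonus = math.sqrt(math.log(horizon) / k)  # assumed confidence width
    radius = nearest[-1][0]                   # distance to the k-th neighbor (bias proxy)
    return mean + bonus + radius

def choose_arm(query, history, k, horizon):
    """history[a] = (contexts, rewards) observed so far for arm a."""
    scores = [knn_ucb_index(query, ctx, rew, k, horizon) for ctx, rew in history]
    return max(range(len(scores)), key=scores.__getitem__)
```

In a batched setting, `history` would only be refreshed at batch boundaries, so every decision inside a batch uses the same stored observations.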

---

Title: Stacking Variational Bayesian Monte Carlo

Authors: Francesco Silvestrin, Chengkun LI, Luigi Acerbi

Abstract: Approximate Bayesian inference for models with computationally expensive, black-box likelihoods poses a significant challenge, especially when the posterior distribution is complex. Many inference methods struggle to explore the parameter space efficiently under a limited budget of likelihood evaluations. Variational Bayesian Monte Carlo (VBMC) is a sample-efficient method that addresses this by building a local surrogate model of the log-posterior. However, its conservative exploration strategy, while promoting stability, can cause it to miss important regions of the posterior, such as distinct modes or long tails.
In this work, we introduce Stacking Variational Bayesian Monte Carlo (S-VBMC), a method that overcomes this limitation by constructing a robust, global posterior approximation from multiple independent VBMC runs. Our approach merges these local approximations through a principled and inexpensive post-processing step that leverages VBMC's mixture posterior representation and per-component evidence estimates. Crucially, S-VBMC requires no additional likelihood evaluations and is naturally parallelisable, fitting seamlessly into existing inference workflows. We demonstrate its effectiveness on two synthetic problems designed to challenge VBMC's exploration and two real-world applications from computational neuroscience, showing substantial improvements in posterior approximation quality across all cases. Our code is available as a Python package at https://github.com/acerbilab/svbmc.

URL: https://openreview.net/forum?id=M2ilYAJdPe

---

Title: Characterizing the Training Dynamics of Private Fine-tuning with Langevin diffusion

Authors: Shuqi Ke, Charlie Hou, Sewoong Oh, Giulia Fanti

Abstract: We show that **d**ifferentially **p**rivate **f**ull **f**ine-**t**uning (DP-FFT) can distort pre-trained backbone features based on both theoretical and empirical results. We identify the cause of the distortion as the misalignment between the pre-trained backbone and the randomly initialized linear head. We prove that a sequential fine-tuning strategy can mitigate the feature distortion: first-linear-probing-then-fine-tuning (DP-LP-FFT). A new approximation scheme allows us to derive approximate upper and lower bounds on the training loss of DP-LP and DP-FFT, in a simple but canonical setting of 2-layer neural networks with ReLU activation. Experiments on real-world datasets and architectures are consistent with our theoretical insights. We also derive new upper bounds for 2-layer linear networks without the approximation. Moreover, our theory suggests a trade-off of privacy budget allocation in multi-phase fine-tuning methods like DP-LP-FFT.

URL: https://openreview.net/forum?id=LwT8aDv502

---

Title: Universal Black-Box Targeted Reward Poisoning Attack Against Online Deep Reinforcement Learning

Authors: Yinglun Xu, Gagandeep Singh

Abstract: This work proposes the first universal black-box targeted attack against online reinforcement learning through reward poisoning during training time. Our attack is universally efficient against any efficient learning algorithm training in general RL environments and requires limited attack budgets and computational resources. We generalize a common feature of the efficient learning algorithms and assume that such algorithms would mostly take the optimal actions or actions close to them during training. We quantify the efficiency of an attack and propose an attack framework where it is feasible to evaluate the efficiency of any attack instance in the framework based on the assumption. Finally, we find an instance in the framework that requires a minimal per-step perturbation, which we call the 'adaptive target attack'. We theoretically analyze and prove a lower bound for the attack efficiency of our attack in the general RL setting. Empirically, on a diverse set of popular DRL environments learned by state-of-the-art DRL algorithms, we verify that our attack efficiently leads the learning agent to various target policies with limited budgets.

URL: https://openreview.net/forum?id=MX0aDKu8lY

---

Title: GenOL: Generating Diverse Examples for Name-only Online Learning

Authors: Minhyuk Seo, Seongwon Cho, Minjae Lee, Diganta Misra, Hyeonbeom Choi, Seon Joo Kim, Jonghyun Choi

Abstract: Online learning methods often rely on supervised data. However, under data distribution shifts, such as in continual learning (CL), where continuously arriving online data streams incorporate new concepts (e.g., classes), real-time manual annotation is impractical due to its costs and latency, which hinder real-time adaptation. To alleviate this, the 'name-only' setup has been proposed, requiring only the names of concepts, not supervised samples. A recent approach tackles this setup by supplementing data with web-scraped images, but such data often suffers from issues of data imbalance, noise, and copyright. To overcome the limitations of both human supervision and webly supervision, we propose GenOL, which uses generative models for name-only training. However, naive application of generative models results in limited diversity of generated data. Here, we enhance (i) intra-diversity, the diversity of images generated by a single model, by proposing a diverse prompt generation method that generates diverse text prompts for text-to-image models, and (ii) inter-diversity, the diversity of images generated by multiple generative models, by introducing an ensemble strategy that selects minimally overlapping samples. We empirically validate that the proposed GenOL outperforms prior art, and even a model trained with fully supervised data, by large margins in various tasks, including image recognition and multi-modal visual reasoning.

URL: https://openreview.net/forum?id=QPfVoTMLWq

---

Title: Robustness in Large Language Models: A Survey of Mitigation Strategies and Evaluation Metrics

Authors: PANKAJ KUMAR, Subhankar Mishra

Abstract: Large Language Models (LLMs) have emerged as a promising cornerstone for the development of natural language processing (NLP) and artificial intelligence (AI). However, ensuring the robustness of LLMs remains a critical challenge. To address these challenges and advance the field, this survey provides a comprehensive overview of current studies in this area. First, we systematically examine the nature of robustness in LLMs, including its conceptual foundations, the importance of consistent performance across diverse inputs, and the implications of failure modes in real-world applications. Next, we analyze the sources of non-robustness, categorizing intrinsic model limitations, data-driven vulnerabilities, and external adversarial factors that compromise reliability. Following this, we review state-of-the-art mitigation strategies, and then we discuss widely adopted benchmarks, emerging metrics, and persistent gaps in assessing real-world reliability. Finally, we synthesize findings from existing surveys and interdisciplinary studies to highlight trends, unresolved issues, and pathways for future research.

URL: https://openreview.net/forum?id=Bchvaaod6g

---

Title: The inexact power augmented Lagrangian method for constrained nonconvex optimization

Authors: Alexander Bodard, Konstantinos Oikonomidis, Emanuel Laude, Panagiotis Patrinos

Abstract: This work introduces an unconventional inexact augmented Lagrangian method where the augmenting term is a Euclidean norm raised to a power between one and two. The proposed algorithm is applicable to a broad class of constrained nonconvex minimization problems that involve nonlinear equality constraints. In a first part of this work, we conduct a full complexity analysis of the method under a mild regularity condition, leveraging an accelerated first-order algorithm for solving the Hölder-smooth subproblems. Interestingly, this worst-case result indicates that using lower powers for the augmenting term leads to faster constraint satisfaction, albeit with a slower decrease of the dual residual. Notably, our analysis does not assume boundedness of the iterates. Thereafter, we present an inexact proximal point method for solving the weakly-convex and Hölder-smooth subproblems, and demonstrate that the combined scheme attains an improved rate that reduces to the best-known convergence rate whenever the augmenting term is a classical squared Euclidean norm. Different augmenting terms, involving a lower power, further improve the primal complexity at the cost of the dual complexity. Finally, numerical experiments validate the practical performance of unconventional augmenting terms.

URL: https://openreview.net/forum?id=63ANb4r7EM
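
For orientation, one way to write an augmented Lagrangian with a power augmenting term, for a problem of minimizing $f(x)$ subject to $c(x) = 0$, is given below. The symbols $f$, $c$, $y$, $\rho$, the exponent $q$, and the $1/q$ scaling are notational assumptions made here and may differ from the paper's; setting $q = 2$ gives the classical squared Euclidean penalty mentioned in the abstract.

```latex
\[
  L_{\rho}(x, y) \;=\; f(x) + \langle y,\, c(x) \rangle
  + \frac{\rho}{q}\,\lVert c(x) \rVert_2^{\,q},
  \qquad 1 < q \le 2 .
\]
```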

---

Title: Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

Authors: Yu Gu, Kai Zhang, Yuting Ning, Boyuan Zheng, Boyu Gou, Tianci Xue, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, Huan Sun, Yu Su

Abstract: Language agents based on large language models (LLMs) have demonstrated great promise in automating web-based tasks. Recent work has shown that incorporating advanced planning algorithms, e.g., tree search, is advantageous over reactive planning for web agents. However, unlike simulated sandbox environments, real-world environments such as the web are rife with irreversible actions. This undermines the feasibility of backtracking, a cornerstone of (tree) search. Overly relying on test-time search also hurts efficiency. We advocate model-based planning for web agents that employs a world model to simulate and deliberate over the outcome of each candidate action before committing to one. We systematically explore this paradigm by: (1) Proposing a model-based planning framework, WebDreamer, which employs LLMs to serve as both world models and value functions; (2) Training specialized LLMs as world models with a scalable data synthesis pipeline. Empirical results demonstrate that WebDreamer achieves substantial performance improvements over reactive baselines. It is competitive, while being - times more efficient, with tree search in sandbox environments (VisualWebArena) and also works effectively on real-world websites (Online-Mind2Web and Mind2Web-Live). Furthermore, our trained world model, Dreamer-7B, performs comparably to GPT-4o, highlighting the potential of specialized world models for efficient and effective planning in complex web environments. All code, models, and data are publicly available at https://github.com/OSU-NLP-Group/WebDreamer

URL: https://openreview.net/forum?id=c6l7yA0HSq
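
The idea of simulating each candidate action before committing to one can be sketched as a single planning step. This is a minimal illustration, not WebDreamer's implementation: `plan_one_step`, `world_model`, and `value_fn` are hypothetical names, and in the paper's setting both callables would be backed by LLMs rather than plain functions.

```python
from typing import Callable, Sequence

def plan_one_step(
    state: str,
    candidates: Sequence[str],
    world_model: Callable[[str, str], str],
    value_fn: Callable[[str], float],
) -> str:
    """Return the candidate action whose simulated outcome scores highest."""
    best_action, best_score = None, float("-inf")
    for action in candidates:
        imagined = world_model(state, action)  # simulated only, nothing is executed
        score = value_fn(imagined)
        if score > best_score:
            best_action, best_score = action, score
    if best_action is None:
        raise ValueError("no candidate action could be selected")
    return best_action
```

Because only the selected action is executed on the real site, no backtracking over irreversible actions is needed.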

---

Title: DRAGON: Distributional Rewards Optimize Diffusion Generative Models

Authors: Yatong Bai, Jonah Casebeer, Somayeh Sojoudi, Nicholas J. Bryan

Abstract: We present Distributional RewArds for Generative OptimizatioN (DRAGON), a versatile framework for fine-tuning media generation models towards a desired outcome. Compared with traditional reinforcement learning with human feedback (RLHF) or pairwise preference approaches such as direct preference optimization (DPO), DRAGON is more flexible. It can optimize reward functions that evaluate either individual examples or distributions of them, making it compatible with a broad spectrum of instance-wise, instance-to-distribution, and distribution-to-distribution rewards. Leveraging this versatility, we construct novel reward functions by selecting an encoder and a set of reference examples to create an exemplar distribution. When cross-modal encoders such as CLAP are used, the reference may be of a different modality (e.g., text versus audio). Then, DRAGON gathers online and on-policy generations, scores them with the reward function to construct a positive demonstration set and a negative set, and leverages the contrast between the two finite sets to approximate distributional reward optimization. For evaluation, we fine-tune an audio-domain text-to-music diffusion model with 20 reward functions, including a custom music aesthetics model, CLAP score, Vendi diversity, and Fréchet audio distance (FAD). We further compare instance-wise (per-song) and full-dataset FAD settings while ablating multiple FAD encoders and reference sets. Over all 20 target rewards, DRAGON achieves an 81.45% average win rate. Moreover, reward functions based on exemplar sets indeed enhance generations and are comparable to model-based rewards. With an appropriate exemplar set, DRAGON achieves a 60.95% human-voted music quality win rate without training on human preference annotations. As such, DRAGON exhibits a new approach to designing and optimizing reward functions for improving human-perceived quality. Example generations can be found at https://ml-dragon.github.io/web/.

URL: https://openreview.net/forum?id=gobhDku03J

---

Title: Choose Your Model Size: Any Compression of Large Language Models Without Re-Computation

Authors: Martin Genzel, Patrick Putzky, Pengfei Zhao, Sebastian Schulze, Mattes Mollenhauer, Robert Seidel, Stefan Dietzel, Thomas Wollmann

Abstract: The adoption of Foundation Models in resource-constrained environments remains challenging due to their large size and inference costs. A promising way to overcome these limitations is post-training compression, which aims to balance reduced model size against performance degradation. This work presents Any Compression via Iterative Pruning (ACIP), a novel algorithmic approach to determine a compression-performance trade-off from a single stochastic gradient descent run. To achieve parameter efficiency, we use an SVD-reparametrization of linear layers and iteratively prune their singular values with a sparsity-inducing penalty. Importantly, the pruning order of the parameters is used to derive a global score map that allows compressing a model to any target size without re-computation. We evaluate ACIP on a large selection of open-weight LLMs and downstream tasks, demonstrating state-of-the-art results compared to existing factorization-based compression methods. We also show that ACIP seamlessly complements common quantization-based compression techniques.

URL: https://openreview.net/forum?id=Y6hdYf8tsg

---

Title: Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization

Authors: Abdullah Akgül, Gulcin Baykal, Manuel Haussmann, Melih Kandemir

Abstract: Continuous control of non-stationary environments is a major challenge for deep reinforcement learning algorithms. The time-dependency of the state transition dynamics aggravates the notorious stability problems of model-free deep actor-critic architectures. We posit that two properties will play a key role in overcoming non-stationarity in transition dynamics: (i) preserving the plasticity of the critic network and (ii) directed exploration for rapid adaptation to changing dynamics. We show that performing on-policy reinforcement learning with an evidential critic provides both. The evidential design ensures a fast and accurate approximation of the uncertainty around the state value, which maintains the plasticity of the critic network by detecting the distributional shifts caused by changes in dynamics. The probabilistic critic also makes the actor training objective a random variable, enabling the use of directed exploration approaches as a by-product. We name the resulting algorithm Evidential Proximal Policy Optimization (EPPO) due to the integral role of evidential uncertainty quantification in both policy evaluation and policy improvement stages. Through experiments on non-stationary continuous control tasks, where the environment dynamics change at regular intervals, we demonstrate that our algorithm outperforms state-of-the-art on-policy reinforcement learning variants in both task-specific and overall return.

URL: https://openreview.net/forum?id=KTfTwxsVNE

---

Title: RANa: Retrieval-Augmented Navigation

Authors: Gianluca Monaci, Rafael S. Rezende, Romain Deffayet, Gabriela Csurka, Guillaume Bono, Hervé Déjean, Stéphane Clinchant, Christian Wolf

Abstract: Methods for navigation based on large-scale learning typically treat each episode as a new problem, where the agent is spawned with a clean memory in an unknown environment. While these generalization capabilities to an unknown environment are extremely important, we claim that, in a realistic setting, an agent should have the capacity of exploiting information collected during earlier robot operations. We address this by introducing a new retrieval-augmented agent, trained with RL, capable of querying a database collected from previous episodes in the same environment and learning how to integrate this additional context information. We introduce a unique agent architecture for the general navigation task, evaluated on ImageNav, Instance-ImageNav and ObjectNav. Our retrieval and context encoding methods are data-driven and employ vision foundation models (FM) for both semantic and geometric understanding. We propose new benchmarks for these settings and we show that retrieval allows zero-shot transfer across tasks and environments while significantly improving performance.

URL: https://openreview.net/forum?id=OWCJ5JfsRB

---

Title: SCNode: Spatial and Contextual Coordinates for Graph Representation Learning

Authors: Md Joshem Uddin, Astrit Tola, Varin Singh Sikand, Cuneyt Gurcan Akcora, Baris Coskunuzer

Abstract: Effective node representation lies at the heart of Graph Neural Networks (GNNs), as it directly impacts their ability to perform downstream tasks such as node classification and link prediction. Most existing GNNs, particularly message passing graph neural networks (MPGNNs), rely on neighborhood aggregation to iteratively compute node embeddings. While powerful, this paradigm suffers from well-known limitations of oversquashing, oversmoothing, and underreaching that degrade representation quality. More critically, MPGNNs often assume homophily, where connected nodes share similar features or labels, leading to poor generalization in heterophilic graphs where this assumption breaks down.

To address these challenges, we propose *SCNode*, a *Spatial-Contextual Node Embedding* framework designed to perform consistently well in both homophilic and heterophilic settings. SCNode integrates spatial and contextual information, yielding node embeddings that are not only more discriminative but also structurally aware. Our approach introduces new homophily matrices for understanding class interactions and tendencies. Extensive experiments on benchmark datasets show that SCNode achieves superior performance over conventional GNN models, demonstrating its robustness and adaptability in diverse graph structures.

URL: https://openreview.net/forum?id=wdcdKeFbfQ

---

Title: CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection

Authors: Byeongchan Lee, John Won, Seunghyun Lee, Jinwoo Shin

Abstract: Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types (e.g., local and global defects), and the scarcity of training data. As such, it necessitates a comprehensive model capable of capturing both low-level and high-level features, even with limited data. To address this, we propose CLIPFUSION, a method that leverages both discriminative and generative foundation models. Given the CLIP-based discriminative model's limited capacity to capture fine-grained local details, we incorporate a diffusion-based generative model to complement its features. This integration yields a synergistic solution for anomaly detection. To this end, we propose using diffusion models as feature extractors for anomaly detection, and introduce carefully designed strategies to extract informative cross-attention and feature maps. Experimental results on benchmark datasets (MVTec-AD, VisA) demonstrate that CLIPFUSION consistently outperforms baseline methods in both anomaly segmentation and classification under both zero-shot and few-shot settings. We believe that our method underscores the effectiveness of multi-modal and multi-model fusion in tackling the multifaceted challenges of anomaly detection, providing a scalable solution for real-world applications.

URL: https://openreview.net/forum?id=WpFzZNuQmg

---

Title: The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning

Authors: Tianshi Zheng, Yixiang Chen, Chengxi Li, Chunyang Li, Qing Zong, Haochen Shi, Baixuan Xu, Yangqiu Song, Ginny Wong, Simon See

Abstract: Chain-of-Thought (CoT) prompting has been widely recognized for its ability to enhance reasoning capabilities in large language models (LLMs). However, our study reveals a surprising contradiction to this prevailing perspective within the fundamental domain of pattern-based in-context learning (ICL). Through extensive experiments involving 16 state-of-the-art LLMs and nine diverse pattern-based ICL datasets, we demonstrate that CoT and its reasoning variants consistently underperform direct answering across varying model scales and benchmark complexities. To systematically investigate this unexpected phenomenon, we designed extensive experiments to validate several hypothetical explanations. Our analysis uncovers a fundamental hybrid mechanism of explicit-implicit reasoning driving CoT’s performance in pattern-based ICL: while explicit reasoning falters due to LLMs’ struggles to infer underlying patterns from demonstrations, implicit reasoning—disrupted by the increased contextual distance of CoT rationales—often compensates, delivering correct answers despite flawed rationales. This hybrid mechanism explains CoT’s relative underperformance, as noise from weak explicit inference undermines the process, even as implicit mechanisms partially salvage outcomes. Notably, even long-CoT reasoning models, which excel in abstract and symbolic reasoning, fail to fully overcome these limitations despite higher computational costs. Our findings challenge existing assumptions regarding the universal efficacy of CoT, yielding novel insights into its limitations and guiding future research toward more nuanced and effective reasoning methodologies for LLMs.

URL: https://openreview.net/forum?id=7SIrvcYNYj

---

Title: Scalable Generative Modeling of Weighted Graphs

Authors: Richard Williams, Eric Nalisnick, Andrew Holbrook

Abstract: Weighted graphs are ubiquitous throughout biology, chemistry, and the social sciences, motivating the development of generative models for abstract weighted graph data using deep neural networks. However, most current deep generative models are designed for unweighted graphs and cannot be easily extended to weighted topologies. Among those that do incorporate edge weights, few consider a joint distribution with the topology of the graph. Furthermore, learning a distribution over weighted graphs must account for complex nonlocal dependencies between both the edges of the graph and corresponding weights of each edge. We develop an autoregressive model BiGG-E, a nontrivial extension of the BiGG model, that learns a joint distribution over weighted graphs while exploiting sparsity to generate a weighted graph with $n$ nodes and $m$ edges in $O((n + m)\log n)$ time. Simulation studies and experiments on a variety of benchmark datasets demonstrate that BiGG-E best captures distributions over weighted graphs while remaining scalable and computationally efficient.

URL: https://openreview.net/forum?id=yWKkBOcD18

---

Title: Testing with Non-identically Distributed Samples

Authors: Shivam Garg, Chirag Pabbaraju, Kirankumar Shiragur, Gregory Valiant

Abstract: We examine the extent to which sublinear-sample property testing and estimation applies to settings where samples are independently but not identically distributed. Specifically, we consider the following distributional property testing framework: Suppose there is a set of distributions over a discrete support of size $k$, $p_1, p_2,\ldots,p_T$, and we obtain $c$ independent draws from each distribution. Suppose the goal is to learn or test a property of the average distribution, $p_{\mathrm{avg}}$. This setup models a number of important practical settings where the individual distributions correspond to heterogeneous entities --- either individuals, chronologically distinct time periods, spatially separated data sources, etc. From a learning standpoint, even with $c=1$ samples from each distribution, $\Theta(k/\varepsilon^2)$ samples are necessary and sufficient to learn $p_{\mathrm{avg}}$ to within error $\varepsilon$ in $\ell_1$ distance. To test uniformity or identity --- distinguishing the case that $p_{\mathrm{avg}}$ is equal to some reference distribution, versus has $\ell_1$ distance at least $\varepsilon$ from the reference distribution, we show that a linear number of samples in $k$ is necessary given $c=1$ samples from each distribution. In contrast, for $c \ge 2$, we recover the usual sublinear sample testing guarantees of the i.i.d. setting: we show that $O(\sqrt{k}/\varepsilon^2 + 1/\varepsilon^4)$ total samples are sufficient, matching the optimal sample complexity in the i.i.d. case in the regime where $\varepsilon \ge k^{-1/4}$. Additionally, we show that in the $c=2$ case, there is a constant $\rho > 0$ such that even in the linear regime with $\rho k$ samples, no tester that considers the multiset of samples (ignoring which samples were drawn from the same $p_i$) can perform uniformity testing. 
We further extend our techniques to the problem of testing "closeness" of two distributions: given $c=3$ independent draws from each of $p_1, p_2,\ldots,p_T$ and $q_1, q_2,\ldots,q_T$, one can distinguish the case that $p_{\mathrm{avg}}=q_{\mathrm{avg}}$ versus having $\ell_1$ distance at least $\varepsilon$ using $O(k^{2/3}/\varepsilon^{8/3})$ total samples, where $k$ is an upper bound on the support size, matching the optimal sample complexity of the i.i.d. setting up to the $\varepsilon$-dependence.

URL: https://openreview.net/forum?id=FUzvztzBlW
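
To make the sampling model concrete, the sketch below draws $c = 1$ sample from each $p_t$ and pools the draws into one histogram, which is an unbiased estimate of $p_{\mathrm{avg}}$ because each draw lands on symbol $j$ with probability $p_t(j)$. It illustrates only the learning setup, not the paper's testers or analysis, and the function names are hypothetical.

```python
import random
from collections import Counter

def draw_one_per_source(dists, rng):
    """Draw c = 1 sample from each distribution p_t (a list of k probabilities)."""
    return [rng.choices(range(len(p)), weights=p)[0] for p in dists]

def pooled_estimate(samples, k):
    """Empirical histogram of the pooled samples over the support {0, ..., k-1}."""
    counts = Counter(samples)
    return [counts[j] / len(samples) for j in range(k)]

def l1_distance(p, q):
    """l1 distance between two distributions given as equal-length lists."""
    return sum(abs(a - b) for a, b in zip(p, q))
```

A typical use would compare `pooled_estimate(draw_one_per_source(dists, random.Random(0)), k)` against the true average of `dists` with `l1_distance`.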

---

Title: TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models

Authors: Yao Xiao, Qiqian Fu, Heyi Tao, Yuqun Wu, Zhen Zhu, Derek Hoiem

Abstract: Image-text models excel at image-level tasks but struggle with detailed visual understanding. While these models provide strong visual-language alignment, segmentation models like SAM2 offer precise spatial boundaries for objects. To this end, we propose TextRegion, a simple, effective, and training-free framework that combines the strengths of image-text models and SAM2 to generate powerful text-aligned region tokens. These tokens enable detailed visual understanding while preserving open-vocabulary capabilities. They can be directly applied to various downstream tasks, including open-world semantic segmentation, referring expression comprehension, and grounding. We conduct extensive evaluations and consistently achieve superior or competitive performance compared to state-of-the-art training-free methods. Additionally, our framework is compatible with many image-text models, making it highly practical and easily extensible as stronger models emerge. Code is available at: https://github.com/avaxiao/TextRegion.

URL: https://openreview.net/forum?id=KZLmkL62M4

---

Title: A Practical Investigation of Spatially-Controlled Image Generation with Transformers

Authors: Guoxuan Xia, Harleen Hanspal, Petru-Daniel Tudosiu, Shifeng Zhang, Sarah Parisot

Abstract: Enabling image generation models to be spatially controlled is an important area of research, empowering users to better generate images according to their own fine-grained specifications via, e.g., edge maps or poses. Although this task has seen impressive improvements in recent times, a focus on rapidly producing stronger models has come at the cost of detailed and fair scientific comparison. Differing training data, model architectures and generation paradigms make it difficult to disentangle the factors contributing to performance. Meanwhile, the motivations and nuances of certain approaches become lost in the literature. In this work, we aim to provide clear takeaways across generation paradigms for practitioners wishing to develop transformer-based systems for spatially-controlled generation, clarifying the literature and addressing knowledge gaps. We perform controlled experiments on ImageNet across diffusion-based/flow-based and autoregressive (AR) models. First, we establish control token prefilling as a simple, general and performant baseline approach for transformers. We then investigate previously underexplored sampling-time enhancements, showing that extending classifier-free guidance to control, as well as softmax truncation, has a strong impact on control-generation consistency. Finally, we re-clarify the motivation of adapter-based approaches, demonstrating that they mitigate “forgetting” and maintain generation quality when trained on limited downstream data, but underperform full training in terms of generation-control consistency. Code: https://github.com/guoxoug/transformer-imagenet-ctrl.

URL: https://openreview.net/forum?id=loT6xhgLYK

---

Title: PCF Learned Sort: a Learning Augmented Sort Algorithm with $\mathcal{O}(n \log\log n)$ Expected Complexity

Authors: Atsuki Sato, Yusuke Matsui

Abstract: Sorting is one of the most fundamental algorithms in computer science. Recently, Learned Sorts, which use machine learning to improve sorting speed, have attracted attention. While existing studies show that Learned Sort is empirically faster than classical sorting algorithms, they do not provide theoretical guarantees about its computational complexity. We propose Piecewise Constant Function (PCF) Learned Sort, a theoretically guaranteed Learned Sort algorithm. We prove that the expected complexity of PCF Learned Sort is $\mathcal{O}(n \log \log n)$ under mild assumptions on the data distribution. We also confirm empirically that PCF Learned Sort has a computational complexity of $\mathcal{O}(n \log \log n)$ on both synthetic and real datasets. This is the first study to theoretically support the empirical success of Learned Sort, and provides evidence for why Learned Sort is fast.
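
The general idea of bucketing with a piecewise constant function can be sketched as follows. This is a simplified, one-level illustration and not the authors' PCF Learned Sort: it estimates the CDF from a random sample on equal-width intervals, uses that estimate to assign each element to a bucket, and then falls back to a standard sort inside each bucket. The function names and the default parameters (`n_buckets`, `n_intervals`, `sample_size`) are hypothetical, and the paper's recursive scheme and complexity analysis are not reproduced here.

```python
import random

def fit_pcf_cdf(sample, lo, hi, n_intervals):
    """Piecewise-constant CDF estimate on equal-width intervals of [lo, hi]."""
    counts = [0] * n_intervals
    width = (hi - lo) / n_intervals
    for x in sample:
        i = min(int((x - lo) / width), n_intervals - 1)
        counts[i] += 1
    cdf, running = [], 0
    for cnt in counts:
        # Fraction of the sample that falls in strictly earlier intervals.
        cdf.append(running / len(sample))
        running += cnt
    return cdf, width

def learned_bucket_sort(values, n_buckets=64, n_intervals=256, sample_size=512, seed=0):
    """Sort a list of numbers by CDF-guided bucketing plus a per-bucket sort."""
    if len(values) < 2:
        return list(values)
    lo, hi = min(values), max(values)
    if lo == hi:
        return list(values)
    rng = random.Random(seed)
    sample = rng.sample(values, min(sample_size, len(values)))
    cdf, width = fit_pcf_cdf(sample, lo, hi, n_intervals)
    buckets = [[] for _ in range(n_buckets)]
    for x in values:
        i = min(int((x - lo) / width), n_intervals - 1)
        # The estimated CDF is non-decreasing, so bucket order respects value order.
        b = min(int(cdf[i] * n_buckets), n_buckets - 1)
        buckets[b].append(x)
    out = []
    for bucket in buckets:
        out.extend(sorted(bucket))  # stand-in for a recursive call
    return out

data = [random.Random(1).gauss(0.0, 1.0) for _ in range(10)]
data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 1.0) for _ in range(10000)]
result = learned_bucket_sort(data)
```

Because the interval lookup is a single arithmetic operation, each element is placed in its bucket in constant time; how evenly the buckets fill depends on how well the estimated CDF matches the data.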

URL: https://openreview.net/forum?id=wVkb8WHbvR

---

Title: DNOD: Deformable Neural Operators for Object Detection in SAR Images

Authors: GVS Mothish, J Rishi, Shobhit Kumar Shukla, Deepak Subramani

Abstract: We introduce a deep neural operator framework aimed at object detection in remotely sensed Synthetic Aperture Radar (SAR) images. Recent research highlights the impressive performance of the End-to-End Object Detection Transformer (DETR). Nonetheless, in domains like SAR imaging, managing challenges such as speckle noise and the detection of small objects continues to be problematic. To address SAR object detection issues, we present the Deformable Neural Operator-Based Object Detection (DNOD) framework, tailored for SAR tasks. We develop two neural operators: Multi-Scale Fourier Mixing (MSFM) for the encoder and Multi-scale, multi-input Adaptive Deformable Fourier Neural Operator (MADFNO) for the decoder. Detailed evaluations and ablation studies show that DNOD exceeds existing methods, delivering significantly better results with an improvement of +2.23 mAP on the SARDet-100k dataset, the largest SAR object detection compilation. The code is available at https://github.com/quest-lab-iisc/DNOD.

URL: https://openreview.net/forum?id=tjBqPJdQ72

---

Title: Neural Spatiotemporal Point Processes: Trends and Challenges

Authors: Sumantrak Mukherjee, Mouad Elhamdi, George Mohler, David Antony Selby, Yao Xie, Sebastian Josef Vollmer, Gerrit Großmann

Abstract: Spatiotemporal point processes (STPPs) are probabilistic models for events occurring in continuous space and time. Real-world event data often exhibits intricate dependencies and heterogeneous dynamics. By incorporating modern deep learning techniques, STPPs can model these complexities more effectively than traditional approaches. Consequently, the fusion of neural methods with STPPs has become an active and rapidly evolving research area. In this review, we categorize existing approaches, unify key design choices, and explain the challenges of working with this data modality. We further highlight emerging trends and diverse application domains. Finally, we identify open challenges and gaps in the literature.

URL: https://openreview.net/forum?id=N69lSYWkMw

---

Title: Encoder-only Next Token Prediction

Authors: Ethan Ewer, Daewon Chae, Thomas Zeng, Jinkyu Kim, Kangwook Lee

Abstract: Next-token prediction is conventionally done using decoder-only Transformers with causal attention, as this approach allows for efficient reuse of keys and values. If we were not compute-limited, should we still use decoder-only Transformers? In this work, we introduce Encoder-only Next Token Prediction (ENTP). We explore the differences between ENTP and decoder-only Transformers in expressive power and complexity, highlighting potential advantages of ENTP in settings with unbounded compute. We introduce the $\operatorname{Count3}$ task and show, both theoretically and experimentally, that while ENTP can perform this task easily, a decoder-only Transformer cannot. Finally, we empirically demonstrate the superior performance of ENTP across representative tasks where next-token prediction based Transformers can be evaluated, including addition, in-context learning, and language modeling.

URL: https://openreview.net/forum?id=CGHi289y8e

---

Title: Constrained Reinforcement Learning with Smoothed Log Barrier Function

Authors: Baohe Zhang, Yuan Zhang, Hao Zhu, Shengchao Yan, Thomas Brox, Joschka Boedecker

Abstract: Deploying reinforcement learning (RL) in real-world systems often requires satisfying strict safety constraints during both training and deployment, which simple reward shaping typically fails to enforce. Existing constrained RL algorithms frequently face several major challenges, including instabilities during training and overly conservative policies.
To overcome these limitations, we propose CSAC-LB (Constrained Soft Actor-Critic with Log Barrier), a model-free, sample-efficient, off-policy algorithm that requires no pre-training. CSAC-LB integrates a linear smoothed log barrier function into the actor’s objective, providing a numerically stable, non-vanishing gradient that enables the agent to quickly recover from unsafe states while avoiding the instability of traditional interior-point methods. To further enhance safety and mitigate the underestimation of constraint violations, we employ a pessimistic double-critic architecture for the cost function, taking the maximum of two cost Q-networks to conservatively guide the policy.
Through extensive experiments on challenging constrained control tasks, we demonstrate that CSAC-LB significantly outperforms baselines by consistently achieving high returns while strictly adhering to safety constraints. Our results establish CSAC-LB as a robust and stable solution for applying RL to safety-critical domains.
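
For intuition, the sketch below shows one standard way to extend a log barrier linearly so that it stays finite, with a non-vanishing gradient, when a constraint is violated. Whether CSAC-LB uses exactly this form is an assumption; the function names, the sign convention (`z <= 0` means the constraint holds), and the sharpness parameter `t` are illustrative and not taken from the paper.

```python
import math

def smoothed_log_barrier(z, t=5.0):
    """Log barrier -log(-z)/t for z <= -1/t**2, extended linearly beyond that point.

    z is the constraint value; z <= 0 means the constraint is satisfied.
    """
    threshold = -1.0 / t**2
    if z <= threshold:
        return -math.log(-z) / t
    # Linear piece chosen so that value and slope match at the threshold.
    return t * z - math.log(1.0 / t**2) / t + 1.0 / t

def smoothed_log_barrier_grad(z, t=5.0):
    """Derivative of the smoothed barrier with respect to z."""
    threshold = -1.0 / t**2
    if z <= threshold:
        return -1.0 / (t * z)
    return t

for z in (-1.0, -0.1, -0.04, 0.0, 0.5):
    print(z, smoothed_log_barrier(z), smoothed_log_barrier_grad(z))
```

A plain log barrier is undefined for `z >= 0`, so an agent that has already violated a constraint receives no usable gradient. With the linear extension, the gradient in the unsafe region is the constant `t`, which keeps pushing the constraint value back toward the feasible side.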

URL: https://openreview.net/forum?id=Amh95oURaE

---

Title: A Note On The Stability Of The Focal Loss

Authors: Martijn P. van Leeuwen, Koen V. Haak, Gorkem Saygili, Eric O. Postma, L.L. Sharon Ong

Abstract: The Focal Loss is a widely deployed loss function that is used to train various types of deep learning models. It is a modification of the cross-entropy loss designed to mitigate the effect of class imbalance in dense object detection tasks. By downweighting the losses for easy, correctly classified samples, the method places more emphasis on harder, misclassified ones. As a result, gradient updates are not dominated by samples that the model already handles correctly. The downweighting of the loss is achieved by scaling the cross-entropy loss with a term that depends on a focusing parameter $\gamma$. In this paper, we highlight an unaddressed numerical instability of the Focal Loss that arises when this focusing parameter is set to a value between 0 and 1. We present the theoretical basis of this numerical instability, show that it can be detected in the computation of Focal Loss gradients, and demonstrate its effects across several classification and segmentation tasks. Additionally, we propose a straightforward modification to the original Focal Loss to ensure stability whenever these unstable focusing parameter values are used.
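
A short numerical sketch illustrates why focusing parameters between 0 and 1 can be delicate. It uses the standard form of the Focal Loss, $-(1-p)^{\gamma}\log p$ for the true-class probability $p$, and looks at the derivative of the modulating factor $(1-p)^{\gamma}$, which contains $(1-p)^{\gamma-1}$. For $0 < \gamma < 1$ this term grows without bound as $p \to 1$. Whether this is exactly the mechanism analysed in the paper is an assumption, and the function names are hypothetical.

```python
import math

def focal_loss(p, gamma):
    """Focal loss for the true-class probability p: -(1 - p)**gamma * log(p)."""
    return -((1.0 - p) ** gamma) * math.log(p)

def modulating_factor_grad(p, gamma):
    """d/dp of (1 - p)**gamma, i.e. -gamma * (1 - p)**(gamma - 1)."""
    return -gamma * (1.0 - p) ** (gamma - 1.0)

def focal_loss_grad(p, gamma):
    """d/dp of the focal loss, obtained with the product rule."""
    return -(modulating_factor_grad(p, gamma) * math.log(p)
             + (1.0 - p) ** gamma / p)

for p in (0.9, 0.999999, 1.0 - 1e-12):
    print(f"p={p!r}: "
          f"gamma=0.5 factor grad={modulating_factor_grad(p, 0.5):.3e}, "
          f"gamma=2.0 factor grad={modulating_factor_grad(p, 2.0):.3e}, "
          f"gamma=0.5 full grad={focal_loss_grad(p, 0.5):.3e}")
```

The full derivative stays small near $p = 1$ because the large factor is multiplied by $\log p \approx 0$, but the intermediate term itself is huge. At exactly `p = 1.0`, plain Python raises `ZeroDivisionError` for `0.0 ** (gamma - 1.0)` when `gamma < 1`, so the product cannot be evaluated in floating point even though its limit is finite.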

URL: https://openreview.net/forum?id=eCYActnGbu

---

Title: Quasipseudometric Value Functions with Dense Rewards

Authors: Khadichabonu Valieva, Bikramjit Banerjee

Abstract: As a generalization of reinforcement learning (RL) to parametrizable goals, goal conditioned RL (GCRL) has a broad range of applications, particularly in challenging tasks in robotics. Recent work has established that the optimal value function of GCRL $Q^\ast(s, a, g)$ has a quasipseudometric structure, leading to targeted neural architectures that respect such structure. However, the relevant analyses assume a sparse reward setting, a known aggravating factor to sample complexity. We show that the key property underpinning a quasipseudometric, viz., the triangle inequality, is preserved under a dense reward setting as well, specifically identifying the key condition necessary for the triangle inequality. Contrary to earlier findings where dense rewards were shown to be detrimental to GCRL, we conjecture that dense reward functions that satisfy this condition can only improve, never worsen, sample complexity. We evaluate this proposal in 12 standard benchmark environments in GCRL featuring challenging continuous control tasks. Our empirical results confirm that training a quasipseudometric value function in our dense reward setting indeed either improves upon, or preserves, the sample complexity of training with sparse rewards. This opens up opportunities to train efficient neural architectures with dense rewards, compounding their benefits to sample complexity.

URL: https://openreview.net/forum?id=4LqOl6pDUe

---

Title: Thompson Sampling For Bandits With Cool-Down Periods

Authors: Jingxuan Zhu, Bin Liu

Abstract: This paper investigates a variation of dynamic bandits, characterized by arms that follow a periodic availability pattern. Upon a "successful" selection, each arm transitions to an inactive state and requires a possibly unknown cool-down period before becoming active again. We devise Thompson Sampling algorithms specifically designed for this problem, guaranteeing logarithmic regret. Notably, this work is the first to address scenarios in which the agent lacks knowledge of each arm's active state. Furthermore, the theoretical findings extend to the sleeping bandit framework, offering a notably superior regret bound compared to existing literature.
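
The problem setting can be sketched with a basic Beta-Bernoulli Thompson Sampling loop in which a successful pull puts the arm to sleep for a fixed number of rounds. This is a simplified illustration and not the authors' algorithm: it assumes the agent knows both the cool-down lengths and which arms are currently active, whereas the paper also covers the harder case where the active state is unknown. The function name, arm means, and cool-down values are hypothetical.

```python
import random

def thompson_with_cooldown(means, cooldowns, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling where a successful pull deactivates the arm."""
    rng = random.Random(seed)
    n = len(means)
    alpha = [1.0] * n      # Beta posterior parameter: 1 + observed successes
    beta = [1.0] * n       # Beta posterior parameter: 1 + observed failures
    ready_at = [0] * n     # first round in which each arm is active again
    history = []           # (round, arm, reward); arm is None if no arm is active
    for t in range(horizon):
        active = [i for i in range(n) if ready_at[i] <= t]
        if not active:
            history.append((t, None, 0))
            continue
        # Sample a plausible mean for each active arm and play the best one.
        samples = {i: rng.betavariate(alpha[i], beta[i]) for i in active}
        arm = max(samples, key=samples.get)
        reward = 1 if rng.random() < means[arm] else 0
        if reward:
            alpha[arm] += 1.0
            ready_at[arm] = t + 1 + cooldowns[arm]   # success triggers the cool-down
        else:
            beta[arm] += 1.0
        history.append((t, arm, reward))
    return history, alpha, beta

arm_means = [0.9, 0.5, 0.2]      # hypothetical success probabilities
arm_cooldowns = [3, 1, 0]        # hypothetical cool-down lengths in rounds
history, alpha, beta = thompson_with_cooldown(arm_means, arm_cooldowns, horizon=500)
total_reward = sum(r for _, _, r in history)
print("total reward:", total_reward)
```

Restricting the posterior sampling step to the active set is the only change from the standard algorithm here; the best arm is frequently unavailable, so the learner has to keep accurate estimates for the other arms as well.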

URL: https://openreview.net/forum?id=1fv0ZS2mXm

---

Title: Before Forgetting, There's Learning: Representation Learning Challenges in Online Unsupervised Continual Learning

Authors: Cameron Ethan Taylor, Shreyas Malakarjun Patil, Constantine Dovrolis

Abstract: This paper addresses the Online Continual Unsupervised Learning (O-UCL) problem, where a learner must adapt to a stream of data arriving sequentially from a shifting distribution without storing past data or relying on labels. This challenge mirrors many real-world machine learning applications, where efficient training and updating of large or on-device models is critical. We first explore the unique challenges of O-UCL and identify a secondary failure mode in addition to catastrophic forgetting. We demonstrate that the presence of transient, small-scale biases in an online data stream can significantly impair learning. Unlike traditional notions of distribution shift that manifest over long timescales, we highlight how biases occurring at the level of individual batches or short segments, while imperceptible in aggregate, can severely hinder a model’s ability to learn, a phenomenon we call "catastrophic non-learning". We further showcase how an auxiliary memory can be used to solve both catastrophic forgetting and catastrophic non-learning, but that the criteria for the ideal memory for each are in conflict. In response to these findings, we introduce a dual-memory framework which incorporates specifically designed modules to mitigate both catastrophic non-learning and forgetting. We validate our findings on challenging, realistic data streams derived from ImageNet and Places365, comparing against multiple baselines to highlight the distinct nature of this problem and the need for new approaches in O-UCL.

URL: https://openreview.net/forum?id=hZwInyuYDw

---

New submissions
===============

Title: Lifelong Learning of Video Diffusion Models From a Single Video Stream

Abstract: This work demonstrates that training autoregressive video diffusion models from a single video stream—resembling the experience of embodied agents—is not only possible, but can also be as effective as standard offline training given the same number of gradient steps. Our work further reveals that this main result can be achieved using experience replay methods that only retain a subset of the preceding video stream. To support training and evaluation in this setting, we introduce four new datasets for streaming lifelong generative video modeling: Lifelong Bouncing Balls, Lifelong 3D Maze, Lifelong Drive, and Lifelong PLAICraft, each consisting of one million consecutive frames from environments of increasing complexity. Together, our datasets and investigations lay the groundwork for video generative models and world models that continuously learn from single-sensor video streams rather than from fixed, curated video datasets.

URL: https://openreview.net/forum?id=xbvfqMzoOL

---

Title: Are Time-Indexed Foundation Models the Future of Time Series Imputation?

Abstract: Foundation models for time series imputation remain largely unexplored. Recently, two such models, TabPFN-TS and MoTM, have emerged. These models share a common philosophy that places them within the family of time-indexed foundation models. This paper presents the first large-scale empirical study of these models for zero-shot imputation, which enables missing value recovery without retraining across a wide range of scenarios. We conduct extensive univariate experiments across 33 out-of-domain datasets ($\approx$ 1.3M imputation windows) and evaluate their ability to integrate covariates at inference time to improve accuracy without fine-tuning. Our results demonstrate that time-indexed foundation models are a powerful and practical step toward achieving general-purpose, zero-shot imputation for real-world time series.

URL: https://openreview.net/forum?id=cTk56KpsP5

---

Title: Learning from Online Videos at Inference Time for Computer-Use Agents

Abstract: Computer-use agents can operate computers and automate laborious tasks, but despite recent rapid progress, they still lag behind human users, especially when tasks require domain-specific procedural knowledge about particular applications, platforms, and multi-step workflows. Humans can bridge this gap by watching video tutorials: we search, skim, and selectively imitate short segments that match our current subgoal. In this paper, we study how to enable computer-use agents to learn from online videos at inference time effectively. We propose a framework that retrieves and filters tutorial videos, converts them into structured demonstration trajectories, and dynamically selects trajectories as in-context guidance during execution. Particularly, using a VLM, we infer UI actions, segment videos into short subsequences of actions, and assign each subsequence a textual objective. At inference time, a two-stage selection mechanism dynamically chooses a single trajectory to add in context at each step, focusing the agent on the most helpful local guidance for its next decision. Experiments on two widely used benchmarks show that our framework consistently outperforms strong base agents and variants that use only textual tutorials or transcripts. Analyses highlight the importance of trajectory segmentation and selection, action filtering, and visual information, suggesting that abundant online videos can be systematically distilled into actionable guidance that improves computer-use agents at inference time.

URL: https://openreview.net/forum?id=YDFQIe6dqI

---

Title: GraphMERT: Efficient and Scalable Distillation of Reliable Knowledge Graphs from Unstructured Data

Abstract: Researchers have pursued neurosymbolic artificial intelligence (AI) applications for nearly three decades because symbolic components provide abstraction while neural components provide generalization. Thus, a marriage of the two components can lead to rapid advancements in AI. Yet, the field has not realized this promise since most neurosymbolic AI frameworks fail to scale. In addition, the implicit representations and approximate reasoning of purely neural approaches limit interpretability and trust. Knowledge graphs (KGs), a gold-standard representation of explicit semantic knowledge, can address the symbolic side. However, automatically deriving reliable KGs from text corpora has remained an open problem. We address the above challenges by introducing GraphMERT, a tiny graphical encoder-only model that distills high-quality KGs from unstructured text corpora and its own internal representations. Together, GraphMERT and its equivalent KG form a modular neurosymbolic stack: neural learning of abstractions; symbolic KGs for verifiable reasoning. GraphMERT + KG is the first efficient and scalable neurosymbolic model to achieve state-of-the-art benchmark accuracy along with superior symbolic representations relative to baselines. More concretely, we target reliable domain-specific KGs that are both (1) factual (with provenance) and (2) valid (ontology-consistent relations with domain-appropriate semantics). When an off-the-shelf large language model (LLM), e.g., Qwen3-32B, generates domain-specific KGs, it falls short on the reliability front due to prompt sensitivity, shallow domain expertise, and hallucinated relations. Thus, practitioners should avoid employing LLM-generated KGs in high-stakes domains such as medicine, law, business, and education.
On text obtained from PubMed papers related to diabetes, our KG extraction pipeline with a small 80M-parameter GraphMERT yields a KG with a 69.8% FActScore; a 32B-parameter baseline LLM yields a KG that achieves only a 40.2% FActScore. The GraphMERT-extracted KG also achieves a significantly higher ValidityScore of 68.7%, compared to an LLM-generated baseline (43.0%), demonstrating its ability to preserve ontology alignment. KG cleaning further improves factuality, with GraphMERT reaching 76.9% FActScore, compared to 55.6% for the LLM baseline. GraphMERT can then treat the augmented KG as the seed KG and refine it further. Finally, human experts can edit and audit the extracted KGs, further increasing their reliability. This is nearly impossible with purely neural representations. Hence, GraphMERT enables efficient, scalable, transparent (interpretable and explainable), attributable (with provenance), accountable (with governance), editable, auditable, and continually improvable state-of-the-art neurosymbolic AI.

URL: https://openreview.net/forum?id=tnXSdDhvqc

---

Title: Preference-Based Gradient Estimation for ML-Guided Approximate Combinatorial Optimization

Abstract: Combinatorial optimization (CO) problems arise across a broad spectrum of domains, including medicine, logistics, and manufacturing. While exact solutions are often computationally infeasible, many practical applications require high-quality solutions within a given time budget. To address this, we propose a learning-based approach that enhances existing non-learned heuristics for CO. Specifically, we parameterize these heuristics and train graph neural networks (GNNs) to predict parameter values that yield near-optimal solutions. Our method is trained end-to-end in a self-supervised fashion, using a novel gradient estimation scheme that treats the heuristic as a black box. This approach combines the strengths of learning and traditional algorithms: the GNN learns from data to guide the algorithm toward better solutions, while the heuristic ensures feasibility. We validate our method on two well-known combinatorial optimization problems: the travelling salesman problem (TSP) and the minimum k-cut problem. Our results demonstrate that the proposed approach is competitive with state-of-the-art learned CO solvers.

URL: https://openreview.net/forum?id=2S224XC378

---

Title: COLT: Enhancing Video Large Language Models with Continual Tool Usage

Abstract: The success of Large Language Models (LLMs) has significantly propelled research on video understanding. To harvest the benefits of well-trained expert models (i.e., tools), video LLMs prioritize the exploration of tool usage capabilities. Existing methods either prompt closed-source LLMs or employ the instruction tuning paradigm for tool-use finetuning. These methods, however, assume an established repository of fixed tools and struggle to generalize to real-world environments where tool data is perpetually evolving and streaming in. To this end, we propose to enhance open-source video LLMs with COntinuaL Tool usage (termed COLT), which automatically acquires tool-use ability in a successive tool stream without suffering "catastrophic forgetting" of the past learned tools. Specifically, our COLT incorporates a learnable tool codebook as a tool-specific memory system. Then, relevant tools are dynamically selected based on the similarity between user instructions and tool features within the codebook. To unleash the tool usage potential of video LLMs, we collect a video-centric tool-use instruction tuning dataset VideoToolBench. Extensive experiments on both previous video LLM benchmarks and the tool-use-specific VideoToolBench dataset demonstrate the state-of-the-art performance of our proposed COLT.

URL: https://openreview.net/forum?id=NT9tHHTlXn

---

Title: Context-aware Learned Mesh-based Simulation via Trajectory-Level Meta-Learning

Abstract: Simulating object deformations is a critical challenge across many scientific domains, including robotics, manufacturing, and structural mechanics. Learned Graph Network Simulators (GNSs) offer a promising alternative to traditional mesh-based physics simulators. Their speed and inherent differentiability make them particularly well suited for applications that require fast and accurate simulations, such as robotic manipulation or manufacturing optimization. However, existing learned simulators typically rely on single-step observations, which limits their ability to exploit temporal context. Without this information, these models fail to infer, e.g., material properties. Further, they rely on auto-regressive rollouts, which quickly accumulate error for long trajectories. We instead frame mesh-based simulation as a trajectory-level meta-learning problem. Using Conditional Neural Processes, our method enables rapid adaptation to new simulation scenarios from limited initial data while capturing their latent simulation properties. We utilize movement primitives to directly predict fast, stable and accurate simulations from a single model call. The resulting approach, Movement-primitive Meta-MeshGraphNet (M3GN), provides higher simulation accuracy at a fraction of the runtime cost compared to state-of-the-art GNSs across several tasks.

URL: https://openreview.net/forum?id=j5uACS2Doh

---

Title: TRecViT: A Recurrent Video Transformer

Abstract: We propose a novel block for causal video modelling. It relies on a time-space-channel factorisation with dedicated blocks for each dimension: gated linear recurrent units (LRUs) perform information mixing over time, self-attention layers perform mixing over space, and MLPs over channels. The resulting architecture TRecViT is causal and shows strong performance on sparse and dense tasks, trained in supervised or self-supervised regimes, being the only causal video model in the state-space models family. Notably, our model outperforms or is on par with the popular (non-causal) ViViT-L model on large scale video datasets (SSv2, Kinetics400), while having $3\times$ fewer parameters, a $12\times$ smaller memory footprint, and a $5\times$ lower FLOP count, with an inference throughput of about 300 frames per second, running comfortably in real-time. When compared with causal transformer-based models (TSM, RViT) and other recurrent models like LSTM, TRecViT obtains state-of-the-art results on the challenging SSv2 dataset. Code and checkpoints are available online.

URL: https://openreview.net/forum?id=Mmi46Ytb1H

---

Title: A Graphical Framework for Knowledge Exchange between Humans and Neural Networks

Abstract: How could humans better teach, understand, and communicate with artificial neural networks, to correct some mistakes and to learn new knowledge? Currently, network reasoning is mostly opaque. Attempts at modifying it are usually through costly addition of new labeled data and retraining, with no guarantee that the desired improvement will be achieved. Here, we develop a framework that allows humans to understand the reasoning logic of a network easily and intuitively, in graphical form. We provide means for humans to leverage their broader contextual knowledge, common sense, and causal inference abilities: they simply inspect and modify the graph as needed, to correct any underlying flawed network reasoning. We then automatically merge and distill the modified knowledge back into the original network. The improved network can exactly replace the original, but performs better thanks to human teaching. We show viability of the approach on large-scale image classification and zero-shot learning tasks.

URL: https://openreview.net/forum?id=wyvx5ZeBFA

---

Title: Variational Geometric Information Bottleneck: Toward a Geometric Law of Understanding

Abstract: We propose a unified information–geometric framework that formalizes understanding in learning as a trade-off between informativeness and geometric simplicity.
An encoder $\phi$ is evaluated by the utility
\[
U(\phi)=I(\phi(X);Y)-\beta\,\mathcal{C}(\phi),
\]
where $I(\phi(X);Y)$ measures task-relevant information and $\mathcal{C}(\phi)$ penalizes curvature and intrinsic dimensionality, promoting smooth, low-complexity manifolds.
Under standard manifold and regularity conditions, we establish non-asymptotic generalization bounds showing that generalization error scales with intrinsic dimension and curvature acts as a stabilizing capacity term linking geometry to sample efficiency.

To operationalize the theory, we introduce the Variational Geometric Information Bottleneck (V-GIB), a variational estimator that unifies mutual-information compression with curvature regularization via tractable geometric proxies (Hutchinson trace, Jacobian-norm, and local PCA estimators).

Across synthetic manifolds, few-shot tasks, and real-world datasets (Fashion-MNIST, CIFAR-10), V-GIB exhibits a consistent information–geometry Pareto frontier, estimator stability, and substantial gains in interpretive efficiency.
Fractional-data experiments on CIFAR-10 further confirm the predicted efficiency–curvature law, that curvature-aware encoders maintain accuracy under severe data scarcity.

Overall, V-GIB offers a principled and measurable route to representations that are geometrically coherent, data-efficient, and aligned with human-interpretable structure, providing empirical and theoretical evidence for a geometric law of understanding in learning systems.
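
The abstract lists the Hutchinson trace among its geometric proxies. As background, the generic Hutchinson estimator approximates $\operatorname{tr}(A)$ by averaging $z^\top A z$ over random sign vectors $z$, using only matrix-vector products. The sketch below shows that generic estimator; how V-GIB applies it (for instance, to which Jacobian or Hessian) is not stated in the abstract, so the small matrix `A` and all names here are stand-ins.

```python
import random

def hutchinson_trace(matvec, dim, n_samples=100, seed=0):
    """Estimate tr(A) as the mean of z^T (A z) over random +/-1 vectors z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)                      # only a matrix-vector product is needed
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / n_samples

# Hypothetical symmetric matrix standing in for a curvature-related operator.
A = [[2.0, 0.5, 0.0],
     [0.5, 3.0, 0.1],
     [0.0, 0.1, 4.0]]

def matvec(v):
    """Multiply the stand-in matrix A by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

estimate = hutchinson_trace(matvec, dim=3, n_samples=2000)
exact = sum(A[i][i] for i in range(3))
print(f"Hutchinson estimate: {estimate:.3f}, exact trace: {exact:.3f}")
```

The estimator never forms the matrix explicitly, which is why it is attractive when the operator is only available through automatic differentiation.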

URL: https://openreview.net/forum?id=D2s86BPSnV

---

Title: SPoT: Subpixel Placement of Tokens in Vision Transformers

Abstract: Vision Transformers naturally accommodate sparsity, yet standard tokenization methods confine features to discrete patch grids. This constraint prevents models from fully exploiting sparse regimes, forcing awkward compromises. We propose Subpixel Placement of Tokens (SPoT), a novel tokenization strategy that positions tokens continuously within images, effectively sidestepping grid-based limitations. With our proposed oracle-guided search, we uncover substantial performance gains achievable with ideal subpixel token positioning, drastically reducing the number of tokens necessary for accurate predictions during inference. SPoT provides a new direction for flexible, efficient, and interpretable ViT architectures, redefining sparsity as a strategic advantage rather than an imposed limitation.

URL: https://openreview.net/forum?id=XrBzSmzAVo

---

Title: Theoretically Understanding Data Reconstruction Leakage in Federated Learning

Abstract: Federated learning (FL) is an emerging collaborative learning paradigm that aims to protect data privacy. Unfortunately, recent works show that FL algorithms are vulnerable to serious data reconstruction attacks. However, existing works lack a theoretical foundation on the extent to which the devices' data can be reconstructed, and the effectiveness of these attacks cannot be compared fairly due to their unstable performance. To address this deficiency, we propose a theoretical framework to understand data reconstruction attacks on FL. Our framework involves bounding the data reconstruction error, and an attack's error bound reflects its inherent attack effectiveness. Under the framework, we can theoretically compare the effectiveness of existing attacks. For instance, our results on multiple datasets validate that the iDLG attack inherently outperforms the DLG attack.

URL: https://openreview.net/forum?id=1UfDXeYxwk

---

Title: Bayesian Network Structure Discovery Using Large Language Models

Abstract: Understanding probabilistic relationships among variables is crucial for analyzing complex systems. Traditional structure learning methods often require extensive observational data and incur high computational costs. Recent studies have explored using large language models (LLMs) for structure learning, but most treat LLMs as auxiliary tools for pre-processing or post-processing, leaving the core learning process data-driven. In this work, we propose a unified framework for Bayesian network structure discovery that places LLMs at the center, supporting both data-free and data-aware settings. In the data-free case, we introduce PromptBN to query LLMs with metadata and efficiently uncover valid probabilistic relationships. When observational data are available, we introduce ReActBN, which integrates the ReAct reasoning paradigm with structure scores such as the Bayesian Information Criterion (BIC) for iterative refinement. Unlike prior methods that offload refinement to external algorithms, our framework maintains the LLM actively in the loop throughout the discovery process. Experiments demonstrate that our method significantly outperforms both existing LLM-based approaches and traditional data-driven algorithms, particularly in the low- or no-data scenario. Code will be publicly available upon publication.

URL: https://openreview.net/forum?id=G4mrO8LVix

---

Title: Cost-Free Personalization via Information-Geometric Projection in Bayesian Federated Learning

Abstract: Bayesian Federated Learning (BFL) combines uncertainty modeling with decentralized training, enabling the development of personalized and reliable models in the presence of data heterogeneity and privacy constraints. Existing approaches typically rely on Markov Chain Monte Carlo (MCMC) sampling or variational inference, often incorporating personalization mechanisms to better adapt to the local data distributions. In this work, we propose an information-geometric projection framework for personalization in parametric BFL. By projecting the global model onto a neighborhood of the user's local model, our method enables a tunable trade-off between global generalization and local specialization. Under mild assumptions, we show that this projection step is equivalent to computing a barycenter in the statistical manifold, allowing us to derive closed-form solutions and achieve cost-free personalization. We apply the proposed approach within a variational learning setup using the Improved Variational Online Newton (IVON) optimizer and extend it to general aggregation schemes in BFL. Empirical evaluations under heterogeneous data distributions confirm that our method effectively balances global and local performance with minimal computational overhead.

URL: https://openreview.net/forum?id=9y0jCrxjDR

---

Title: A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models

Abstract: Time series reasoning treats time as a first-class axis and incorporates intermediate evidence directly into the answer.
This survey defines the problem and organizes the literature by reasoning topology with three families: direct reasoning in one step, linear chain reasoning with explicit intermediates, and branch-structured reasoning that explores, revises, and aggregates.
The topology is crossed with the main objectives of the field, including traditional time series analysis, explanation and understanding, causal inference and decision making, and time series generation, while a compact tag set spans these axes and captures decomposition and verification, ensembling, tool use, knowledge access, multimodality, agent loops, and LLM alignment regimes.
Methods and systems are reviewed across domains, showing what each topology enables and where it breaks down in faithfulness or robustness, along with curated datasets, benchmarks, and resources that support study and deployment (with an accompanying repository at https://anonymous.4open.science/r/Time-Series-Reasoning-Survey-TMLR/).
Evaluation practices that keep evidence visible and temporally aligned are highlighted, and guidance is distilled on matching topology to uncertainty, grounding with observable artifacts, planning for shift and streaming, and treating cost and latency as design budgets.
We emphasize that reasoning structures must balance capacity for grounding and self-correction against computational cost and reproducibility, while future progress will likely depend on benchmarks that tie reasoning quality to utility and on closed-loop testbeds that trade off cost and risk under shift-aware, streaming, and long-horizon settings.
Taken together, these directions mark a shift from narrow accuracy toward reliability at scale, enabling systems that not only analyze but also understand, explain, and act on dynamic worlds with traceable evidence and credible outcomes.

URL: https://openreview.net/forum?id=mgMJ8ksKKA

---

Title: Correctness-Aware Knowledge Distillation for Enhanced Student Learning

Abstract: In real-world learning, students rely on their mentors for guidance but must also develop the ability to recognize and learn from their mentors' mistakes. Inspired by this mentor-critic dynamic, we propose Mentor-Critic Distillation (MCD), a novel framework for knowledge distillation in machine learning. Traditional distillation methods risk transferring both correct insights and errors from the mentor (teacher model) to the student model, which can hinder student performance. Notably, previous state-of-the-art approaches fail to account for scenarios where the teacher is incorrect, often leaving the student model vulnerable to inheriting these errors. To address this limitation, MCD introduces a weighted knowledge transfer mechanism that decouples the learning process based on the mentor's correctness. When the mentor model is correct, the student model follows the mentor's guidance with a large weight on knowledge transfer. However, when the mentor is incorrect, the student relies more on the ground truth but still learns inter-class relationships from the mentor, adjusting the weight toward task-specific losses such as cross-entropy. This mentor-critic approach ensures that the student model benefits from the mentor's expertise without inheriting its mistakes. We provide theoretical analysis proving that MCD strictly generalizes vanilla KD and guarantees reduced negative transfer. We evaluate our Mentor-Critic Distillation across diverse teacher-student configurations on benchmark datasets, including CIFAR-100, ImageNet, and MedMNIST. Notably, MCD requires no architectural modifications or additional parameters, making it a practical drop-in replacement for standard knowledge distillation. These results highlight MCD's effectiveness in optimizing knowledge transfer and its robustness across diverse domains and data regimes, particularly in data-scarce scenarios typical of specialized domains such as medical imaging.
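
The correctness-dependent weighting described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name, the two weights `alpha_correct` and `alpha_incorrect`, and the choice of a per-sample convex combination of a KL distillation term and cross-entropy are all assumptions made for illustration, and temperature scaling is omitted.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def correctness_weighted_kd_loss(student_logits, teacher_logits, labels,
                                 alpha_correct=0.9, alpha_incorrect=0.3):
    """Hypothetical loss: weight distillation by whether the teacher is right.

    alpha_correct / alpha_incorrect are illustrative values, not from the paper.
    """
    log_p_s = log_softmax(student_logits)
    log_p_t = log_softmax(teacher_logits)
    p_t = np.exp(log_p_t)
    # Per-sample KL(teacher || student): the knowledge-transfer term.
    kd = (p_t * (log_p_t - log_p_s)).sum(axis=1)
    # Per-sample cross-entropy against the ground-truth label.
    ce = -log_p_s[np.arange(len(labels)), labels]
    # Large distillation weight when the teacher's top-1 prediction is correct,
    # smaller weight (more cross-entropy) when it is wrong.
    teacher_correct = teacher_logits.argmax(axis=1) == labels
    alpha = np.where(teacher_correct, alpha_correct, alpha_incorrect)
    return float(np.mean(alpha * kd + (1.0 - alpha) * ce))

# Toy batch: the teacher is right on sample 0 and wrong on sample 1.
teacher = np.array([[4.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
student = np.array([[1.0, 0.5, 0.0], [0.2, 0.1, 0.9]])
labels = np.array([0, 2])
loss = correctness_weighted_kd_loss(student, teacher, labels)
```

Because the distillation weight stays positive when the teacher is wrong, the student in this sketch still receives the teacher's full soft distribution on those samples, which is one way to read "still learns inter-class relationships from the mentor".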

URL: https://openreview.net/forum?id=XpRXmzd2sF

---

Title: Relative Geometry of Neural Forecasters: Linking Accuracy and Alignment in Learned Dynamics

Abstract: Neural networks can accurately forecast complex dynamical systems, yet how they internally represent underlying dynamics remains poorly understood. We study neural forecasters through the lens of representational alignment, introducing anchor-based, geometry-agnostic relative embeddings that remove rotational and scaling ambiguities in latent spaces. Applying this framework across seven canonical dynamical systems—ranging from periodic to chaotic—we reveal reproducible family-level structure: multilayer perceptrons align with other MLPs, recurrent networks with RNNs, while transformers and echo-state networks achieve strong forecasts despite weaker alignment. Alignment generally correlates with forecasting accuracy, yet high accuracy can coexist with low alignment.
Relative geometry thus provides a simple, reproducible foundation for comparing how model families internalize and represent dynamical structure.
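
One common way to build anchor-based relative embeddings is to describe each latent vector by its cosine similarities to a fixed set of anchor latents. The sketch below assumes that construction; the abstract does not specify the similarity function or how anchors are chosen, so `relative_embedding` and the toy data are illustrative only.

```python
import numpy as np

def relative_embedding(latents, anchors):
    """Cosine similarity of every latent vector (row) to every anchor (row)."""
    z = latents / np.linalg.norm(latents, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return z @ a.T

rng = np.random.default_rng(0)
latents = rng.normal(size=(5, 3))   # latent states from one model
anchors = latents[:2]               # anchors taken from the same latent space

# A second "model" whose latent space is a rotated and rescaled copy of the first.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
latents_b = 3.0 * latents @ q
anchors_b = 3.0 * anchors @ q

rel_a = relative_embedding(latents, anchors)
rel_b = relative_embedding(latents_b, anchors_b)
same = np.allclose(rel_a, rel_b)    # the two relative embeddings coincide
```

Cosine similarity is unchanged by orthogonal transforms and positive rescaling applied to both latents and anchors, which is why the two models above yield the same relative embedding even though their raw latents differ.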

URL: https://openreview.net/forum?id=t4stf5Gafz

---

Title: Facial Counterfactual Generation via Causal Mask-Guided Editing

Abstract: Generating counterfactual facial images is an important tool for interpretable machine learning, fairness analysis, and understanding the causal relationships among facial attributes. In this work, we propose a novel neuro-symbolic framework for causal editing, which integrates causal graph discovery, mask-guided counterfactual generation, and semantic interpretation to produce facial images that are both realistic and causally consistent. We first employ the Fast Causal Inference (FCI) algorithm to uncover latent causal relationships among facial attributes, enabling the identification of direct and indirect factors for target interventions. Using these causal graphs, we construct spatially informed masks that guide a DDPM-based generative model, ensuring that only regions relevant to the causal factors are modified. Finally, we leverage CLIP-based embeddings to provide logical, human-understandable explanations of the semantic changes in the counterfactuals. Experiments on CelebA and CelebA-HQ demonstrate that our approach produces high-fidelity counterfactuals, achieves superior performance on sparsity and realism metrics, and mitigates bias compared to state-of-the-art methods. This framework offers a principled approach to causally grounded, interpretable facial image editing.

URL: https://openreview.net/forum?id=ssamEGQj0C

---

Title: TextOCVP: Object-Centric Video Prediction with Language Guidance

Abstract: Understanding and forecasting future scene states is critical for autonomous agents to plan and act effectively in complex environments. Object-centric models, with structured latent spaces, have shown promise in modeling object dynamics and predicting future scene states, but often struggle to scale beyond simple synthetic datasets and to integrate external guidance, limiting their applicability in robotics. To address these limitations, we propose TextOCVP, an object-centric model for video prediction guided by textual descriptions. TextOCVP parses an observed scene into object representations, called slots, and utilizes a text-conditioned transformer predictor to forecast future object states and video frames. Our approach jointly models object dynamics and interactions while incorporating textual guidance, enabling accurate and controllable predictions. TextOCVP’s structured latent space offers more precise control of the forecasting process, and the model outperforms several video prediction baselines on two datasets. Additionally, we show that structured object-centric representations provide superior robustness to novel scene configurations, as well as improved controllability and interpretability, enabling more precise and understandable predictions. Code will be open-sourced upon acceptance.

URL: https://openreview.net/forum?id=7JEgXCyQgX

---

Title: Provable Domain Adaptation for Offline Reinforcement Learning with Limited Samples

Abstract: Offline reinforcement learning (RL) learns effective policies from a static target dataset. Although state-of-the-art offline RL algorithms perform well, their performance depends on the size of the target dataset and degrades when only limited target samples are available, which is often the case in real-world applications. To address this issue, domain adaptation that leverages auxiliary samples from related source datasets (such as simulators) can be beneficial. However, establishing the optimal way to trade off the limited target dataset and the large-but-biased source dataset while ensuring provable theoretical guarantees remains an open challenge. To the best of our knowledge, this paper proposes the first framework that theoretically explores the impact of the weights assigned to each dataset on the performance of offline RL. In particular, we establish performance bounds and the existence of the optimal weight, which can be computed in closed form under simplifying assumptions. We also provide algorithmic guarantees in terms of convergence to a neighborhood of the optimum. Notably, these results depend on the quality of the source dataset and the number of samples in the target dataset. Our empirical results on the well-known offline Procgen benchmark substantiate the theoretical contributions in this work.

URL: https://openreview.net/forum?id=xog8ThcXwy

---

Title: From Weighting to Modeling: A Nonparametric Estimator for Off-Policy Evaluation

Abstract: We study off-policy evaluation in the setting of contextual bandits, where we aim to evaluate a new policy using historical data that consists of contexts, actions, and received rewards. This historical data typically does not faithfully represent the action distribution of the new policy. A common approach, inverse probability weighting (IPW), adjusts for these discrepancies in action distributions.
However, this method often suffers from high variance due to the probability being in the denominator.
The doubly robust (DR) estimator reduces variance through modeling reward but does not directly address variance from IPW.
In this work, we address the limitation of IPW by proposing a Nonparametric Weighting (NW) approach that constructs weights using a nonparametric model. Our NW approach achieves low bias like IPW but typically exhibits significantly lower variance.
To further reduce variance, we incorporate reward predictions — similar to the DR technique — resulting in the Model-assisted Nonparametric Weighting (MNW) approach. We show that MNW yields accurate value estimates when either the reward model or the behavior policy model is well specified. Extensive empirical comparisons show that our approaches consistently outperform existing techniques, achieving lower variance in value estimation while maintaining low bias.
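
For reference, the two standard baselines named in this abstract, IPW and the doubly robust (DR) estimator, can be sketched as follows. This shows the textbook estimators only, not the proposed NW or MNW methods; the function and argument names are illustrative, and the toy data assumes a context-free problem with a known behavior policy.

```python
import numpy as np

def ipw_estimate(rewards, target_probs, behavior_probs):
    """IPW: average of r_i * pi(a_i | x_i) / mu(a_i | x_i) over the logged data."""
    weights = target_probs / behavior_probs
    return float(np.mean(weights * rewards))

def dr_estimate(rewards, target_probs, behavior_probs, q_logged, v_target):
    """DR: reward-model value plus an IPW correction on the model's residual.

    q_logged[i]: reward-model prediction for the logged action in context i.
    v_target[i]: reward-model prediction averaged over the target policy in context i.
    """
    weights = target_probs / behavior_probs
    return float(np.mean(v_target + weights * (rewards - q_logged)))

# Toy log: two actions, uniform behavior policy, action 1 pays 1 and action 0 pays 0.
rewards = np.array([0.0, 1.0, 0.0, 1.0])         # rewards of logged actions 0, 1, 0, 1
behavior_probs = np.array([0.5, 0.5, 0.5, 0.5])  # mu(a_i | x_i)
target_probs = np.array([0.2, 0.8, 0.2, 0.8])    # pi(a_i | x_i); pi picks action 1 w.p. 0.8

ipw = ipw_estimate(rewards, target_probs, behavior_probs)

# With a perfect reward model the DR residual term vanishes.
q_logged = rewards.copy()
v_target = np.full(4, 0.8)
dr = dr_estimate(rewards, target_probs, behavior_probs, q_logged, v_target)
```

In this toy log both estimators recover the target policy's true value of 0.8. The variance issue the abstract refers to appears when `behavior_probs` contains small entries, since each such entry produces a large weight.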

URL: https://openreview.net/forum?id=RW6PY0AU3w

---

Title: MissNODAG: Differentiable Learning of Cyclic Causal Graphs from Incomplete Data

Abstract: Causal discovery in real-world systems, such as biological networks, is often complicated by feedback loops and incomplete data. Standard algorithms, which assume acyclic structures or fully observed data, struggle with these challenges. To address this gap, we propose MissNODAG, a differentiable framework for learning both the underlying cyclic causal graph and the missingness mechanism from partially observed data, including data *missing not at random*. Our framework integrates an additive noise model with an expectation-maximization procedure, alternating between imputing missing values and optimizing the observed data likelihood, to uncover both the cyclic structures and the missingness mechanism. We demonstrate the effectiveness of MissNODAG through synthetic experiments and an application to real-world gene perturbation data.

URL: https://openreview.net/forum?id=nNZXQ3Q0GP

---

Title: Diverse Image Priors for Black-box Data-free Knowledge Distillation

Abstract: Knowledge distillation (KD) is a well-known technique for effectively transferring knowledge from an expert network (teacher) to a smaller network (student) with little sacrifice in performance. However, most KD methods require extensive access to the teacher or even its original training set, which is often unachievable due to intellectual property or security concerns. These challenges have inspired black-box data-free KD, in which only the teacher's top-1 predictions and no real data are available. While recent approaches tend to synthesize data, they largely overlook data diversity, which is crucial for effective knowledge transfer. We propose Diverse Image Priors Knowledge Distillation (DIP-KD) to address this problem. We first synthesize image priors (semantically diverse synthetic images), then further optimize them toward a diversity objective via contrastive learning, and finally extract soft knowledge to distill the student. We achieve state-of-the-art KD performance for the black-box data-free settings on eight image benchmarks. This is backed by our deep analysis, showing that data diversity is effectively improved and how it facilitates KD performance. We publish the source code at https://osf.io/5mry8/?view_only=dee9e8fbcd114c34b45aa958a3aa32fa.

URL: https://openreview.net/forum?id=9biXMYLFXn

---

Title: Auditing Predictive Models for Intersectional Biases

Abstract: Predictive models that satisfy group fairness criteria in aggregate for members of a protected class, but do not guarantee subgroup fairness, could produce biased predictions for individuals at the intersection of two or more protected classes. To address this risk, we propose Conditional Bias Scan (CBS), an auditing framework for detecting intersectional biases in the outputs of classification models that may lead to disparate impact. CBS identifies the subgroup with the most significant bias against the protected class, compared to the equivalent subgroup in the non-protected class. The framework can audit for predictive biases using common group fairness definitions (separation and sufficiency) for both probabilistic and binarized predictions. We show through empirical evaluations that this methodology has significantly higher bias detection power compared to similar methods that audit for subgroup fairness. We then use this approach to detect statistically significant intersectional biases in the predictions of the COMPAS pre-trial risk assessment tool and a model trained on the German Credit data.

URL: https://openreview.net/forum?id=1JTnlHMSmO

---

Title: Re:Form --- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny

Abstract: Existing informal language-based (e.g., human language) Large Language Models (LLMs) trained with Reinforcement Learning (RL) face a significant challenge: their verification processes, which provide crucial training signals, are neither reliable nor scalable. In fact, the prevalent large proprietary models can hardly generate verifiable programs. A promising yet largely uncharted alternative is formal language-based reasoning. Grounding LLMs in rigorous formal systems where generative models operate in formal language spaces (e.g., Dafny) enables the automatic and mathematically provable verification of their reasoning processes and outcomes. This capability is pivotal for achieving large-scale, reliable formal software verification. It is a common practice to employ human-annotated chain-of-thought and other human priors to induce the reasoning and coding capabilities of LLMs. Unfortunately, providing such priors to supervise complex programming tasks becomes prohibitively costly. In this work, we systematically explore ways to reduce human priors with the formal language, Dafny, as the main environment for our pilot study. Our approach mainly relies on an automatic and scalable data curation pipeline and careful RL designs integrated with feedback from the formal language verifier. We introduce DafnyComp, a benchmark of compositional formal programs with auto-formalized specifications for specification reasoning. Our supervised fine-tuning (SFT) stage enables even small models (e.g., 0.5B) to generate syntactically valid and verifiable Dafny code, surpassing proprietary models. RL with regularization further improves performance, achieving stronger generalization to out-of-domain tasks and outperforming all strong baselines on the challenging DafnyComp benchmark. Anonymized code and models are available at https://github.com/ReFormDafny/ReForm and https://huggingface.co/ReFormDafny.

URL: https://openreview.net/forum?id=cAQmIS4GOe

---

Title: The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Abstract: The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM-RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov decision processes (MDPs) of LLM-RL with the temporally extended partially observable Markov decision processes (POMDPs) that define Agentic RL. Building on this foundation, we propose a comprehensive twofold taxonomy: one organized around core agentic capabilities, including planning, tool use, memory, reasoning, self-improvement, and perception, and the other around their applications across diverse task domains. Central to our thesis is that reinforcement learning serves as the critical mechanism for transforming these capabilities from static, heuristic modules into adaptive, robust agentic behavior. To support and accelerate future research, we consolidate the landscape of open-source environments, benchmarks, and frameworks into a practical compendium. By synthesizing over five hundred recent works, this survey charts the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose AI agents.

URL: https://openreview.net/forum?id=RY19y2RI1O

---

Title: Modular Diffusion Policy Training: Decoupling and Recombining Guidance and Diffusion for Offline RL

Abstract: In classifier-free diffusion (CFD), the diffusion model and its guidance are typically learned jointly and applied jointly in the inference stage. Before the guidance has converged, it provides unstable or even misleading gradients, which leads to inefficiency and instability during the early stage of training. Such strict coupling not only leads to self-enforcing variance and biased errors but also prevents the guidance module from being reused across different diffusion models. We propose Guidance-First Diffusion Training (GFDT), which pretrains and freezes the guidance model before diffusion policy learning. GFDT reduces peak memory and computation by 38.1%, decreases diffusion training by 65.6% and 27.66%, and achieves up to 43.16% and 60.98% performance improvements on offline RL benchmarks. Beyond efficiency, we uncover a strong plug-and-play property: replacing the guidance module only at inference time can substantially improve stability. Cross-algorithm swaps (e.g., Implicit Q-Learning (IDQL) guidance for Diffusion Q-Learning (DQL) policies) perform comparably to the stronger of the two, despite never being co-trained. Our theoretical analysis shows that GFDT enables convergence to an optimal guidance and speeds up training. We also prove that plug-and-play remains valid as long as the guidance and the diffusion model are trained with the same data distribution. Limitations arising from dataset mismatch are analyzed in detail, which further underscores the necessity of distributional alignment. This work opens a new line of research by treating diffusion and guidance as modular units that can be recombined, rather than as a monolithic process, suggesting a paradigm that may guide the future development of diffusion-based reinforcement learning.

URL: https://openreview.net/forum?id=KJSvZwdPFd

---

Title: Concept-RidgeAIME: LLM-Guided Automatic Concept-Based Explanations via Ridge-Regularized Inverse Operators for Trustworthy AI

Abstract: Concept-based explanations overcome the limitations of low-level feature importance and focus on high-level, human-understandable concepts to explain the decision-making behind machine learning models. However, achieving model independence and the simultaneous presentation of global and local information within a single framework has been difficult. This study extends the concept of approximate inverse model explanations (AIME) and proposes Concept-RidgeAIME, which simultaneously obtains global and local explanations via concepts by utilizing a regularized linear approximate inverse mapping as its core. The proposed method learns a two-stage structure (an inverse operator mapping from the model output to the input and an inverse operator mapping from the concept to the input) only once. Subsequently, it efficiently calculates the contribution and ratio of concepts for any individual using simple matrix-vector operations. Without requiring access to internal representations or gradients, it presents global (concept importance ranking) and local (individual concept contributions) information within the same framework, thereby achieving model independence with low overhead. Using the global feature importance as a foundation, this study demonstrates a workflow in which a large language model automatically synthesizes rule concepts composed of normalization thresholds and one-hot equations, then validates the syntax and excludes zero/positive cases to ensure robustness.
Evaluations quantified the reconstructability (completeness) of black-box outputs and coverage (projection completeness) at the concept base level using tabular benchmarks (Adult, German Credit, and COMPAS). Stability and efficiency were verified using bootstrap confidence intervals and inference time (millisecond-level). Results showed that Concept-RidgeAIME demonstrated practical advantages over conventional concept-based methods (ConceptSHAP, CBM, and TCAV) and the application of generic SHAP to the concept space. These advantages are achieved by Concept-RidgeAIME through a model-independent implementation that requires no additional training and can handle global, local, and concept mappings in an integrated manner.

URL: https://openreview.net/forum?id=5X330pAMQV

---

Title: Model-diff: A Tool for Comparative Study of Language Models in the Input Space

Abstract: Comparing whether two large language models (LMs) make similar predictions -- such as perplexity -- across massive input spaces is crucial for real-world applications. Traditional analyses average benchmark scores over fixed datasets, masking per-input differences. We propose Model-diff, a framework that estimates the distribution of prediction differences between two LMs across a large, meaningful input space -- defined as the set of token sequences assigned low negative log-likelihood (NLL). Model-diff leverages sampling-based histogram statistics to efficiently quantify output differences without exhaustive enumeration. Experiments reveal, for the first time, quantitative divergences between LMs in their low-NLL regions, providing a scalable tool for model comparison and diagnostic analysis.

URL: https://openreview.net/forum?id=gGZi3blMWA

---

Title: Offline changepoint localization using a matrix of conformal p-values

Abstract: Changepoint localization is the problem of estimating the index at which a change occurred in the data generating distribution of an ordered list of data, or declaring that no change occurred. We present the broadly applicable MCP algorithm, which uses a matrix of conformal p-values to produce a confidence interval for a (single) changepoint under the mild assumption that the pre-change and post-change distributions are each exchangeable. We prove a novel conformal Neyman-Pearson lemma, motivating practical classifier-based choices for our conformal score function. Finally, we exemplify the MCP algorithm on a variety of synthetic and real-world datasets, including using black-box pre-trained classifiers to detect changes in sequences of images, text, and accelerometer data.

URL: https://openreview.net/forum?id=bo2WlznUOc

---

Title: Enhancing Semi-supervised Learning with Zero-shot Pseudolabels

Abstract: The high cost of data labeling presents a major barrier to deploying machine learning systems at scale.
Semi-supervised learning (SSL) mitigates this challenge by utilizing unlabeled data alongside limited labeled examples, while the emergence of foundation models (FMs) offers powerful zero-shot capabilities that can further reduce labeling cost.
However, directly fine-tuning large FMs is often impractical in resource-constrained settings, and naïvely using their pseudo-labels for unlabeled data can degrade performance due to the pseudo-labels' unreliability or domain mismatch with the target task.
In this work, we introduce ZeroMatch, a novel SSL framework that integrates knowledge distillation with consistency-based learning to jointly leverage labeled data, unlabeled data, and pseudo-labels from FMs.
ZeroMatch trains a compact student model and accesses FMs only through inference services, making it suitable for low-resource environments such as personal devices with limited compute. Experiments on six vision and language classification benchmarks show that ZeroMatch consistently outperforms standard SSL and zero-shot augmented methods, demonstrating its effectiveness and robustness across a range of foundation model qualities.

URL: https://openreview.net/forum?id=WB05Doi29V

---

Title: Task-Specific Exploration in Meta-Reinforcement Learning via Task Reconstruction

Abstract: Reinforcement learning trains policies specialized for a single task. Meta-reinforcement learning (meta-RL) improves upon this by leveraging prior experience to train policies for few-shot adaptation to new tasks. However, existing meta-RL approaches often struggle to explore and learn tasks effectively. We introduce a novel meta-RL algorithm for learning to learn task-specific, sample-efficient exploration policies. We achieve this through task reconstruction, an original method for learning to identify and collect small but informative datasets from tasks. To leverage these datasets, we also propose learning a meta-reward that encourages policies to learn to adapt. Empirical evaluations demonstrate that our algorithm achieves higher returns than existing meta-RL methods. Additionally, we show that even with full task information, adaptation is more challenging than previously assumed. However, policies trained with our meta-reward adapt to new tasks successfully.

URL: https://openreview.net/forum?id=VRRapVcaJH

---

Title: Handling Missing Data in Downstream Tasks With Distribution-Preserving Guarantees

Abstract: Missing feature values are a significant hurdle for downstream machine-learning tasks such as classification. However, imputation methods for classification might be time-consuming for high-dimensional data, and offer few theoretical guarantees on the preservation of the data distribution and imputation quality, especially for not-missing-at-random mechanisms. First, we propose an imputation approach named F3I based on the iterative improvement of a K-nearest neighbor imputation, where neighbor-specific weights are learned through the optimization of a novel concave, differentiable objective function related to the preservation of the data distribution on non-missing values. F3I can then be chained to and jointly trained with any classifier architecture. Second, we provide a theoretical analysis of imputation quality and data distribution preservation by F3I for several types of missing mechanisms. Finally, we demonstrate the superior performance of F3I on several imputation and classification tasks, with applications to drug repurposing and handwritten-digit recognition data.

URL: https://openreview.net/forum?id=7Wj1rZ7mJ4

---

Title: PrismBench: Dynamic and Flexible Benchmarking of LLMs Code Generation with Monte Carlo Tree Search

Abstract: The rapid advancement of LLMs' code generation capabilities is outpacing traditional evaluation methods. Static benchmarks fail to capture the depth and breadth of LLM capabilities and eventually become obsolete, while most dynamic approaches either rely too heavily on LLM-based evaluation or remain constrained by predefined test sets. To address these issues, we introduce PrismBench, a multi-agent, dynamic benchmarking framework designed to systematically expose and analyze LLM failure modes in code generation tasks. We formulate evaluation as a Markov Decision Process over a structured tree of coding challenges, leveraging a customized Monte Carlo Tree Search algorithm to traverse this tree and discover high-failure scenarios. Our multi-agent setup orchestrates task generation, model response, and analysis, enabling scalable assessment across diverse coding challenges. Additionally, we propose metrics that combine structural traversal patterns with performance across different tasks and difficulty levels to enable diagnostic and systematic comparison of LLMs' performance. We conduct extensive experiments on eight state-of-the-art LLMs and analyze how model architecture and scale influence code generation performance across varying coding tasks. All code, evaluation trees, and a public leaderboard are available at https://prismbench.github.io/Demo/

URL: https://openreview.net/forum?id=O0bsC6FDly

---

Title: Guiding Skill Discovery with Foundation Models

Abstract: Learning diverse skills without hand-crafted reward functions could accelerate reinforcement learning in downstream tasks. However, existing skill discovery methods focus solely on maximizing the diversity of skills without considering human preferences, which leads to undesirable behaviors and possibly dangerous skills. For instance, a cheetah robot trained using previous methods learns to roll in all directions to maximize skill diversity, whereas we would prefer it to run without flipping or entering hazardous areas. In this work, we propose a Foundation model Guided (FoG) skill discovery method, which incorporates human intentions into skill discovery through foundation models. Specifically, FoG extracts a score function from foundation models to evaluate states based on human intentions, assigning higher values to desirable states and lower to undesirable ones. These scores are then used to re-weight the rewards of skill discovery algorithms. By optimizing the re-weighted skill discovery rewards, FoG successfully learns to eliminate undesirable behaviors, such as flipping or rolling, and to avoid hazardous areas in both state-based and pixel-based tasks. Interestingly, we show that FoG can discover skills involving behaviors that are difficult to define. Interactive visualisations are available from https://sites.google.com/view/submission-fog.

URL: https://openreview.net/forum?id=3Fp6vwAC6n

---

Title: KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities

Abstract: Recent advances in text-to-image generation have improved the quality of synthesized images, but evaluations mainly focus on aesthetics or alignment with text prompts. Thus, it remains unclear whether these models can accurately represent a wide variety of realistic visual entities. To bridge this gap, we propose Kitten, a benchmark for Knowledge-InTensive image generaTion on real-world ENtities. Using Kitten, we conduct a systematic study of recent text-to-image models, retrieval-augmented models, and unified understanding and generation models, focusing on their ability to generate real-world visual entities such as landmarks and animals. Analyses using carefully designed human evaluations, automatic metrics, and MLLMs as judges show that even advanced text-to-image and unified models fail to generate accurate visual details of entities. While retrieval-augmented models improve entity fidelity by incorporating reference images, they tend to over-rely on them and struggle to create novel configurations of the entities in creative text prompts.

URL: https://openreview.net/forum?id=wejaKS9Ps0

---

Title: Calibration Enhanced Decision Maker: Towards Trustworthy Sequential Decision-Making with Large Sequence Models

Abstract: Offline deep reinforcement learning (offline DRL) has attracted considerable attention across various domains due to its ability to learn effective policies without direct environmental interaction. Although highly effective, the trustworthiness of the agent remains a paramount concern within the community. Offline DRL can be categorized into three principal paradigms: model-based algorithms, model-free algorithms, and trajectory optimization. While extant research predominantly concentrates on calibration enhancement of model-based and model-free algorithms, calibration of trajectory optimization remains a comparatively underexplored avenue of investigation. In this paper, we pioneer the concept of Expected Agent Calibration Error (EACE), a novel metric designed to assess agent calibration. Furthermore, we rigorously prove its theoretical relationship to the state-action marginal distribution distance. Subsequently, we introduce the Calibration Enhanced Decision Maker (CEDM), which employs a binning executor to process feature distribution histograms as input for the large sequence model, thereby minimizing the state-action marginal distribution distance and enhancing the agent's calibration. A series of in-depth case studies are undertaken to examine CEDM, with its application examined across Decision Transformer, Decision ConvFormer, and Decision Mamba. Empirical results substantiate the robustness of EACE and demonstrate the effectiveness of CEDM in enhancing agent calibration, thereby offering valuable insights for future research on trustworthy sequential decision-making.

URL: https://openreview.net/forum?id=b6WcxPEb48

---

Title: Beyond Anonymization: Object Scrubbing for Privacy-Preserving 2D and 3D Vision Tasks

Abstract: We introduce ROAR (Robust Object Removal and Re-annotation), a scalable framework for privacy-preserving dataset obfuscation that removes sensitive objects instead of modifying them. Designed for practical deployment, our method integrates instance segmentation with generative inpainting to eliminate identifiable entities while preserving scene integrity. Extensive evaluations on 2D COCO-based object detection show that ROAR achieves 87.5% of baseline average precision (AP), whereas image dropping achieves only 74.2%, highlighting the advantage of scrubbing in preserving dataset utility. In NeRF-based 3D reconstruction, our method incurs a PSNR loss of at most 1.66 dB while maintaining SSIM and improving LPIPS, demonstrating superior perceptual quality. ROAR follows a structured pipeline of detection, inpainting-based removal, re-annotation, and evaluation. We systematically evaluate the privacy-utility trade-off across both 2D and 3D tasks, showing that object removal offers a more effective balance than traditional methods. Our findings establish ROAR as a practical privacy framework, achieving strong guarantees with minimal performance trade-offs. The results highlight challenges in generative inpainting, occlusion-robust segmentation, and task-specific scrubbing, laying the groundwork for real-world privacy-preserving vision systems.

URL: https://openreview.net/forum?id=RVht55LRWP

---

Title: Beyond Convexity: Proximal-Perturbed Lagrangian Methods for Efficient Functional Constrained Optimization

Abstract: Non-convex functional constrained optimization problems have gained substantial attention in machine learning and data science, addressing broad requirements that typically go beyond the often performance-centric objectives. An influential class of algorithms for functional constrained problems is the class of primal-dual methods, which has been extensively analyzed for convex problems. Nonetheless, the investigation of their efficacy for non-convex problems is under-explored. This paper develops a primal-dual algorithmic framework for solving such non-convex problems. This framework is built upon a novel form of the Lagrangian function, termed the Proximal-Perturbed Augmented Lagrangian, which enables the development of simple first-order algorithms that converge to a stationary solution under mild conditions. Notably, we study this framework under both non-smoothness and smoothness of the constraint function and provide three key contributions: (i) a single-loop algorithm that does not require the continuous adjustment of the penalty parameter to infinity; (ii) a non-asymptotic iteration complexity of $\widetilde{\mathcal{O}}(1/\epsilon^2)$; and (iii) extensive experimental results demonstrating the effectiveness of the proposed framework in terms of computational cost and performance, outperforming related approaches that use regularization (penalization) techniques and/or standard Lagrangian relaxation across diverse non-convex problems.

URL: https://openreview.net/forum?id=I1VknjcXKI

---

Title: Dual-Phase Continual Learning: Supervised Adaptation Meets Unsupervised Retention

Abstract: Foundational vision-language models (VLMs) excel across diverse tasks, but adapting them to new domains without forgetting prior knowledge remains a critical challenge. Continual Learning (CL) addresses this challenge by enabling models to learn sequentially from new data while mitigating the forgetting of prior information, typically under supervised settings involving label shift. Nonetheless, abrupt distribution shifts can still cause substantial forgetting, potentially nullifying the benefits of supervised updates, especially when storing or replaying past data is infeasible. In this work, we propose leveraging unlabeled test-time data in an unsupervised manner to reinforce prior task performance without requiring replay or stored examples. Unlike traditional Test-Time Adaptation (TTA), which primarily focuses on domain shift or corruption, our method improves performance on earlier tasks by exploiting representative test samples encountered during deployment. We introduce a simple teacher-student framework with gradient-based sparse parameter updates, and show that it effectively mitigates forgetting in class-incremental CL for VLMs, offering a memory-free alternative to episodic replay with strong empirical results.

URL: https://openreview.net/forum?id=GFrHdXzZwo

---

Title: On Sketching for Gaussian Process Regression with New Statistical Guarantees

Abstract: The cubic computational complexity of Gaussian Process Regression (GPR) with respect to the number of data points is a major bottleneck to its scalability. While various approaches have been proposed to address this, few come with provable guarantees. Inspired by the success of ridge leverage score based sampling in scaling kernel ridge regression~\cite{mahoney_main}, we propose a sketch-based approximation for GPR using ridge leverage scores. We provide theoretical guarantees on the approximation of the predictive mean, predictive variance, and negative log-marginal likelihood in this setting. To the best of our knowledge, these are the first theoretical guarantees for approximating the predictive variance and negative log-marginal likelihood of GPR using ridge leverage score sampling. We further show that a carefully constructed sketch of the kernel matrix preserves key statistical properties of the full GPR model with high probability. Our theoretical results are supported by empirical evaluations on real-world datasets, demonstrating strong trade-offs between accuracy and efficiency.

URL: https://openreview.net/forum?id=NmwrhyuVEu

---

Title: On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling

Abstract: On-policy reinforcement learning (RL) algorithms are typically characterized as algorithms that perform policy updates using i.i.d. trajectories collected by the agent's current policy. However, after observing only a finite number of trajectories, such on-policy sampling may produce data that fails to match the expected on-policy data distribution. This sampling error leads to high-variance gradient estimates that yield data-inefficient on-policy learning. Recent work in the policy evaluation setting has shown that non-i.i.d., off-policy sampling can produce data with lower sampling error w.r.t. the expected on-policy distribution than on-policy sampling can produce \citep{zhong2022robust}. Motivated by this observation, we introduce an adaptive, off-policy sampling method to reduce sampling error during on-policy policy gradient RL training. Our method, Proximal Robust On-Policy Sampling (PROPS), reduces sampling error by collecting data with a behavior policy that increases the probability of sampling actions that are under-sampled w.r.t. the current policy. We empirically evaluate PROPS on both continuous-action MuJoCo benchmark tasks and discrete-action tasks and demonstrate that PROPS (1) decreases sampling error throughout training and (2) increases the data efficiency of on-policy policy gradient algorithms.

URL: https://openreview.net/forum?id=nCoyFp8uO1

---

Title: A Reproducibility Study of Counterfactual Explanations for Image Classification

Abstract: Counterfactual explanations have gained traction in recent years due to their contrastive and potentially actionable nature: moving an outcome from the original class to an alternative target class. Generating plausible and accurate counterfactuals remains challenging. We highlight two underexplored but critical factors influencing counterfactual quality for image classifiers: the neural network architecture and the chosen target class. This work presents a comprehensive empirical evaluation of multiple counterfactual explanation methods across diverse neural architectures and all possible target classes on the MNIST and CIFAR-10 datasets. Our results show that performance can vary substantially across architectures and targets, an aspect often overlooked in prior evaluations. To better assess counterfactual explanation plausibility, we introduce a novel evaluation method based on Moran’s index, a spatial autocorrelation metric. This allows us to systematically identify and exclude structurally implausible counterfactuals that existing metrics may overlook. We find that counterfactual explanation methods often fail to generate counterfactual explanations for intended target classes, due to factors such as timeouts, restrictive search spaces, or implementation issues. Furthermore, our analysis demonstrates that evaluating explanations on only one target class or architecture provides an incomplete and potentially misleading picture of performance. Additionally, we show that different plausibility metrics do not consistently agree, emphasising the need for more robust evaluation frameworks. In summary, we (i) identify architecture and target class as key overlooked dimensions in counterfactual explanation performance, (ii) propose a novel plausibility assessment method using Moran’s index, and (iii) provide actionable insights for the development and evaluation of more generalisable counterfactual explanation methods.
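
For readers unfamiliar with Moran's index, the sketch below computes the textbook statistic on a small 2D grid of pixel intensities. It is only an illustration of the metric named in the abstract: the function name `morans_i`, the 4-neighbour (rook) binary weights, and the toy images are assumptions, and the abstract does not say how the paper turns the index into a plausibility filter.

```python
def morans_i(grid):
    """Moran's I for a 2D grid, using binary 4-neighbour (rook) weights.

    The weighting scheme is an assumption made for this sketch.
    """
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = [[value - mean for value in row] for row in grid]
    denom = sum(d * d for row in dev for d in row)
    if denom == 0:
        raise ValueError("constant image: Moran's I is undefined")
    num = 0.0    # sum of w_ij * dev_i * dev_j over neighbouring pixels
    w_sum = 0    # total weight, i.e. number of directed neighbour pairs
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < rows and 0 <= b < cols:
                    num += dev[i][j] * dev[a][b]
                    w_sum += 1
    return (n * num) / (w_sum * denom)


# Toy images: a checkerboard (noise-like) and two solid halves (structured).
checkerboard = [[(i + j) % 2 for j in range(4)] for i in range(4)]
two_halves = [[0, 0, 1, 1] for _ in range(4)]
print(morans_i(checkerboard), morans_i(two_halves))
```

Values near +1 indicate spatially coherent structure, values near -1 indicate alternating, noise-like patterns, and values near 0 indicate no spatial correlation.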

URL: https://openreview.net/forum?id=4Ldi6HkdTc

---

Title: Variational Visual Question Answering for Uncertainty-Aware Selective Prediction

Abstract: Despite remarkable progress in recent years, vision language models (VLMs) remain prone to overconfidence and hallucinations on tasks such as Visual Question Answering (VQA) and Visual Reasoning. Bayesian methods can potentially improve reliability by helping models selectively predict, that is, models respond only when they are sufficiently confident. Unfortunately, Bayesian methods are often assumed to be costly and ineffective for large models, and there exists little evidence to show otherwise for multimodal applications. Here, we show the effectiveness and competitive edge of variational Bayes for selective prediction in VQA for the first time. We build on recent advances in variational methods for deep learning and propose an extension called "Variational VQA". This method improves calibration and yields significant gains for selective prediction on VQA and Visual Reasoning, particularly when the error tolerance is low (≤ 1%). Often, just one posterior sample can yield more reliable answers than those obtained by models trained with AdamW. In addition, we propose a new risk-averse selector that outperforms standard sample averaging by considering the variance of predictions. Overall, we present compelling evidence that variational learning is a viable option to make large VLMs safer and more trustworthy.

URL: https://openreview.net/forum?id=jtnMIbJIso

---

Title: Learning to Defer with an Uncertain Rejector via Conformal Prediction

Abstract: Learning to defer (L2D) allows prediction tasks to be allocated to a human or machine decision maker, thus combining the strengths of both. This allocation decision crucially depends on a ‘rejector’ function. In practice, the rejector could be poorly fit or otherwise misspecified. In this work, we perform uncertainty quantification for the rejector sub-component of the L2D framework. We use conformal prediction to allow the rejector to output prediction sets or intervals of a user-defined confidence level (with distribution-free guarantees), instead of just the binary outcome of ‘defer’ or not. On tasks ranging from image to hate speech classification, we demonstrate that the uncertainty in the rejector translates to safer decisions via two forms of selective prediction.
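
As a rough illustration of how conformal prediction can turn a binary rejector into a set-valued one, here is a minimal split-conformal sketch. It is generic split conformal prediction, not the paper's procedure: the function names, the nonconformity score `1 - p`, the label encoding (0 = "predict", 1 = "defer"), and the calibration numbers are all assumptions.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label is kept.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Labels whose nonconformity score 1 - p is within the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical calibration scores: 1 - (rejector probability of the correct
# allocation) on held-out examples.
cal_scores = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.55, 0.6]
threshold = conformal_threshold(cal_scores, alpha=0.25)

# Label 0 = "predict", label 1 = "defer" (an encoding chosen for this sketch).
confident = prediction_set([0.9, 0.1], threshold)   # a single allocation
uncertain = prediction_set([0.5, 0.5], threshold)   # both allocations kept
print(threshold, confident, uncertain)
```

A set containing both labels signals that the rejector itself is uncertain about the allocation, which a downstream policy could treat as a reason to act conservatively.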

URL: https://openreview.net/forum?id=SZQJ8K2DUe

---

Title: Benchmarking Missing Data Imputation Methods in Socioeconomic Surveys

Abstract: Missing data imputation is a core challenge in socioeconomic surveys, where data is often longitudinal, hierarchical, high-dimensional, not independent and identically distributed, and missing under complex mechanisms. Socioeconomic datasets like the Consumer Pyramids Household Survey (CPHS), the largest continuous household survey in India since 2014, covering 174,000 households, highlight the importance of robust imputation, which can reduce survey costs, preserve statistical power, and enable timely policy analysis. This paper systematically evaluates imputation methods under three missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), across five missingness ratios ranging from 10% to 50%. We evaluate imputation performance on both continuous and categorical variables, assess the impact on downstream tasks, and compare the computational efficiency of each method. Our results indicate that classical machine learning methods such as MissForest and HyperImpute remain strong baselines with favorable trade-offs between accuracy and efficiency, while deep learning methods perform better under complex missingness patterns and higher missingness ratios, but face scalability challenges. We ran experiments on CPHS and multiple synthetic survey datasets, and found consistent patterns across them. Our framework aims to provide a reliable benchmark for structured socioeconomic surveys, and addresses the critical gap in reproducible, domain-specific evaluation of imputation methods. The open-source code is provided.

URL: https://openreview.net/forum?id=HLhi9xhRw6

---

Title: A Monte Carlo Framework for Calibrated Uncertainty Estimation in Sequence Prediction

Abstract: Probabilistic prediction of sequences from images and other high-dimensional data remains a key challenge, particularly in safety-critical domains. In these settings, it is often desirable to quantify the uncertainty associated with the prediction (instead of just determining the most likely sequence, as in language modeling). In this paper, we consider a Monte Carlo framework to estimate probabilities and confidence intervals associated with sequences. The framework uses a Monte Carlo simulator, implemented as an autoregressively trained neural network, to sample sequences conditioned on an image input. We then use these samples to estimate probabilities and confidence intervals. Experiments on synthetic and real data show that the framework produces accurate discriminative predictions, but can suffer from miscalibration. To address this shortcoming, we propose a time-dependent regularization method, which produces calibrated predictions.
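
The sampling-based estimate described above can be sketched as follows. This is a toy under stated assumptions: `toy_sampler` is a hand-written two-state Markov chain standing in for the autoregressively trained network (and ignores any image input), the helper names are hypothetical, and the normal-approximation interval is one common choice that may differ from the paper's.

```python
import math
import random


def toy_sampler(rng, length=3):
    """Stand-in for the learned simulator: samples a binary sequence
    token by token, each token conditioned on the previous one."""
    seq, prev = [], 0
    for _ in range(length):
        p_one = 0.6 if prev == 1 else 0.3
        prev = 1 if rng.random() < p_one else 0
        seq.append(prev)
    return seq


def estimate_probability(sampler, event, num_samples, z=1.96):
    """Monte Carlo estimate of P(event) with a normal-approximation
    confidence interval (z = 1.96 gives roughly 95% coverage)."""
    hits = sum(1 for _ in range(num_samples) if event(sampler()))
    p_hat = hits / num_samples
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / num_samples)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


rng = random.Random(0)  # fixed seed so the run is reproducible
p_hat, lower, upper = estimate_probability(
    sampler=lambda: toy_sampler(rng),
    event=lambda seq: seq[-1] == 1,  # "the sequence ends in state 1"
    num_samples=20000,
)
print(p_hat, lower, upper)
```

For this toy chain the exact probability of ending in state 1 is 0.417, so the estimate can be checked by hand; with a real model no such ground truth is available, which is why the interval matters.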

URL: https://openreview.net/forum?id=sJE59flFC1

---

Title: Diffusion posterior sampling for simulation-based inference in tall data settings

Abstract: Identifying the parameters of a non-linear model that best explain observed data is a core task across scientific fields. When such models rely on complex simulators, evaluating the likelihood is typically intractable, making traditional inference methods such as MCMC inapplicable. Simulation-based inference (SBI) addresses this by training deep generative models to approximate the posterior distribution over parameters using simulated data. In this work, we consider the tall data setting, where multiple independent observations provide additional information, allowing sharper posteriors and improved parameter identifiability. Building on the flourishing score-based diffusion literature, F-NPSE (Geffner et al., 2023) estimates the tall data posterior by composing individual scores from a neural network trained only for a single context observation. This enables more flexible and simulation-efficient inference than alternative approaches for tall datasets in SBI. However, it relies on costly Langevin dynamics during sampling. We propose a new algorithm that eliminates the need for Langevin steps by explicitly approximating the diffusion process of the tall data posterior. Our method retains the advantages of compositional score-based inference while being significantly faster and more stable than F-NPSE. We demonstrate its improved performance on toy problems and standard SBI benchmarks, and showcase its scalability by applying it to a complex real-world model from computational neuroscience.

URL: https://openreview.net/forum?id=cdhfoS6Gyo

---

Title: From Words To Rewards: Leveraging Natural Language For Reinforcement Learning

Abstract: We explore the use of natural language to specify rewards in Reinforcement Learning with Human Feedback (RLHF). Unlike traditional approaches that rely on simplistic preference feedback, we harness Large Language Models (LLMs) to translate rich text feedback into state-level labels for training a reward model. Our empirical studies with human participants demonstrate that our method accurately approximates the reward function and achieves significant performance gains with fewer interactions than baseline methods.

URL: https://openreview.net/forum?id=Gbx0pLANdf

---

Title: On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds

Abstract: Regularization, whether explicit in terms of a penalty in the loss or implicit in the choice of algorithm, is a cornerstone of modern machine learning. Indeed, controlling the complexity of the model class is particularly important when data is scarce, noisy or contaminated, as it translates a statistical belief on the underlying structure of the data. This work investigates the question of how to choose the regularization norm $\lVert \cdot \rVert$ in the context of high-dimensional adversarial training for binary classification. To this end, we first derive an exact asymptotic description of the robust, regularized empirical risk minimizer for various types of adversarial attacks and regularization norms (including non-$\ell_p$ norms). We complement this analysis with a uniform convergence analysis, deriving bounds on the Rademacher Complexity for this class of problems. Leveraging our theoretical results, we quantitatively characterize the relationship between perturbation size and the optimal choice of $\lVert \cdot \rVert$, confirming the intuition that, in the data scarce regime, the type of regularization becomes increasingly important for adversarial training as perturbations grow in size.

URL: https://openreview.net/forum?id=vkmvuranbm

---

Title: Differentially private and decentralized randomized power method

Abstract: The randomized power method has gained significant interest due to its simplicity and efficient handling of large-scale spectral analysis and recommendation tasks. However, its application to large datasets containing personal user information (e.g., web interactions, search history, personal tastes) raises critical privacy problems. This paper addresses these issues by proposing enhanced privacy-preserving variants of the method. First, we propose a variant that reduces the variance of the noise required in current techniques to achieve Differential Privacy (DP). More precisely, we modify the algorithm and privacy analysis so that the Gaussian noise variance no longer grows linearly with the target rank, achieving the same $(\varepsilon,\delta)$-DP guarantees with lower noise variance. Second, we adapt our method to a decentralized framework in which data is distributed among multiple user devices, strengthening privacy guarantees with no accuracy penalty and a low computational and communication overhead. Our results also include the provision of tighter convergence bounds for both the centralized and decentralized versions, and an empirical comparison with previous work using real recommendation datasets.

URL: https://openreview.net/forum?id=TkzhY8RB2K

---

Title: CAE: Repurposing the Critic as an Explorer in Deep Reinforcement Learning

Abstract: Exploration remains a fundamental challenge in reinforcement learning, as many existing methods either lack theoretical guarantees or fall short in practical effectiveness. In this paper, we propose CAE, i.e., the Critic as an Explorer, a lightweight approach that repurposes the value networks in standard deep RL algorithms to drive exploration, without introducing additional parameters. CAE leverages multi-armed bandit techniques combined with a tailored scaling strategy, enabling efficient exploration with provable sub-linear regret bounds and strong empirical stability. Remarkably, it is simple to implement, requiring only about 10 lines of code. For complex tasks where learning reliable value networks is difficult, we introduce CAE+, an extension of CAE that incorporates an auxiliary network. CAE+ increases the parameter count by less than 1% while preserving implementation simplicity, adding roughly 10 additional lines of code. Extensive experiments on MuJoCo, MiniHack, and Habitat validate the effectiveness of CAE and CAE+, highlighting their ability to unify theoretical rigor with practical efficiency.

URL: https://openreview.net/forum?id=54MOD02xC2

---

Title: FedLog: Personalized Federated Classification with Less Communication and More Flexibility

Abstract: Federated representation learning (FRL) aims to learn personalized federated models with effective feature extraction from local data. FRL algorithms that share the majority of the model parameters face significant challenges with huge communication overhead. This overhead stems from the millions of neural network parameters and slow aggregation progress of the averaging heuristic. To reduce the overhead, we propose FedLog, which shares sufficient data summaries instead of raw model parameters. The data summaries encode minimal sufficient statistics of an exponential family, and Bayesian inference is utilized for global aggregation. FedLog helps reduce message sizes and communication frequency. We prove that the shared message is minimal and theoretically analyze the convergence rate of FedLog. To further ensure formal privacy guarantees, we extend FedLog with the differential privacy framework. Empirical results demonstrate high learning accuracy with low communication overhead of our method.

URL: https://openreview.net/forum?id=7Hwk0bvvKn

---

Title: Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning

Abstract: We study offline off-dynamics reinforcement learning (RL) to utilize data from an easily accessible source domain to enhance policy learning in a target domain with limited data. Our approach centers on return-conditioned supervised learning (RCSL), particularly focusing on Decision Transformer (DT) type frameworks, which can predict actions conditioned on desired return guidance and complete trajectory history. Previous works address the dynamics shift problem by augmenting the reward in the trajectory from the source domain to match the optimal trajectory in the target domain. However, this strategy is not directly applicable to RCSL owing to (1) the unique form of the RCSL policy class, which explicitly depends on the return, and (2) the absence of a straightforward representation of the optimal trajectory distribution. We propose the Return Augmented (REAG) method for DT type frameworks, where we augment the return in the source domain by aligning its distribution with that in the target domain. We provide a theoretical analysis demonstrating that the RCSL policy learned from REAG achieves the same level of suboptimality as would be obtained without a dynamics shift. We introduce two practical implementations, $REAG^{*}_{Dara}$ and $REAG^{*}_{MV}$. Thorough experiments on D4RL datasets and various DT-type baselines demonstrate that our methods consistently enhance the performance of DT type frameworks in off-dynamics RL.

URL: https://openreview.net/forum?id=QDVOr5J9Xp

---

Title: Adapting to Any Bit-Width: Channel-Wise Mixed-Precision Quantization for LLMs

Abstract: Large Language Models (LLMs) have demonstrated remarkable success across a wide range of language tasks, but their deployment on edge devices remains challenging due to the substantial memory requirements imposed by their large parameter sizes. Weight-only quantization presents a promising solution to reduce the memory footprint of LLMs. However, existing approaches primarily focus on integer-bit quantization, limiting their adaptability to fractional-bit quantization tasks and preventing the full utilization of available storage space on devices. In this paper, we introduce Channel-Wise Mixed-Precision Quantization (CMPQ), a novel mixed-precision quantization method that allocates quantization precision in a channel-wise pattern based on activation distributions. By assigning different precision levels to different weight channels, CMPQ can adapt to any bit-width constraint. CMPQ employs a non-uniform quantization strategy and incorporates two outlier extraction techniques that collaboratively preserve the critical information, thereby minimizing the quantization loss. Experiments on nine different LLMs demonstrate that CMPQ not only enhances performance in integer-bit quantization tasks but also achieves significant performance gains with a modest increase in memory usage by performing in a mixed-precision way. CMPQ represents an adaptive and effective approach to LLM quantization, offering substantial benefits across diverse device capabilities.

URL: https://openreview.net/forum?id=1t6sEhdLxf

---

Title: VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

Abstract: Multimodal embedding models have been crucial in enabling various downstream tasks such as semantic similarity, information retrieval, and clustering over different modalities. However, existing multimodal embeddings such as VLM2Vec, E5-V, and GME are predominantly focused on natural images, with limited support for other visual forms such as videos and visual documents. This restricts their applicability in real-world scenarios, including AI agents, retrieval-augmented generation (RAG) systems, and recommendation. To close this gap, we propose VLM2Vec-V2, a unified framework for learning embeddings across diverse visual forms. First, we introduce MMEB-V2, a comprehensive benchmark that extends MMEB with five new task types: visual document retrieval, video retrieval, temporal grounding, video classification, and video question answering, spanning text, image, video, and visual document inputs. Next, we train VLM2Vec-V2, a general-purpose embedding model that supports text, image, video, and visual document inputs. Extensive experiments show that VLM2Vec-V2 not only achieves strong performance on the newly introduced video and document retrieval tasks, but also improves over prior baselines on the original image benchmarks. Through extensive evaluation, our study offers insights into the generalizability of various multimodal embedding models and highlights effective strategies for unified embedding learning, laying the groundwork for more scalable and adaptable representation learning in both research and real-world settings.

URL: https://openreview.net/forum?id=TpU38jbKIJ

---

Title: Distilled Circuits: A Mechanistic Study of Internal Restructuring in Knowledge Distillation

Abstract: Knowledge distillation compresses a larger neural model (teacher) into smaller, faster student models by training them to match teacher outputs. However, the internal computational transformations that occur during this process remain poorly understood. We apply techniques from mechanistic interpretability to analyze how internal circuits, representations, and activation patterns differ between teachers and students. Focusing on GPT2 and its distilled counterpart DistilGPT2, and generalizing our findings to both bidirectional architectures and larger model pairs, we find that student models can reorganize, compress, and discard teacher components, often resulting in a stronger reliance on fewer individual components. To quantify functional alignment beyond output similarity, we introduce an alignment metric based on influence-weighted component similarity, validated across multiple tasks. Our findings reveal that while knowledge distillation preserves broad functional behaviors, it also causes significant shifts in internal computation, with important implications for the robustness and generalization capacity of distilled models.

URL: https://openreview.net/forum?id=S1KJE2ZW64

---

Title: Goal-Conditioned Reinforcement Learning from Sub-Optimal Data on Metric Spaces

Abstract: We study the problem of learning optimal behavior from sub-optimal datasets for goal-conditioned offline reinforcement learning under sparse rewards, invertible actions and deterministic transitions. To mitigate the effects of distribution shift, we propose MetricRL, a method that combines metric learning for value function approximation with weighted imitation learning for policy estimation. MetricRL avoids conservative or behavior-cloning constraints, enabling effective learning even in severely sub-optimal regimes. We introduce distance monotonicity as a key property linking metric representations to optimality and design an objective that explicitly promotes it. Empirically, MetricRL consistently outperforms prior state-of-the-art goal-conditioned RL methods in recovering near-optimal behavior from sub-optimal offline data.

URL: https://openreview.net/forum?id=auMNJALOWx

---

Title: When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models

Abstract: Large Language Models (LLMs) are known for their performance, but we uncover a significant structural inefficiency: a phenomenon we term attention collapse. In many pre-trained decoder-style LLMs, the attention matrices in deeper layers degenerate, collapsing to near rank-one structures. These underutilized layers, which we call lazy layers, are redundant and impair model efficiency. To address this, we introduce Inheritune, a simple yet powerful training recipe designed to build smaller, stronger language models. Inheritune initializes a compact model by inheriting the potent early layers from a larger pre-trained model and then progressively trains and expands it. Our experiments on various models, including the GPT-2 family, demonstrate that models trained with Inheritune can match or even surpass the performance of their larger counterparts, despite having significantly fewer layers. This work presents a novel path toward model compression by design, enabling the creation of compact, yet highly performant language models.

URL: https://openreview.net/forum?id=2zQn0bUoPf
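
To make the "near rank-one" claim in the abstract above concrete, the following is a minimal sketch of one way to score how close a causal attention matrix is to rank one. The measure used here (the share of squared singular-value mass held by the top singular value) and the helper names `causal_attention` and `rank_one_energy` are assumptions chosen for illustration; the abstract does not state which metric the authors use.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(scores):
    # Apply a decoder-style causal mask, then normalize each row.
    t = scores.shape[0]
    future = np.triu(np.ones((t, t), dtype=bool), k=1)
    return softmax(np.where(future, -np.inf, scores), axis=-1)

def rank_one_energy(attn):
    # Fraction of squared singular-value mass in the top singular value.
    # Equals 1.0 for an exactly rank-one matrix (illustrative metric only).
    s = np.linalg.svd(attn, compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))

t = 8
# Hypothetical "collapsed" layer: every query attends almost only to token 0.
collapsed_scores = np.zeros((t, t))
collapsed_scores[:, 0] = 50.0
collapsed = causal_attention(collapsed_scores)

# Hypothetical "healthy" layer: every query attends mostly to itself.
diverse = causal_attention(50.0 * np.eye(t))

print("collapsed:", rank_one_energy(collapsed))
print("diverse:  ", rank_one_energy(diverse))
```

Under this score, the collapsed matrix lands close to 1 while the near-identity matrix lands close to 1/t, which is the kind of gap one would look for when scanning the layers of a pre-trained model.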

---

Title: Policy Learning with a Language Bottleneck

Abstract: Modern AI systems such as self-driving cars and game-playing agents achieve superhuman performance. But they often lack human-like generalization, interpretability, and interoperability with human users. This paper introduces *Policy Learning with a Language Bottleneck* (PLLB), a framework enabling AI agents to generate linguistic rules that capture the high-level strategies underlying rewarding behaviors. PLLB alternates between a *rule generation* step guided by language models, and an *update* step where agents learn new policies guided by rules. Crucially, PLLB enables this kind of language-guided learning even when a natural language rule is insufficient to completely describe the target policy. Across five diverse tasks, including a two-player signaling game, maze navigation, image reconstruction, and robot grasp planning, we show that PLLB learns more interpretable and generalizable behaviors than standard policy learning methods. In three additional human subject studies, we show that the learned rules significantly improve human task performance, enabling more effective human-AI coordination.

URL: https://openreview.net/forum?id=sK8uEqzQPv

---

Title: Reasoning-Driven Synthetic Data Generation and Evaluation

Abstract: Although many AI applications of interest require specialized multi-modal models, relevant data to train such models is inherently scarce or inaccessible. Filling these gaps with human annotators is prohibitively expensive, error-prone, and time-consuming, leading model builders to increasingly consider synthetic data as a scalable alternative. However, existing synthetic data generation methods often rely on manual prompts, evolutionary algorithms, or extensive seed data from the target distribution – limiting their scalability, explainability, and control. In this paper, we introduce Simula: a novel reasoning-driven framework for data generation and evaluation. It employs a seedless, agentic approach to generate synthetic datasets at scale, allowing users to define desired dataset characteristics through an explainable and controllable process, enabling fine-grained resource allocation. We show the efficacy of our approach on a variety of datasets, rigorously testing both intrinsic and downstream properties. Our work (1) offers guidelines for synthetic data mechanism design, (2) provides insights into generating and evaluating synthetic data at scale, and (3) unlocks new opportunities for developing and deploying AI in domains where data scarcity or privacy concerns are paramount.

URL: https://openreview.net/forum?id=NALsdGEPhB

---

Title: ParaBlock: Communication-Computation Parallel Block Coordinate Federated Learning for Large Language Models

Abstract: Federated learning (FL) has been extensively studied as a privacy-preserving training paradigm. Recently, the federated block coordinate descent scheme has become a popular option for training large-scale models, as it allows clients to train only a subset of the model locally instead of the entire model. However, in the era of large language models (LLMs), even a single block can contain a significant number of parameters, incurring substantial communication latency, particularly for resource-constrained clients. To address this challenge in federated training and fine-tuning of LLMs, we propose ParaBlock, a novel approach that establishes two parallel threads for communication and computation to enhance communication efficiency. We theoretically prove that the proposed ParaBlock achieves the same convergence rate as the standard federated block coordinate descent methods. Empirical evaluations on fine-tuning LLMs on general instruction following and mathematical reasoning confirm that ParaBlock not only maintains strong performance but also significantly improves communication efficiency.

URL: https://openreview.net/forum?id=Hnf7eCdBeV

---

Title: Prompt Estimation from Prototypes for Federated Prompt Tuning of Vision Transformers

Abstract: Visual Prompt Tuning (VPT) of pre-trained Vision Transformers (ViTs) has proven highly effective as a parameter-efficient fine-tuning technique for adapting large models to downstream tasks with limited data. Its parameter efficiency makes it particularly suitable for Federated Learning (FL), where both communication and computation budgets are often constrained. However, global prompt tuning struggles to generalize across heterogeneous clients, while personalized tuning overfits to local data and lacks generalization. We propose PEP-FedPT (Prompt Estimation from Prototypes for Federated Prompt Tuning), a unified framework designed to achieve both generalization and personalization in federated prompt tuning of ViTs. Within this framework, we introduce the novel Class-Contextualized Mixed Prompt (CCMP), which is based on class-specific prompts maintained alongside a globally shared prompt. For each input, CCMP adaptively combines class-specific prompts using weights derived from global class prototypes and client class priors. This approach enables per-sample prompt personalization without storing client-dependent trainable parameters. The prompts are collaboratively optimized via standard federated averaging. Comprehensive evaluations on CIFAR-100, TinyImageNet, DomainNet, and iNaturalist datasets demonstrate that PEP-FedPT consistently surpasses the state-of-the-art baselines under diverse data heterogeneity scenarios, establishing a strong foundation for efficient and generalizable federated prompt tuning of Vision Transformers.

URL: https://openreview.net/forum?id=gO1CpPRj6A
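
The per-input combination of class-specific prompts described in the abstract above could look roughly like the sketch below. Everything about the weighting rule is an assumption made for illustration: the abstract only says the weights are derived from global class prototypes and client class priors, so the cosine similarity, the log-prior term, the softmax, and the names `mixed_prompt`, `prototypes`, `class_prior`, and `class_prompts` are all hypothetical and may differ from the paper's actual CCMP.

```python
import numpy as np

def mixed_prompt(feature, prototypes, class_prior, class_prompts, temperature=1.0):
    """Combine class prompts with weights from prototypes and a class prior.

    feature:       (D,)      embedding of one input (hypothetical)
    prototypes:    (C, D)    one global prototype per class (hypothetical)
    class_prior:   (C,)      the client's class frequencies (hypothetical)
    class_prompts: (C, L, P) one prompt of L tokens per class (hypothetical)
    """
    f = feature / np.linalg.norm(feature)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    # Assumed rule: cosine similarity to each prototype plus log-prior.
    logits = (p @ f) / temperature + np.log(class_prior + 1e-12)
    logits = logits - logits.max()
    weights = np.exp(logits)
    weights = weights / weights.sum()
    # Weighted sum over the class axis gives one (L, P) prompt for this input.
    return weights, np.tensordot(weights, class_prompts, axes=1)

rng = np.random.default_rng(0)
num_classes, dim, prompt_len, prompt_dim = 3, 32, 4, 16
feature = rng.standard_normal(dim)
prototypes = rng.standard_normal((num_classes, dim))
class_prompts = rng.standard_normal((num_classes, prompt_len, prompt_dim))
class_prior = np.array([0.7, 0.3, 0.0])  # this client never sees class 2

weights, prompt = mixed_prompt(feature, prototypes, class_prior, class_prompts)
print(weights, prompt.shape)
```

In this sketch the only client-specific quantity is the class prior, which is a statistic rather than a trainable parameter, so the mixed prompt is personalized per sample without any client-dependent weights to store.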

---

Title: Muon Optimizes Under Spectral Norm Constraints

Abstract: The pursuit of faster optimization algorithms remains an active and important research direction in deep learning. Recently, the Muon optimizer has demonstrated promising empirical performance, but its theoretical foundation remains less understood. In this paper, we bridge this gap and provide a theoretical analysis of Muon by placing it within the Lion-$\mathcal{K}$ family of optimizers. Specifically, we show that Muon corresponds to Lion-$\mathcal{K}$ when equipped with the nuclear norm, and we leverage the theoretical results of Lion-$\mathcal{K}$ to establish that Muon (with decoupled weight decay) implicitly solves an optimization problem that enforces a constraint on the spectral norm of weight matrices. This perspective not only demystifies the implicit regularization effects of Muon but also leads to natural generalizations through varying the choice of convex map $\mathcal{K}$, allowing for the exploration of a broader class of implicitly regularized and constrained optimization algorithms.

URL: https://openreview.net/forum?id=Blz4hjxLwU
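
The link between the nuclear norm and a spectral-norm constraint mentioned in the abstract above can be checked numerically on a toy problem. The sketch below is not the Muon optimizer: it has no momentum, it orthogonalizes the gradient with an exact SVD, and the quadratic loss, step size, and names `orthogonalize` and `toy_step` are assumptions for illustration. It relies on two standard facts: for M = U S V^T, the matrix U V^T has spectral norm 1 and its inner product with M equals the nuclear norm of M.

```python
import numpy as np

def orthogonalize(m):
    # U @ V^T from the reduced SVD of m; all singular values are set to 1.
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

def toy_step(w, grad, lr, weight_decay):
    # Orthogonalized gradient step with decoupled weight decay (no momentum).
    return w - lr * (orthogonalize(grad) + weight_decay * w)

rng = np.random.default_rng(0)

# Duality check: <U V^T, M> equals the nuclear norm, and ||U V^T||_2 = 1.
g = rng.standard_normal((6, 4))
o = orthogonalize(g)
spec = float(np.linalg.norm(o, ord=2))
inner = float(np.sum(o * g))
nuc = float(np.linalg.norm(g, ord="nuc"))

# Toy loss 0.5 * ||W - target||_F^2 with a target of large spectral norm.
target = 10.0 * rng.standard_normal((6, 4))
w = np.zeros((6, 4))
lr, weight_decay = 0.05, 0.5
max_spec = 0.0
for _ in range(500):
    grad = w - target
    w = toy_step(w, grad, lr, weight_decay)
    max_spec = max(max_spec, float(np.linalg.norm(w, ord=2)))

print(spec, inner - nuc, max_spec, 1.0 / weight_decay)
```

For this particular update, the triangle inequality gives ||W_next||_2 <= (1 - lr * wd) * ||W||_2 + lr, so the iterates never leave the ball of spectral norm 1 / wd when started inside it, even though the unconstrained minimizer lies far outside. Whether this matches the exact constraint derived in the paper is not stated in the abstract.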

---

Title: HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs

Abstract: An Achilles heel of Large Language Models (LLMs) is their tendency to hallucinate non-factual statements. A response that mixes factual and non-factual statements is hard for humans to verify and to base decisions on accurately. To combat this problem, we propose Highlighted Chain-of-Thought Prompting (HoT), a technique for prompting LLMs to generate responses with XML tags that ground facts to those provided in the question. That is, given an input question, LLMs first re-format the question to add XML tags highlighting key facts, and then generate a response with highlights over the facts referenced from the input. Compared to vanilla chain-of-thought prompting (CoT), HoT reduces the rate of hallucination and separately improves the accuracy of 5 LLMs consistently on over 22 tasks ranging from arithmetic and reading comprehension to logical reasoning. Consistent with the success of HoT few-shot prompting, training small LLMs (LLaMA-3.2-1B and Qwen2.5-1.5B) via supervised finetuning on HoT examples improves their accuracy (on 5 out-of-distribution tasks) over the baselines and over finetuning on CoT examples. When asking humans to verify LLM responses, highlights help time-limited participants to more accurately and efficiently recognize when LLMs are correct. Yet, surprisingly, when LLMs are wrong, HoTs tend to fool users into believing that an answer is correct.

URL: https://openreview.net/forum?id=abm6pDTbT1
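
As a rough illustration of the tagging scheme described in the abstract above, the sketch below parses XML-style highlights from a re-formatted question and a response, then reports any tag the response uses that the question never defined. The tag naming (`<fact1>`, `<fact2>`, ...), the example texts, and the helper names `extract_highlights` and `ungrounded_tags` are assumptions made for illustration; the abstract does not specify the tag format.

```python
import re

# Matches <factN>...</factN> pairs; the tag name scheme is an assumption.
TAG = re.compile(r"<(fact\d+)>(.*?)</\1>", re.DOTALL)

def extract_highlights(text):
    # Return (tag_name, highlighted_span) pairs in order of appearance.
    return [(name, span.strip()) for name, span in TAG.findall(text)]

def ungrounded_tags(question, response):
    # Tags used in the response that do not appear in the tagged question.
    known = {name for name, _ in extract_highlights(question)}
    used = {name for name, _ in extract_highlights(response)}
    return sorted(used - known)

question = (
    "Sam has <fact1>3 apples</fact1> and buys <fact2>4 more</fact2>. "
    "How many apples does Sam have?"
)
grounded = (
    "Sam starts with <fact1>3 apples</fact1> and adds <fact2>4 more</fact2>, "
    "so Sam has 3 + 4 = 7 apples."
)
ungrounded = (
    "Sam has <fact1>3 apples</fact1> and <fact3>2 oranges</fact3>, "
    "so Sam has 5 pieces of fruit."
)

print(ungrounded_tags(question, grounded))
print(ungrounded_tags(question, ungrounded))
```

A check like this only confirms that every highlight points back to something tagged in the input. It says nothing about whether the final answer is right, which fits the abstract's caution that highlights can make wrong answers look convincing.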

---
