Weekly TMLR digest for Jun 21, 2026

TMLR

Jun 21, 2026, 12:00:14 AM

to tmlr-annou...@googlegroups.com

New certifications
==================

Survey Certification: A Survey on the Abstraction and Reasoning Corpus

Severin Bratus, David F. Jenny, Andreas Plesner, Roger Wattenhofer

https://openreview.net/forum?id=qzFxBcK9Cg

---

Survey Certification: A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models

Ching Chang, Yidan Shi, Defu Cao, Wei Yang, Jeehyun Hwang, Haixin Wang, Jiacheng Pang, Wei Wang, Yan Liu, Wen-Chih Peng, Tien-Fu Chen

https://openreview.net/forum?id=l3QW42g6u3

---

Survey Certification: Advances in Temporal Point Processes: Bayesian, Neural, and LLM Approaches

Feng Zhou, Quyu Kong, Jie Qiao, Cheng Wan, Yixuan Zhang, Ruichu Cai

https://openreview.net/forum?id=SXgGKkShhT

---

J2C Certification: Feedback-Driven Black-Box Safety Alignment Testing of Large Language Models via Reinforcement Learning

Xuan Chen, Yuzhou Nie, Lu Yan, Mingwei Zheng, Yunshu Mao, Wenbo Guo, Xiangyu Zhang

https://openreview.net/forum?id=GWslY31w2b

---

J2C Certification: TropNNC: Structured Neural Network Compression Using Tropical Geometry

Konstantinos Fotopoulos, Petros Maragos, Panagiotis Misiakos

https://openreview.net/forum?id=u7DRq1icmY

---

J2C Certification: Replicability is Asymptotically Free in Multi-armed Bandits

Junpei Komiyama, Shinji Ito, Yuichi Yoshida, Souta Koshino

https://openreview.net/forum?id=E8rmbq8BYP

---

Accepted papers
===============

Title: Deep Generative Spatiotemporal Engression for Probabilistic Forecasting of Epidemics

Authors: Rajdeep Pathak, Tanujit Chakraborty

Abstract: Accurate and reliable forecasting of epidemic incidences is critical for public health preparedness, yet it remains a challenging task due to complex nonlinear temporal dependencies and heterogeneous spatial interactions. Often, point forecasts generated by spatiotemporal models are unreliable in assigning uncertainty to future epidemic events. Probabilistic forecasting of epidemics is therefore crucial for providing the best or worst-case scenarios rather than a simple, often inaccurate, point estimate. We present deep spatiotemporal engression methods to generate accurate and reliable probabilistic forecasts on low-frequency epidemic datasets. The proposed methods act as distributional lenses, and out-of-sample probabilistic forecasts are generated by sampling from the trained models. Our frameworks encapsulate lightweight deep generative architectures, wherein uncertainty is quantified endogenously, driven by a pre-additive noise component during model construction. We establish geometric ergodicity and asymptotic stationarity of the spatiotemporal engression processes under mild assumptions on the network weights and pre-additive noise process. Comprehensive evaluations across six epidemiological datasets over three forecast horizons demonstrate that the proposal consistently outperforms several temporal and spatiotemporal benchmarks in both point and probabilistic forecasting. Additionally, we explore the explainability of the proposal to enhance the models' practical application for informed, timely public health interventions. The 'stengression' Python package offers an end-to-end implementation of our proposed approaches.

URL: https://openreview.net/forum?id=7AfAztCd5A

---

Title: Minimisation of Quasar-Convex Functions Using Random Zeroth-Order Oracles

Authors: Amir Ali Farzin, Yuen-Man Pun, Philipp Braun, Iman Shames

Abstract: This paper explores the performance of a random Gaussian smoothing zeroth-order (ZO) scheme for minimising quasar-convex (QC) and strongly quasar-convex (SQC) functions in both unconstrained and constrained settings. For the unconstrained problem, we establish the ZO algorithm's convergence to a global minimum along with its complexity when applied to both QC and SQC functions. For the constrained problem, we introduce the new notion of proximal-quasar-convexity and prove analogous results to the unconstrained case. Specifically, we derive complexity bounds and prove convergence of the algorithm to a neighbourhood of a global minimum whose size can be controlled under a variance reduction scheme. Beyond the theoretical guarantees, we demonstrate the practical implications of our results on several machine learning problems where quasar-convexity naturally arises, including linear dynamical system identification and generalised linear models.
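As a rough illustration of what a random Gaussian smoothing zeroth-order oracle looks like, the sketch below uses the standard one-sample forward-difference estimator g = ((f(x + mu*u) - f(x)) / mu) * u with u drawn from a standard Gaussian. The function names, step size, smoothing radius, and the sphere test function are assumptions chosen for the example; the paper's exact scheme, its variance reduction, and its constrained variant are not reproduced here.

```python
import random


def zo_gradient(f, x, mu, rng):
    """One-sample Gaussian-smoothing gradient estimate of f at x."""
    # Random search direction u ~ N(0, I).
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_pert = [xi + mu * ui for xi, ui in zip(x, u)]
    # Forward difference along u; only function values are queried.
    scale = (f(x_pert) - f(x)) / mu
    return [scale * ui for ui in u]


def zo_minimise(f, x0, step=0.01, mu=1e-4, iters=5000, seed=0):
    """Plain descent loop driven by the zeroth-order estimate (illustrative)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = zo_gradient(f, x, mu, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def sphere(x):
    """Toy objective (convex, hence also quasar-convex)."""
    return sum(xi * xi for xi in x)


x_final = zo_minimise(sphere, [1.0, -2.0, 0.5])
```

The loop never evaluates a gradient of `sphere`; it only compares two function values per iteration, which is the defining feature of the zeroth-order setting.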

URL: https://openreview.net/forum?id=rRp9zZBKkZ

---

Title: Disentangling Intrinsic Importance from Emergent Structure in Multi-Expert Orchestration

Authors: Sudipto Ghosh, Sujoy Nath, Sunny Manchanda, Tanmoy Chakraborty

Abstract: Multi-expert systems, where multiple Large Language Models (LLMs) collaborate to solve complex tasks, are increasingly adopted for high-performance reasoning and generation. However, the orchestration policies governing expert interaction and sequencing remain largely opaque. We introduce INFORM, an interpretability analysis that treats orchestration as an explicit, analyzable computation, enabling the decoupling of expert interaction structure, execution order, and functional attribution. We use INFORM to evaluate an orchestrator on GSM8K, HumanEval, and MMLU using a homogeneous consortium of ten instruction-tuned experts drawn from LLaMA-3.1 8B, Qwen3 8B, and DeepSeek-R1 8B, with controlled decoding-temperature variation, and a secondary heterogeneous consortium spanning 1B-7B parameter models. Across tasks, routing dominance is a poor proxy for functional necessity. We reveal a divergence between relational importance, captured by routing mass and interaction topology, and intrinsic importance, measured via gradient sensitivity: frequently selected experts often act as interaction hubs with limited influence, while sparsely routed experts can be structurally critical. Orchestration behaviors emerge asynchronously, with expert centralization preceding stable routing confidence and expert ordering remaining non-deterministic. Targeted ablations show that masking intrinsically important experts induces disproportionate collapse in interaction structure compared to masking frequent peers, confirming that INFORM exposes functional and structural dependencies beyond accuracy metrics alone.

URL: https://openreview.net/forum?id=4W7sgat04A

---

Title: SL-S4Wave: Self-Supervised Learning of Physiological Waveforms with Structured State Space Models

Authors: Feng Wu, Harsh Deep, Eric Lehman, Sanyam Kapoor, Guoshuai Zhao, Rahul G Krishnan, Gari Clifford, Li-wei H. Lehman

Abstract: Modeling long-sequence medical time series data, such as electrocardiograms (ECG), poses significant challenges due to high sampling rates, multichannel signal complexity, inherent noise, and limited labeled data. While recent self-supervised learning (SSL) methods, based on various encoder architectures such as convolutional neural networks, have been proposed to learn representations from unlabeled data, they often fall short in capturing long-range dependencies and noise-invariant features. Structured state space models (S4) excel at long-sequence modeling, but existing S4 architectures fail to capture the unique characteristics of multichannel physiological waveforms. In this work, we propose SL-S4Wave, a self-supervised learning framework that combines contrastive learning with a tailored encoder built on structured state space models. The encoder incorporates multi-layer global convolution using multiscale subkernels, enabling the capture of both fine-grained local patterns and long-range temporal dependencies in noisy, high-resolution multichannel waveforms. Extensive experiments on real-world datasets demonstrate that SL-S4Wave (1) consistently outperforms state-of-the-art supervised and self-supervised baselines in a challenging arrhythmia detection task, (2) achieves high performance with significantly fewer labeled examples, showcasing strong label efficiency, (3) maintains robust performance on long waveform segments, highlighting its capacity to model complex temporal dynamics in long sequences that most existing approaches fail to model efficiently, and (4) transfers effectively to unseen arrhythmia types, underscoring its robust cross-domain generalization. We additionally evaluate SL-S4Wave on multiple EEG tasks, achieving superior performance over strong baselines and demonstrating the generalizability of our approach beyond cardiac waveforms.

URL: https://openreview.net/forum?id=km0xS3jZeO

---

Title: Jacobian-Aware Posterior Sampling for Inverse Problems

Authors: Liav Hen, Tom Tirer, Raja Giryes, Shady Abu-Hussein

Abstract: Diffusion models provide powerful generative priors for solving inverse problems by sampling from a posterior distribution conditioned on corrupted measurements. Existing methods primarily follow two paradigms: direct methods, which approximate the likelihood term, and proximal methods, which incorporate intermediate solutions satisfying measurement constraints into the sampling process. Under standard Gaussian approximations and locally-linear measurements, we demonstrate that these approaches differ fundamentally in their treatment of the diffusion denoiser's Jacobian within the likelihood term. While this Jacobian encodes critical prior knowledge of the data distribution, training-induced non-idealities can degrade performance in zero-shot settings.
In this work, we bridge direct and proximal approaches by proposing a principled Jacobian-Aware Posterior Sampler (JAPS). JAPS leverages the Jacobian's prior knowledge while mitigating its detrimental effects through a corresponding proximal solution, requiring no additional computational cost. Additionally, we integrate our guidance into DDIM sampling, with a corrected conditional factor that has been missing in previous works. Our method enhances reconstruction quality across diverse linear and nonlinear noisy imaging tasks, outperforming existing diffusion-based baselines in perceptual quality while maintaining or improving distortion metrics.

URL: https://openreview.net/forum?id=m63GJnhIN2

---

Title: Recursive Deep Inverse Reinforcement Learning

Authors: Paul Ghanem, Owen Lewis Howell, Michael Potter, Pau Closas, Alireza Ramezani, Deniz Erdogmus, Tales Imbiriba

Abstract: Inferring an adversary’s goals from exhibited behavior is crucial for counterplanning and non-cooperative multi-agent systems in domains like cybersecurity, military, and strategy games. Deep Inverse Reinforcement Learning (IRL) methods based on maximum entropy principles show promise in recovering adversaries’ goals but are typically offline, require large batch sizes with gradient descent, and rely on first-order updates, limiting their applicability in real-time scenarios. We propose an online Recursive Deep Inverse Reinforcement Learning (RDIRL) approach to recover the cost function governing the adversary actions and goals. Specifically, we minimize an upper bound on the standard Guided Cost Learning (GCL) objective using sequential second-order Newton updates, akin to the Extended Kalman Filter (EKF), leading to a fast (in terms of convergence) learning algorithm. We demonstrate that RDIRL is able to recover cost and reward functions of expert agents in standard and adversarial benchmark tasks. Experiments on benchmark tasks show that our proposed approach outperforms several leading IRL algorithms.

URL: https://openreview.net/forum?id=bYItLUU135

---

Title: LUQ: Layerwise Ultra-Low Bit Quantization for Multimodal Large Language Models

Authors: Shubhang Bhatnagar, Andy Xu, Kar-Han Tan, Narendra Ahuja

Abstract: Large Language Models (LLMs) with multimodal capabilities have revolutionized vision-language tasks, but their deployment often requires huge memory and computational resources. While post-training quantization (PTQ) has successfully compressed language models to as low as 1-bit precision, its effectiveness for multimodal LLMs (MLLMs) remains unexplored. In this paper, we present the first method for ultra-low-bit (<4-bit) quantization of MLLMs. Our analysis reveals that multimodal tokens and intermediate layer activations produced by them exhibit significantly higher entropy compared to text tokens, indicating greater functional complexity that makes MLLMs less tolerant to ultra-low bit quantization. However, this entropy varies significantly across layers, with some layers producing lower-entropy activation distributions that we empirically show can better tolerate ultra-low bit quantization. Existing PTQ methods optimize weight quantization within each layer but apply the same target precision uniformly, ignoring this variation in complexity across layers. Building on this insight, we propose LUQ: Layerwise Ultra-Low Bit Quantization, which characterizes each transformer layer's functional complexity via its output activation entropy and selectively applies ultra-low bit quantization to layers encoding simpler, more compressible functions. We also show that multimodal calibration (image and text tokens) boosts VQA performance in the ultra-low bit regime. Evaluated on LLaVA-1.5 and Qwen-2.5-VL across 9 VQA benchmarks, LUQ models use 40% and 31% less memory than their 4-bit counterparts while exhibiting less than 10% degradation on MME.
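The layer-selection idea (rank layers by output activation entropy, then quantize the lowest-entropy ones most aggressively) can be sketched as follows. This is only an illustration: the histogram-based entropy estimate, the bin count, the fixed `budget`, and the toy activation values are all assumptions, since the abstract does not specify the estimator or the selection rule.

```python
import math
from collections import Counter


def activation_entropy(values, num_bins=16):
    """Shannon entropy (in bits) of an equal-width histogram over activations."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant signal carries no information
    width = (hi - lo) / num_bins
    # Clamp the top edge into the last bin.
    counts = Counter(min(int((v - lo) / width), num_bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def pick_low_bit_layers(layer_activations, budget):
    """Return the names of the `budget` layers with the lowest entropy."""
    ranked = sorted(
        layer_activations,
        key=lambda name: activation_entropy(layer_activations[name]),
    )
    return ranked[:budget]


# Hypothetical per-layer activation samples.
toy = {
    "layer0": [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
    "layer1": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
    "layer2": [0.5] * 8,
}
low_bit = pick_low_bit_layers(toy, budget=2)
```

In this toy setup the constant layer and the two-valued layer are selected for ultra-low-bit treatment, while the layer with the most spread-out activations keeps higher precision.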

URL: https://openreview.net/forum?id=3eK6U6ZiSp

---

Title: Client-Level Defense Placement for Adversarially Robust Federated Reinforcement Learning

Authors: Anish Ambreth, K Naveen Kumar, Mohsen Guizani

Abstract: Federated Reinforcement Learning (FRL) extends federated learning to sequential decision-making, enabling multiple clients to collaboratively train a global policy without sharing raw trajectories. While this setting is promising for privacy-sensitive domains such as autonomous systems and IoT control, it introduces critical attack surfaces: adversaries can corrupt policy gradients, and adaptive attackers that reshuffle targets and prioritize high-impact clients render static defenses brittle. Defenses in FRL operate at two complementary layers: server-side aggregation and client-level placement, but the latter remains under-formalized despite directly shaping attacker incentives.
We propose FRL-CDPS (Client-Level Defense Placement for Adversarially Robust Federated Reinforcement Learning: A Stackelberg Approach), which models budget-constrained client-level defense placement as a Stackelberg game: the defender commits to a protection strategy while a rational Bayesian attacker best-responds under imperfect reconnaissance, maintaining posterior beliefs over each client's defense status. The framework captures partial observability and probabilistic defense effectiveness, faithfully reflecting real-world conditions where defenses are imperfect and adversaries operate under uncertainty. Despite NP-hardness of the defender's bilevel problem, we provide tractable solvers, namely exact feasible-set search for small systems and candidate-based Monte Carlo search for larger ones, with a $\frac{1}{2}$-approximation guarantee for the attacker oracle.
Experiments on CartPole-v1, HalfCheetah-v2, and Walker2d-v5 across seven ablation dimensions show that FRL-CDPS consistently outperforms heuristic client-selection baselines (random, UCB, Thompson sampling) and composes effectively with server-side defenses (FLTG, FedGreed), demonstrating that Stackelberg planning provides a principled and practical advantage for client-level defense in FRL.

URL: https://openreview.net/forum?id=JRpLJhvSCY

---

Title: A Survey on the Abstraction and Reasoning Corpus

Authors: Severin Bratus, David F. Jenny, Andreas Plesner, Roger Wattenhofer

Abstract: Chollet (2019) proposed a definition of intelligence that emphasizes efficiency in skill acquisition rather than performance on a predefined set of tasks, and introduced the Abstraction and Reasoning Corpus (ARC-v1, or ARC-AGI-1) as a challenge benchmark for machine learning research.
In the following years, ARC and the associated competitions have highlighted fundamental limitations of classical deep learning approaches and underscored the need for new ideas in abstract reasoning. This has incentivized extensive trial-and-error exploration, resulting in a wide variety of methods applied to the corpus.
As ARC-v2 was released in March 2025, this literature survey provides a systematic breadth-first overview of the methods applied to ARC-v1 in the six years since its introduction, prior to version 2, and covers early developments for ARC-v2 and ARC Prize 2025.
We apply a taxonomy distinguishing inductive (which explicitly construct transformation rules) and transductive approaches (which directly map inputs to outputs), examine the ecosystem of enabling techniques and auxiliary datasets, and synthesize patterns, trade-offs, and underexplored areas across the research landscape.
Our goal is to provide newcomers with a comprehensive foundation for understanding existing approaches and identifying promising research directions in abstract reasoning.

URL: https://openreview.net/forum?id=qzFxBcK9Cg

---

Title: Imbalanced Semi-Supervised Learning via Label Refinement and Threshold Adjustment

Authors: Zeju Li, Ying-Qiu Zheng, Chen Chen, Saad Jbabdi

Abstract: Semi-supervised learning (SSL) algorithms often struggle to perform well when trained on imbalanced data. In such scenarios, the generated pseudo-labels tend to exhibit a bias toward the majority class, and models relying on these pseudo-labels can further amplify this bias. Existing imbalanced SSL algorithms explore pseudo-labeling strategies based on either pseudo-label refinement (PLR) or threshold adjustment (THA), aiming to mitigate the bias through heuristic-driven designs. However, through a careful statistical analysis, we find that existing strategies are suboptimal: most PLR algorithms are either overly empirical or rely on the unrealistic assumption that models remain well-calibrated throughout training, while most THA algorithms depend on flawed metrics for pseudo-label selection. To address these shortcomings, we first derive the theoretically optimal form of pseudo-labels under class imbalance. This foundation leads to our key contribution: SEmi-supervised learning with pseudo-label optimization based on VALidation data (SEVAL), a unified framework that learns both PLR and THA parameters from a class-balanced subset of training data. By jointly optimizing these components, SEVAL adapts to specific task requirements while ensuring per-class pseudo-label reliability. Our experiments demonstrate that SEVAL outperforms state-of-the-art SSL methods, producing more accurate and effective pseudo-labels across various imbalanced SSL scenarios while remaining compatible with diverse SSL algorithms. The code is publicly available at https://github.com/ZerojumpLine/SEVAL.

URL: https://openreview.net/forum?id=HbAMQiyK48

---

Title: Do Object Channels Improve Robustness in Deep Reinforcement Learning?

Authors: Jannis Blüml, Cedric Derstroff, Bjarne Gregori, Elisabeth Dillies, Quentin Delfosse, Kristian Kersting

Abstract: Pixel-based reinforcement learning agents often exploit spurious visual correlations, leading to brittle policies that fail under minor visual perturbations. We systematically investigate spatially grounded semantic channel representations, often called Feature Maps, Planes, or Object Channels, as a representation design principle for reducing shortcut learning.
Object channels map detected entities into binary tensors aligned with the original coordinate frame, preserving compatibility with standard RL backbones without architectural modifications.
Specifically, through systematic evaluation in Atari environments under controlled perturbations, we demonstrate that such channel representations substantially improve zero-shot robustness to distribution shifts while maintaining competitive in-distribution performance.
We analyze the abstraction–fidelity trade-off and show that combining object channels with raw pixels improves robustness and sample efficiency compared to pure pixel-based approaches. The experimental results indicate that spatially grounded object-based encodings offer a practical mechanism for bridging pixel- and object-centric RL.
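To make the representation concrete, here is a minimal sketch of how detected entities could be rasterised into per-class binary planes that share the frame's coordinate system. The bounding-box input format, the class names, and the function name are hypothetical; the abstract does not say how entities are detected or encoded beyond "binary tensors aligned with the original coordinate frame".

```python
def object_channels(detections, classes, height, width):
    """Build one binary H x W plane per class from bounding-box detections.

    detections: list of (class_name, x, y, w, h) in pixel coordinates,
    with (x, y) the top-left corner of the box.
    """
    index = {name: i for i, name in enumerate(classes)}
    channels = [[[0] * width for _ in range(height)] for _ in classes]
    for name, x, y, w, h in detections:
        plane = channels[index[name]]
        # Clip each box to the frame so partially visible objects still render.
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                plane[row][col] = 1
    return channels


frame = object_channels(
    detections=[("player", 1, 2, 2, 1), ("ball", 3, 0, 1, 1)],
    classes=["player", "ball"],
    height=4,
    width=5,
)
```

Because the output is a stack of image-shaped planes, it could be fed to a convolutional backbone in place of, or concatenated with, the raw pixel channels.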

URL: https://openreview.net/forum?id=7BFbso4B3R

---

Title: CMOOD: Concept-based Multi-label OOD Detection

Authors: Zhendong Liu, Yi Nian, Yuehan Qin, Henry Peng Zou, Li Li, Xiyang Hu

Abstract: How can models effectively detect out-of-distribution (OOD) samples in complex, multi-label settings without extensive retraining? Existing OOD detection methods struggle to capture the intricate semantic relationships and label co-occurrences inherent in multi-label settings, often requiring large amounts of training data and failing to generalize to unseen label combinations. While large language models have revolutionized zero-shot OOD detection, they primarily focus on single-label scenarios, leaving a critical gap in handling real-world tasks where samples can be associated with multiple interdependent labels. To address these challenges, we introduce CMOOD, a novel zero-shot multi-label OOD detection framework. CMOOD leverages pre-trained vision-language models, enhancing them with a concept-based label expansion strategy and a new scoring function. By enriching the semantic space with both positive and negative concepts for each label, our approach models complex label dependencies, precisely differentiating OOD samples without the need for additional training. Extensive experiments demonstrate that our method significantly outperforms existing approaches, achieving approximately 95% average AUROC on both VOC and COCO datasets, while maintaining robust performance across varying numbers of labels and different types of OOD samples. We release our code at https://github.com/boosLiu/COOD.

URL: https://openreview.net/forum?id=EmoFJ8tcko

---

Title: Solving Constrained Optimization Problems as ODE-based Models Using Reinforcement Learning

Authors: Han Meng, Xinsong Feng, Yang Li, Chenan Wang, Kishansingh Rajput, Malachi Schram, Haipeng Chen

Abstract: Previous learning-to-optimize (L2O) methods on constrained optimization problems often treat neural networks as initializers that generate approximate solutions requiring substantial post-hoc refinements. This approach overlooks a key insight: Solving complex optimization problems often requires iterative refinement of candidate solutions, a process naturally aligned with the Markov Decision Process (MDP) and reinforcement learning (RL) framework. We show that within the MDP framework, RL and Ordinary Differential Equation (ODE)-based generative models (e.g., diffusion, flow matching) are formally equivalent, unifying them as trainable optimizers. Building on our unified perspective, we propose to train a flow-matching model within an RL paradigm as a learnable refinement mechanism, thereby incorporating constraint satisfaction directly into the optimization process. To further enhance feasibility, we introduce a minimal correction step that adjusts solutions to ensure constraint compliance. Empirical results demonstrate that our approach achieves state-of-the-art performance across a range of constrained optimization tasks, yielding improvements in efficiency, solution quality, and feasibility over prior baselines.

URL: https://openreview.net/forum?id=QW0ZX4zRC2

---

Title: CUDA: Capturing Uncertainty and Diversity in Preference Feedback Augmentation

Authors: Sehyeok Kang, Jaewook Jeong, Se-Young Yun

Abstract: Preference-based Reinforcement Learning (PbRL) effectively addresses reward design challenges in Reinforcement Learning and facilitates human-AI alignment by enabling agents to learn human intentions. However, optimizing PbRL critically depends on abundant, diverse, and accurate human feedback, which is costly and time-consuming to acquire. Existing feedback augmentation methods aim to alleviate the scarcity of human preference feedback. However, they often neglect diversity, primarily generating feedback for high-confidence trajectory pairs with extreme differences. This approach leads to a biased augmented set that incompletely represents human preferences. To overcome this, we introduce Capturing Uncertainty and Diversity in preference feedback Augmentation (CUDA), a novel approach that comprehensively considers both uncertainty and diversity. CUDA enhances augmentation by employing ensemble-based uncertainty estimation for filtering and extracting feedback from diverse clusters via bucket-based categorization. These two mechanisms enable CUDA to obtain diverse and accurate augmented feedback. We evaluate CUDA on MetaWorld and DMControl offline datasets, demonstrating significant performance improvements over various offline PbRL algorithms and existing augmentation methods across diverse scenarios.

URL: https://openreview.net/forum?id=KWENSE1tC4

---

Title: Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models using Reinforcement Learning from Ranked Feedback

Authors: Derek Shi, Ruben Glatt, Christine Klymko, Hongjun Choi, Shashank Kushwaha, Wesam A. Sakla, Felipe Leno da Silva

Abstract: Recent advances in large video-language models (VLMs) rely on extensive fine-tuning techniques that strengthen alignment between textual and visual comprehension. Many implementations typically begin with supervised fine-tuning (SFT) followed by reinforcement learning from preference data to enhance video comprehension. However, as VLMs scale in parameter size, so does the cost of gathering enough human feedback. To make fine-tuning more cost-effective, recent frameworks have explored reinforcement learning with AI feedback (RLAIF), which replaces human preference with AI as a judge. Current RLAIF frameworks rely on a specialized reward model trained with video narratives to create calibrated scalar rewards, an expensive and restrictive pipeline. We propose Oracle-RLAIF, a novel framework that replaces the trained reward model with a more general Oracle ranker, which acts as a drop-in model that ranks candidate model responses rather than scoring them. Alongside Oracle-RLAIF, we introduce $GRPO_{rank}$, a modified Group Relative Policy Optimization (GRPO) loss function that directly optimizes feedback with rank-aware advantages. Empirically, we demonstrate that Oracle-RLAIF consistently outperforms leading VLM fine-tuning methods when evaluated across various video comprehension benchmarks. Oracle-RLAIF paves the path to creating flexible and data-efficient frameworks for aligning large multi-modal video models with reinforcement learning from rank-based rather than score-based reward models.

URL: https://openreview.net/forum?id=RIRgnRicTa

---

Title: Solving Truly Massive Budgeted Monotonic POMDPs with Oracle-Guided Meta-Reinforcement Learning

Authors: Manav Vora, Jonas Liang, Michael N Grussing, Melkior Ornik

Abstract: Many real-world decision problems, ranging from asset-maintenance scheduling to portfolio rebalancing, can be naturally modelled as budget-constrained multi-component monotonic Partially Observable Markov Decision Processes (POMDPs): each component’s latent state degrades stochastically until an expensive restorative action is taken, while all assets share a fixed intervention budget.
For a large number of assets, deriving an optimal policy for this joint POMDP is computationally intractable. To tackle this challenge, we prove that the value function of the associated belief-MDP is budget-concave, which allows an efficient two-step approach to finding a near-optimal policy. First, we approximate the optimal cross-component budget split via a random-forest surrogate of each single-component value function. Second, we solve each resulting budget-constrained single-component POMDP with an oracle-guided meta-trained Proximal Policy Optimization (PPO) policy: value-iteration on the fully observable counterpart yields an oracle that shapes the PPO update and greatly accelerates learning. We validate our method through experiments in two disparate domains: (i) preventive maintenance for a large-scale building infrastructure containing 1,000 components, and (ii) portfolio risk management under debit-only loss-budget constraints, where each asset’s latent budget depletes with market losses and can only be replenished through costly recapitalization. Results show that our method consistently achieves longer component survival times and greater portfolio viability than both baseline heuristics and vanilla PPO. Furthermore, our approach maintains linear scalability in solution time with respect to the number of components.

Code: https://github.com/leadcatlab/Oracle-Guided-Meta-PPO

URL: https://openreview.net/forum?id=yEAnjlmliL

---

Title: Multi-Constraint Online Convex Optimization with Adversarial Constraints

Authors: Wentao Zhang

Abstract: We study online convex optimization with multiple adversarial constraints, where at each round a learner selects an action, and an adversary simultaneously reveals a convex cost function and $K$ convex constraint functions. The learner aims to minimize regret while keeping the cumulative constraint violation (CCV) of each individual constraint small. We introduce the Multi-Constraint Constrained Online Convex Optimization (MC-COCO) framework and develop a unified algorithmic approach based on exponential Lyapunov potentials. The key insight is that encoding all $K$ constraint violations via the potential $S_t = \sum_{k=1}^{K} e^{\lambda Q_k(t)}$ yields a surrogate cost whose growth ratio is controlled by the maximum single-round violation rather than the number of constraints $K$. This decoupling enables a per-constraint CCV of $\widetilde{O}(T^{1-\beta} \ln K)$, where $\beta \in [0,1]$ is a tunable regret-CCV trade-off parameter, improving qualitatively over the linear $K$-dependence of naive approaches. We instantiate the framework across three canonical settings (constrained experts, general Lipschitz-convex, and smooth convex) and further develop extensions for heterogeneous constraint prioritization (where critical constraints can be controlled at the $\widetilde{O}(T^{1-\beta}/\alpha_k)$ level) and long-term budget feasibility. Experiments on adversarial instances with up to $K=100$ constraints validate the theoretical bounds and confirm the logarithmic scaling in $K$.
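For intuition about where a $\ln K$ factor can come from, the standard log-sum-exp sandwich below relates the potential $S_t$ to the largest queue value $Q_k(t)$. This is a generic inequality offered as background, assuming $\lambda > 0$; it is not claimed to be the argument used in the paper.

```latex
\[
  \max_{1 \le k \le K} Q_k(t)
  \;\le\; \frac{1}{\lambda} \ln S_t
  \;=\; \frac{1}{\lambda} \ln \sum_{k=1}^{K} e^{\lambda Q_k(t)}
  \;\le\; \max_{1 \le k \le K} Q_k(t) + \frac{\ln K}{\lambda},
  \qquad \lambda > 0 .
\]
```

The lower bound holds because the sum contains the largest term, and the upper bound because each of the $K$ terms is at most the largest one. Controlling $\ln S_t$ therefore controls every individual $Q_k(t)$ at an additive cost of only $\ln K / \lambda$.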

URL: https://openreview.net/forum?id=3sLjLHCGzS

---

Title: MOONSHOT: A Framework for Multi-Objective Pruning of Vision and Large Language Models

Authors: Gabriel Afriat, Xiang Meng, Shibal Ibrahim, Hussein Hazimeh, Rahul Mazumder

Abstract: Weight pruning is a common technique for compressing large neural networks. We focus on the challenging post-training one-shot setting, where a pre-trained model is compressed without any retraining. Existing one-shot pruning methods typically optimize a single objective, such as a layer-wise reconstruction loss or a second-order Taylor approximation of the training loss. We highlight that neither objective alone is consistently the most effective across architectures and sparsity levels. Motivated by this insight, we propose MOONSHOT, a general and flexible framework that extends any single-objective pruning method into a multi-objective formulation by jointly optimizing both the layer-wise reconstruction error and second-order Taylor approximation of the training loss. MOONSHOT acts as a wrapper around existing pruning algorithms. To enable this integration while maintaining scalability to billion-parameter models, we propose modeling decisions and introduce an efficient procedure for computing the inverse Hessian, preserving the efficiency of state-of-the-art one-shot pruners. When combined with state-of-the-art pruning methods on Llama-3.2 and Llama-2 models, MOONSHOT reduces C4 perplexity by up to 32.6% at 2:4 sparsity and improves zero-shot mean accuracy across seven classification benchmarks by up to 4.9 points. On Vision Transformers, it improves accuracy on ImageNet-1k by over 5 points at 70% sparsity, and on ResNet-50, it yields a 4-point gain at 90% sparsity. Our code is available at https://github.com/mazumder-lab/MOONSHOT.

URL: https://openreview.net/forum?id=Ew9s7veEQU

---

Title: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning

Authors: Taiwei Shi, Yiyang Wu, Linxin Song, Tianyi Zhou, Jieyu Zhao

Abstract: Reinforcement finetuning (RFT) has shown great potential for enhancing the mathematical reasoning capabilities of large language models (LLMs), but it is often sample- and compute-inefficient, requiring extensive training. In this work, we introduce AdaRFT (Adaptive Curriculum Reinforcement Finetuning), a method that significantly improves the efficiency of RFT through adaptive curriculum learning. AdaRFT dynamically adjusts the difficulty of training problems based on the model’s recent reward signals, ensuring that the model consistently trains on tasks that are challenging but solvable. This adaptive sampling strategy accelerates learning by maintaining an optimal difficulty range, avoiding wasted computation on problems that are too easy or too hard. AdaRFT requires only a lightweight extension to standard RFT algorithms like Proximal Policy Optimization (PPO), without modifying the reward function or model architecture. Experiments on competition-level math datasets demonstrate that AdaRFT improves convergence efficiency and reasoning performance. Given problem-level difficulty annotations, AdaRFT reduces RFT training time by up to 2 times across data distributions and model scales, offering a more scalable and effective RFT framework.
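
The curriculum idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each problem carries a scalar difficulty score, and the update rule, the step size, and the names `update_target` and `select_batch` are all hypothetical.

```python
def update_target(target, recent_rewards, goal=0.5, step=10.0, lo=0.0, hi=100.0):
    """Raise the target difficulty when recent rewards beat `goal`, lower it otherwise."""
    mean_reward = sum(recent_rewards) / len(recent_rewards)
    new_target = target + step * (mean_reward - goal)
    return min(hi, max(lo, new_target))  # keep the target inside the difficulty range

def select_batch(difficulties, target, batch_size):
    """Return indices of the problems whose difficulty is closest to the target."""
    ranked = sorted(range(len(difficulties)), key=lambda i: abs(difficulties[i] - target))
    return ranked[:batch_size]

difficulties = [5, 20, 35, 50, 65, 80, 95]   # assumed per-problem difficulty annotations
target = 30.0
batch = select_batch(difficulties, target, batch_size=2)       # problems nearest difficulty 30
target = update_target(target, recent_rewards=[1, 1, 1, 0])    # mean reward 0.75 exceeds 0.5
```

In this sketch the reward function and the policy-gradient update are untouched; only the sampling of training problems changes, which matches the abstract's description of a lightweight extension.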

URL: https://openreview.net/forum?id=UEhpyq41b9

---

Title: A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models

Authors: Ching Chang, Yidan Shi, Defu Cao, Wei Yang, Jeehyun Hwang, Haixin Wang, Jiacheng Pang, Wei Wang, Yan Liu, Wen-Chih Peng, Tien-Fu Chen

Abstract: Time series reasoning treats time as a first-class axis and incorporates intermediate evidence directly into the answer.
This survey defines the problem and organizes the literature by reasoning topology with three families: direct reasoning in one step, linear chain reasoning with explicit intermediates, and branch-structured reasoning that explores, revises, and aggregates.
The topology is crossed with the main objectives of the field, including traditional time series analysis, explanation and understanding, causal inference and decision making, and time series generation, while a compact tag set spans these axes and captures decomposition and verification, ensembling, tool use, knowledge access, multimodality, agent loops, and LLM adaptation regimes.
Methods and systems are reviewed across domains, showing what each topology enables and where it breaks down in faithfulness or robustness, along with curated datasets, benchmarks, and resources that support study and deployment (with an accompanying repository at https://github.com/blacksnail789521/Time-Series-Reasoning-Survey).
Evaluation practices that keep evidence visible and temporally aligned are highlighted, and guidance is distilled on matching topology to uncertainty, grounding with observable artifacts, planning for shift and streaming, and treating cost and latency as design budgets.
We emphasize that reasoning structures must balance capacity for grounding and self-correction against computational cost and reproducibility, while future progress will likely depend on benchmarks that tie reasoning quality to utility and on closed-loop testbeds that trade off cost and risk under shift-aware, streaming, and long-horizon settings.
Taken together, these directions mark a shift from narrow accuracy toward reliability at scale, enabling systems that not only analyze but also understand, explain, and act on dynamic worlds with traceable evidence and credible outcomes.

URL: https://openreview.net/forum?id=l3QW42g6u3

---

Title: Feedback-Driven Vision-Language Alignment via Sampling-based Visual Projection

Authors: Giorgio Giannone, Yev V Perevodchikov, Qianli Feng, Ruoteng Li, Rui Chen, Aleix M Martinez

Abstract: Vision-language models (VLMs) combine image understanding and language generation, yet they frequently produce descriptions inconsistent with the visual input, leading to hallucinated objects and reduced reliability. We propose Sampling-based Visual Projection (SVP), a training framework that improves vision-language alignment by using a pretrained grounding model as feedback during data generation rather than as supervision. For each unlabeled seed image, the base model generates draft descriptions, a grounding model evaluates those drafts to return spatial feedback, and the VLM generates refined descriptions conditioned on that feedback. SVP selects the most informative candidates and fine-tunes the base model using only these natural-language outputs, effectively distilling spatial reasoning capabilities without injecting explicit grounding tokens or coordinates. Across ten benchmarks, SVP yields broad performance gains, including a 14% average improvement in captioning and a 12% increase in object recall, significantly reducing hallucinations while preserving robust question-answering capabilities.

URL: https://openreview.net/forum?id=vt0bX7QmyX

---

Title: Advances in Temporal Point Processes: Bayesian, Neural, and LLM Approaches

Authors: Feng Zhou, Quyu Kong, Jie Qiao, Cheng Wan, Yixuan Zhang, Ruichu Cai

Abstract: Temporal point processes (TPPs) are stochastic process models used to characterize event sequences occurring in continuous time. Traditional statistical TPPs have a long-standing history, with numerous models proposed and successfully applied across diverse domains. In recent years, advances in deep learning have spurred the development of neural TPPs, enabling greater flexibility and expressiveness in capturing complex temporal dynamics. The emergence of large language models (LLMs) has further sparked excitement, offering new possibilities for modeling and analyzing event sequences by leveraging their rich contextual understanding.
This survey presents a comprehensive review of recent research on TPPs from three perspectives: Bayesian, deep learning, and LLM approaches. We begin with a review of the fundamental concepts of TPPs, followed by an in-depth discussion of model design and parameter estimation techniques in these three frameworks. We also revisit classic application areas of TPPs to highlight their practical relevance. Finally, we outline challenges and promising directions for future research.

URL: https://openreview.net/forum?id=SXgGKkShhT

---

Title: When Test-Time Training Fails: A Critical Analysis of Robustness and Hyperparameter Sensitivity

Authors: Ziqi Wang, Xiusi Chen, Gaotang Li, Heng Ji, Tong Zhang

Abstract: Test-time training (TTT) through input perplexity minimization has emerged as a promising approach for enhancing language model performance during inference. However, questions remain about its practical robustness and applicability beyond popular benchmarks. This paper presents a preliminary analysis investigating two critical questions: whether TTT is effective on unseen tasks and how sensitive it is to hyperparameter choices. We evaluate TTT on three anti-memorization datasets (Memo-Trap, GSM-Symbolic, and Math-Perturb) using six models from the Qwen 2.5 and Llama 3 families. Our findings reveal that while TTT shows effectiveness on common benchmarks such as AIME 2024, it struggles with tasks designed to counter memorization, raising questions about whether the gains stem from domain adaptation or data contamination. We identify significant performance differences among optimizers, with SGD outperforming Adam despite slower convergence. Through extensive hyperparameter sweeps over learning rates, training steps, weight decay, momentum, and gradient normalization, we demonstrate that TTT is highly sensitive to these choices, with no universal recipe across tasks and models. Notably, gradient normalization emerges as an effective technique for improving robustness by mitigating catastrophic performance drops and reducing sensitivity to the learning rate. Our analysis also reveals that tuning feed-forward networks can achieve better peak performance than full model tuning, while attention-only tuning provides more stable worst-case performance. These findings highlight the need for continued research into making test-time training more practical and reliable for real-world deployment. Because this study covers only one specific TTT algorithm, input perplexity minimization, our conclusions may not apply to all TTT algorithms. We call on the community to pay closer attention to TTT's sensitivity so that it becomes better suited for real-world applications.

URL: https://openreview.net/forum?id=0Eh31N1Hoj

---

Title: United Minds or Isolated Agents? Exploring Coordination of LLMs under Cognitive Load Theory

Authors: HaoYang Shang, Xuan Liu, Zi Liang, Jie ZHANG, Haibo Hu, Song Guo

Abstract: Large Language Models (LLMs) exhibit a notable performance ceiling on complex, multi-faceted tasks. As practitioners increasingly rely on heavy context engineering -- curating intricate instructions, tool schemas, and multi-turn histories -- the processing demands often exceed the LLM's effective attention budget, leading to context rot. Drawing an analogy to Cognitive Load Theory (CLT) in cognitive science, we propose that this bottleneck is functionally analogous to the bounded working memory of the human mind. Rather than relying on heuristic prompt engineering, we use CLT as a principled design lens for LLM system design. To operationalize this insight, we introduce CoThinker, an instantiation of a CLT-driven multi-agent framework. CoThinker operationalizes CLT principles by distributing intrinsic cognitive load through agent specialization and managing transactional load via structured communication and a collective working memory. We empirically evaluate CoThinker on complex problem-solving tasks and fabricated high cognitive load scenarios. Our results are consistent with a CLT-informed account of multi-agent coordination: gains concentrate on reasoning-heavy tasks where cognitive load is high, while coordination overhead dominates on low-intrinsic-load tasks such as instruction-following -- a boundary predicted by the cognitive-load-profile view. Our analysis reveals characteristic interaction patterns that cast insights from collective cognition and load management into a principled approach to agent system design.

URL: https://openreview.net/forum?id=8BYJHZiZ5T

---

Title: When Lifelong Novelty Fails: Coordination Breakdown in Decentralised MARL

Authors: Ting Zhu, Yue Jin, Giovanni Montana

Abstract: Lifelong novelty bonuses are a cornerstone of exploration in reinforcement learning, but we identify a critical failure mode when they are applied to decentralised multi-agent coordination tasks: coordination de-synchronisation. In sequential coordination tasks with multiple joint coordination checkpoints (states that all agents must occupy simultaneously), agents searching for later checkpoints must repeatedly traverse earlier ones. Under lifelong novelty, this repeated traversal gradually depletes intrinsic motivation to revisit these critical locations and can destabilise coordination. Within a stylised analytical framework, we derive lower bounds showing that the guaranteed success probability under a lifelong novelty scheme can shrink polynomially with a problem-dependent geometric revisit pressure and the number of agents, whereas episodic bonuses, which reset at the start of each episode, provide a time-uniform lower bound on the probability of reaching a given checkpoint. We further prove that a hybrid scheme, which multiplicatively combines episodic and lifelong bonuses, inherits both a constant "coordination floor" at known checkpoints and a persistent drive to discover previously unseen states. We validate the qualitative predictions of this framework in GridWorld, Overcooked, and StarCraft II, where hybrid bonuses yield substantially more reliable coordination than lifelong-only exploration in environments with multiple sequential checkpoints or narrow geometric bottlenecks, such as corridors that force agents to pass through the same cells many times. Together, these results provide a theoretical and empirical account of when different intrinsic motivation schemes are effective in decentralised multi-agent coordination.

URL: https://openreview.net/forum?id=xOPjPFTuvy

---

Title: Adaptive Rank Control for Robust Reinforcement Learning

Authors: Chenliang Li, Junyu Leng, Jiaxiang Li, Youbang Sun, Shixiang Chen, Shahin Shahrampour, Alfredo Garcia

Abstract: Robust reinforcement learning (RL) is commonly formulated as a min--max optimization problem to account for epistemic uncertainty in transition dynamics.
While theoretically appealing, such formulations are computationally demanding and often induce overly conservative policies.
We study an alternative approach in which transition dynamics are sampled from an uncertainty set and robustness is achieved through explicit control of policy complexity.
In the neural tangent kernel regime, we show that training with uniformly sampled dynamics induces a bias--variance tradeoff, with lower-rank policy representations exhibiting reduced sensitivity to epistemic perturbations.
Within the framework of entropy-regularized RL, we formulate robust learning as a bi-level optimization problem that balances expressiveness and robustness via adaptive low-rank policy representations, leading to an adaptive rank-selection mechanism that navigates this tradeoff during training.
We establish policy convergence and demonstrate empirically on MuJoCo continuous-control benchmarks that the proposed method provides a scalable and computationally efficient alternative to traditional robust RL, achieving improved robustness without the overhead of adversarial inner-loop optimization.

URL: https://openreview.net/forum?id=lG7VizZQcS

---

Title: Supervised Quadratic Feature Analysis: Information Geometry for Dimensionality Reduction

Authors: Daniel Herrera-Esposito, Johannes Burge

Abstract: Supervised dimensionality reduction maps labeled data into a low-dimensional feature space while preserving class separation. A common strategy is to learn features that maximize a measure of statistical dissimilarity between the class-conditional probability distributions. Information geometry, which is rooted in Riemannian geometry, provides an alternative framework for measuring class dissimilarity. It treats probability distributions as points in a statistical manifold and uses the Fisher information metric to define a geodesic distance--the Fisher-Rao distance--between distributions. The Fisher-Rao distance is an appealing candidate for measuring class separation because the Fisher information metric is a local measure of discriminability, and because it allows a geometric interpretation. Here, we present Supervised Quadratic Feature Analysis (SQFA), a supervised dimensionality reduction method which learns linear features that maximize Fisher-Rao distances between class-conditional distributions, under Gaussian assumptions. In multiple real-world datasets, we find that SQFA features support classification accuracy that is competitive with features that maximize more popular measures of dissimilarity, or that are learned by other state-of-the-art dimensionality reduction methods. Notably, the best classification accuracy is achieved by SQFA-H features, a variant of SQFA that maximizes the Hellinger distance, a rarely used objective for dimensionality reduction. These results demonstrate the potential of information geometry as a tool for supervised dimensionality reduction. We provide a Python implementation of SQFA at https://github.com/dherrera1911/sqfa.

URL: https://openreview.net/forum?id=jwNJiLphnZ

---

Title: Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO

Authors: Gio Huh, Jaeha Lee, Ning Su, Tony Yue YU

Abstract: Recent efforts have extended the capabilities of transformers in logical reasoning and symbolic computations. In this work, we investigate their capacity for functional decomposition, focusing on the challenging algebraic task of multivariate polynomial decomposition. This problem, with widespread applications in science and engineering, has been proved NP-hard and demands both precision and insight. Our contributions are threefold: First, we develop a synthetic data generation pipeline providing fine-grained control over problem complexity. Second, we train transformer models via supervised learning and evaluate them across four key dimensions involving scaling behavior and generalizability. Third, we propose Beam Grouped Relative Policy Optimization (BGRPO), a rank-aware reinforcement learning method suitable for hard algebraic problems. Fine-tuning with rank-aware BGRPO improves beam-search accuracy by 33--37 percentage points over the SFT initialization and by 1.7--3.7 points over vanilla GRPO, with non-overlapping $\pm 1\sigma$ bands at every scale. After RL, even greedy decoding surpasses the SFT model's best beam-search score by 23.8--25.8 percentage points at every scale. Additionally, our model demonstrates competitive performance with Mathematica's FullSimplify on leaf count in various cases.

URL: https://openreview.net/forum?id=Vxf8QDIA6Z

---

Title: CodecSep: Prompt-Driven Universal Sound Separation on Neural Audio Codec Latents

Authors: Adhiraj Banerjee, Vipul Arora

Abstract: Text-guided sound separation enables flexible audio editing and assistive applications, but existing open-domain systems such as AudioSep remain too compute-intensive for low-latency edge or codec-mediated deployment. Neural audio codec (NAC)-based separators such as CodecFormer and SDCodec are more efficient, but they are largely restricted to fixed-class or fixed-stem separation.

We introduce CodecSep, a text-guided universal sound separation framework that operates directly in neural audio codec latent space. CodecSep combines a frozen DAC backbone with a lightweight Transformer masker conditioned by CLAP-derived FiLM parameters, enabling open-vocabulary source extraction while preserving the efficiency advantages of codec-native representations. To our knowledge, this is the first prompt-driven universal sound separation system built directly on NAC latents.

Across dnr-v2 and five additional open-domain benchmarks under matched training and prompting protocols, CodecSep consistently improves over AudioSep in separation fidelity (SI-SDR) while remaining competitive in perceptual quality (ViSQOL), and also shows gains in human MOS-LQS. Further analyses show that finer-grained semantic supervision improves separation more consistently than coarse prompting, and that explicit masking is more effective than decoder-style latent generation for codec-domain source separation. Qualitative and diagnostic analyses further support the central design premise: modern NAC latents preserve meaningful source-dependent structure, and the learned masks exploit this structure primarily through channel-wise modulation, indicating that source extraction can be performed through masking alone without explicit latent generation.

From a systems perspective, CodecSep also provides a concrete deployment path for codec-mediated audio processing. In deployment-typical code-stream settings, where the edge device transmits audio as NAC codes generated by the same codec backbone used by the separator, the server can map the received codes to codec embeddings through codebook lookup and perform separation directly in codec space, avoiding a separate decode--separate--re-encode cycle. In this regime, CodecSep requires only 1.35 GMACs end-to-end, about $54\times$ less compute than AudioSep in the same codec-mediated pipeline (and about $25\times$ lower separator-only compute), while also reducing latency and memory footprint substantially and remaining fully compatible with codes-in / codes-out operation. More broadly, this codes-in / codes-out formulation provides a concrete blueprint for codec-native downstream audio processing, suggesting that tasks such as enhancement, denoising, dereverberation, and prompt-guided audio editing can be designed to operate directly on NAC representations rather than repeatedly decoding to waveform and re-encoding after each processing stage.

URL: https://openreview.net/forum?id=r63GX9hKhC

---

Title: Hierarchical Multi-Level 3D Geometry Generation with Stress-Aware Learning

Authors: Vadim Zlobin, Vladislav Puzach, Olga Bidlevich, Mikhail Chetnev, Vitaly Gromov

Abstract: Current approaches for LEGO-style 3D structural assembly are usually trained to maximize intersection over union between the generated output and the target construction. We propose a new approach that builds stable structures using a physics-aware reward. Our method employs a two-level agent architecture in which a high-level planner based on proximal policy optimization proposes a scheme, while a low-level wave function collapse agent handles precise brick placement with constraint satisfaction. Experimental results demonstrate that our hierarchical method consistently constructs buildings that satisfy stress constraints while reducing material usage. We also show that replacing the finite element method solver with a Fourier neural operator achieves comparable performance, providing proof-of-concept that the proposed approach can work with neural surrogates. Our code is available at https://github.com/iSegments-Lab/stress_aware_bricks_model.

URL: https://openreview.net/forum?id=kyoXKiyoA3

---

Title: GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness

Authors: Kung-Hsiang Huang, Haoyi Qiu, Yutong Dai, Caiming Xiong, Chien-Sheng Wu

Abstract: Graphical user interface (GUI) agents face severe efficiency bottlenecks when processing long sequences of high-resolution screenshots, making inference costly and memory-bound. Existing KV cache compression methods, designed for natural images, remain suboptimal as they fail to exploit the unique spatial and temporal redundancies of GUIs. In this work, we first demonstrate that unlike natural images, GUI attention sparsity is uniformly high (>0.99) across all transformer layers, invalidating complex layer-varying budget strategies. Building on this insight, we introduce GUI-KV, a training-free compression method that allocates a uniform budget driven by two novel mechanisms: (1) spatial saliency guidance, which augments attention with residual stream L2 norms to preserve semantic visual tokens; and (2) temporal redundancy scoring, which employs subspace projection to identify and prune historical frames that are linearly redundant with the current view. Across six benchmarks, GUI-KV outperforms competitive baselines, often recovering near-full-cache accuracy at 10-20% budgets. Notably, on AgentNetBench, it reduces decoding FLOPs by 38.9% while increasing step accuracy by 4.1% over the full-cache baseline.
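
The abstract does not spell out the scoring rule, so the following is only an illustrative sketch of what "linearly redundant with the current view" could mean: the fraction of each historical feature vector's energy that lies in the span of the current frame's features. The function name `redundancy_scores`, the use of an SVD basis, and the toy matrices are assumptions, not the GUI-KV implementation.

```python
import numpy as np

def redundancy_scores(history, current, rank=None):
    """Fraction of each history row's squared norm captured by the row space of `current`."""
    # Orthonormal basis for the subspace spanned by the current frame's feature vectors.
    _, s, vt = np.linalg.svd(current, full_matrices=False)
    keep = s > 1e-10 * s.max()                 # drop numerically null directions
    basis = vt[keep][:rank]                    # (r, d); rank=None keeps every direction
    projected = history @ basis.T              # coordinates of history rows in the subspace
    energy = np.sum(history ** 2, axis=1)
    return np.sum(projected ** 2, axis=1) / np.maximum(energy, 1e-12)

# Toy features: the current frame spans the first two coordinate axes.
current = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
history = np.array([[2.0, 3.0, 0.0],    # fully inside the span
                    [0.0, 0.0, 5.0],    # orthogonal to the span
                    [1.0, 0.0, 1.0]])   # half inside, half outside
scores = redundancy_scores(history, current)
```

A score near 1 marks a historical token that the current view can reproduce linearly and that could be pruned first under a fixed budget; a score near 0 marks information found only in the older frame.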

URL: https://openreview.net/forum?id=qaJECugPzr

---

Title: VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation

Authors: Wentao Ma, Weiming Ren, Yiming Jia, Zhuofeng Li, Ping Nie, Ge Zhang, Wenhu Chen

Abstract: Large multimodal models (LMMs) have recently emerged as a powerful tool for long video understanding (LVU), prompting the development of standardized LVU benchmarks to evaluate their performance. However, our investigation reveals a rather sobering lesson for existing LVU benchmarks. First, most existing benchmarks rely heavily on multiple-choice questions (MCQs), whose evaluation results are inflated by the possibility of guessing the correct answer. Second, a significant portion of questions in these benchmarks have strong priors that allow models to answer directly without even reading the input video. For example, Gemini-1.5-Pro can achieve over 50% accuracy given a random frame from a long video on Video-MME. We also observe that increasing the number of frames does not necessarily lead to improvement on existing benchmarks, which is counterintuitive. As a result, the validity and robustness of current LVU benchmarks are undermined, impeding a faithful assessment of LMMs’ long-video understanding capability. To tackle this problem, we propose VideoEval-Pro, a realistic LVU benchmark containing open-ended short-answer questions that truly require understanding the entire video. VideoEval-Pro assesses both segment-level and full-video understanding through perception and reasoning tasks. By evaluating 27 proprietary and open-source video LMMs, we reach the following findings: (1) video LMMs show drastic performance drops (>25%) on open-ended questions compared with MCQs; (2) surprisingly, higher MCQ scores do not lead to higher open-ended scores on VideoEval-Pro; (3) compared to other MCQ benchmarks, VideoEval-Pro benefits more from increasing the number of input frames. Our results show that VideoEval-Pro offers a more realistic and reliable measure of long video understanding, providing a clearer view of progress in this domain.

URL: https://openreview.net/forum?id=2BCfis3jZA

---

Title: ViscoReg: Neural Signed Distance Functions via Viscosity Solutions

Authors: Meenakshi Krishnan, Ramani Duraiswami

Abstract: Implicit Neural Representations (INRs) that learn Signed Distance Functions (SDFs) from point cloud data represent the state-of-the-art for geometrically accurate 3D scene reconstruction. However, training these Neural SDFs involves enforcing the Eikonal equation, an ill-posed equation that also leads to unstable gradient flows. Numerical Eikonal solvers have relied on viscosity approaches for regularization and stability. Motivated by this well-established theory, we introduce ViscoReg, a novel regularizer for Neural SDF methods, and theoretically prove that it stabilizes training. Empirically, ViscoReg outperforms state-of-the-art approaches such as SIREN, DiGS, StEik, and HotSpot across most metrics on ShapeNet, the Surface Reconstruction Benchmark, 3D scene reconstruction and reconstruction from real scans. We also establish novel generalization error estimates for Neural SDFs in terms of the training error, using the theory of viscosity solutions. Our empirical and theoretical results provide confidence in the general applicability of our method.
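
For orientation, the Eikonal constraint mentioned above and the classical vanishing-viscosity regularization used in numerical solvers can be written as below. This is the textbook form, given as background only: the abstract does not state the exact ViscoReg loss, sign conventions vary between references, and the symbols $u$, $u_\varepsilon$, $\varepsilon$, and $\Omega$ are assumptions introduced here.

```latex
% Eikonal equation satisfied (almost everywhere) by a signed distance function u
\|\nabla u(x)\| = 1, \qquad x \in \Omega
% Vanishing-viscosity regularization: a small diffusion term makes the problem well-posed
\|\nabla u_\varepsilon(x)\| - \varepsilon \, \Delta u_\varepsilon(x) = 1, \qquad \varepsilon > 0
```

In the classical theory, the regularized solutions $u_\varepsilon$ converge to the viscosity solution as $\varepsilon \to 0$ under suitable boundary conditions, which is the sense in which viscosity approaches pick out one solution among the many weak solutions of the Eikonal equation.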

URL: https://openreview.net/forum?id=DWnMkBU4sF

---

Title: Broadcast Product: Redefining Shape-aligned Element-wise Multiplication and Beyond

Authors: Yusuke Matsui, Tatsuya Yokota

Abstract: Broadcast operations are widely used in scientific computing libraries, yet their mathematical formulation is often implicit and inconsistently represented in machine learning literature. This problem frequently leads to invalid equations when element-wise products are written despite mismatched tensor shapes. In this paper, we formalize such operations by introducing the broadcast product $\boxdot$, which explicitly extends the Hadamard product through shape-aligned element duplication. We provide a rigorous definition of the broadcast product, analyze its algebraic properties, and show how it can be expressed using standard linear algebra. Building on this framework, we formulate least-squares problems and sketch a proof-of-concept broadcast decomposition. As a preliminary illustration, we show that the formalism enables a new family of decompositions with distinct structural properties from conventional tensor decompositions. This work establishes a mathematical foundation for broadcast-aware tensor operations, connecting practical implementations with rigorous tensor analysis.
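
The shape mismatch described above is easy to reproduce in NumPy. The sketch below uses NumPy's own broadcasting semantics to illustrate "shape-aligned element duplication"; it does not reproduce the paper's formal definition of $\boxdot$, and the arrays `A` and `b` are arbitrary examples.

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)   # shape (2, 3)
b = np.array([10.0, 100.0])        # shape (2,)

# Writing "A ⊙ b" as a Hadamard product is ill-formed because the shapes differ.
# Aligning b to shape (2, 1) and duplicating it along the second axis repairs that.
b_dup = np.repeat(b[:, None], A.shape[1], axis=1)   # explicit duplication, shape (2, 3)
hadamard = A * b_dup                                 # ordinary element-wise product
broadcast = A * b[:, None]                           # broadcasting duplicates implicitly

# The same result through standard linear algebra: scaling rows with a diagonal matrix.
via_diag = np.diag(b) @ A
```

All three arrays are equal, which is the informal content of the claim that a broadcast-style product extends the Hadamard product and can still be expressed with standard linear algebra.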

URL: https://openreview.net/forum?id=zv0OtOPpPO

---

Title: From Offline to Online Memory-Free and Task-Free Continual Learning via Fine-Grained Hypergradients

Authors: Nicolas Michel, Maorong Wang, Jiangpeng He, Toshihiko Yamasaki

Abstract: Continual Learning (CL) aims to learn from a non-stationary data stream where the underlying distribution changes over time. While recent advances have produced efficient memory-free methods in the offline CL (offCL) setting, online CL (onCL) remains dominated by memory-based approaches. The transition from offCL to onCL is challenging, as many offline methods rely on (1) prior knowledge of task boundaries and (2) sophisticated scheduling or optimization schemes, both of which are unavailable when data arrives sequentially and can be seen only once. In this paper, we investigate the adaptation of state-of-the-art memory-free offCL methods to the online setting. We first show that augmenting these methods with lightweight prototypes significantly improves performance, albeit at the cost of increased Gradient Imbalance, which biases learning towards earlier tasks. To address this issue, we formulate Fine-Grained Hypergradients as an online mechanism for rebalancing gradient updates during training. Our experiments demonstrate that the synergy between prototype memory and hypergradient reweighting substantially improves the performance of memory-free methods in onCL. The code and implementation for this work are publicly available at https://github.com/Nicolas1203/fgh.

URL: https://openreview.net/forum?id=Tu9rHiczLm

---

Title: Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics

Authors: Ihor Kendiukhov

Abstract: Neural scaling laws -- power-law relationships between loss, model size, and data -- have been extensively documented for language and vision transformers, yet their existence in single-cell genomics remains largely unexplored. We present the first systematic study of scaling behaviour for masked-reconstruction transformers trained on single-cell RNA sequencing (scRNA-seq) data. Using expression profiles from the CELLxGENE Census, we study a canonical regime with 512 highly variable genes and 200,000 cells, alongside an exploratory comparison regime with 1,024 genes and 10,000 cells. Across seven model sizes spanning three orders of magnitude in parameter count (533 to 3.4 x 10^8 parameters), we fit the parametric scaling law L = aP^(-alpha) + c to validation mean squared error (MSE). The canonical V = 512, D = 200k regime exhibits clear power-law scaling on validation loss, and held-out test evaluation follows the same qualitative trend. By contrast, the original V = 1,024, D = 10k comparison does not provide a clean causal test of data scarcity because vocabulary size, dataset size, and training budgets all differ simultaneously; we therefore treat it as exploratory rather than definitive. We additionally report matched-V follow-up analyses, including a fixed-V = 512 data-size sweep, held-out test-set scaling, and an empirical check of cross-gene residual heterogeneity. Under a homoscedastic Gaussian approximation, the asymptotic floor in the canonical regime corresponds to approximately 2.3 bits of irreducible uncertainty remaining per masked gene position, not to a universal biological constant. We discuss implications for the design of single-cell foundation models and outline the additional matched sweeps and likelihood-based objectives needed to turn this preliminary quantity into a rigorous transcriptomic entropy estimate.
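
As an illustration of fitting the parametric form L = aP^(-alpha) + c quoted above, here is a minimal sketch on synthetic, noise-free data. The fitting procedure (scanning candidate floors c and regressing log(L - c) on log P) is an assumption chosen for simplicity, not the procedure used in the paper, and none of the numbers below are the paper's results.

```python
import math

def fit_power_law(params, losses, grid=2000):
    """Fit L = a * P**(-alpha) + c by scanning c and regressing log(L - c) on log P."""
    xs = [math.log(p) for p in params]
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    best = None
    for i in range(grid):
        c = 0.999 * min(losses) * i / (grid - 1)     # candidate floor, below every loss
        ys = [math.log(l - c) for l in losses]
        my = sum(ys) / n
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        intercept = my - slope * mx
        sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), -slope, c)
    _, a, alpha, c = best
    return a, alpha, c

# Synthetic data generated with a = 2.0, alpha = 0.3, c = 0.5 (made up for this sketch).
params = [10 ** k for k in range(3, 9)]
losses = [2.0 * p ** -0.3 + 0.5 for p in params]
a, alpha, c = fit_power_law(params, losses)
```

The recovered `c` plays the role of the asymptotic floor discussed in the abstract: the loss value that remains as the parameter count grows without bound.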

URL: https://openreview.net/forum?id=a8rUQqionr

---

Title: Feedback-Driven Black-Box Safety Alignment Testing of Large Language Models via Reinforcement Learning

Authors: Xuan Chen, Yuzhou Nie, Lu Yan, Mingwei Zheng, Yunshu Mao, Wenbo Guo, Xiangyu Zhang

Abstract: Large language models (LLMs) are equipped with safety alignment mechanisms to reduce harmful outputs, yet systematically evaluating the effectiveness of these safeguards remains challenging.
Existing methods mainly rely on manually curated prompts or stochastic mutation-based search, which provide limited exploration efficiency.
We propose SEAT-RL, a feedback-driven black-box framework that uses deep reinforcement learning (DRL) to generate adversarial prompts against safety-aligned LLMs.
We formulate prompt generation as a sequential decision-making problem, where an agent iteratively refines prompts based on target model feedback.
To improve effectiveness and efficiency, we design (1) an LLM-facilitated action space that enables diverse yet constrained prompt transformations, and (2) a dense, automated reward function to guide exploration toward safety violations.
The learned policy is reusable and transfers across target models without retraining.
Experiments on six representative LLMs show that SEAT-RL discovers substantially more safety failures under the same query budget than existing automated baselines, such as the stochastic search methods powered by genetic algorithms.
SEAT-RL also exhibits stronger stability, cross-model transferability, and robustness against multiple defense mechanisms.
Ablation studies further validate the key design. These results suggest that RL provides an effective framework for black-box red-teaming evaluation of LLM safety alignment.

URL: https://openreview.net/forum?id=GWslY31w2b

---

Title: What is a Number, That a Large Language Model May Know It?

Authors: Raja Marjieh, Veniamin Veselovsky, Thomas L. Griffiths, Ilia Sucholutsky

Abstract: Numbers are a basic part of how humans represent and describe the world around them. As a consequence, learning effective representations of numbers is critical for the success of large language models as they become more integrated into everyday decisions. However, these models face a challenge: depending on context, the same sequence of digit tokens, e.g., 911, can be treated as a number or as a string. What kind of representations arise from this duality, and what are its downstream implications? Using a similarity-based prompting technique from cognitive science, we show that LLMs learn representational spaces that blend string-like and numerical representations. In particular, we show that elicited similarity judgments from these models over integer pairs can be captured by a combination of Levenshtein edit distance and numerical Log-Linear distance, suggesting an entangled representation. In a series of experiments we show how this entanglement is reflected in the latent embeddings, how it can be reduced but not entirely eliminated by context, and how it can propagate into a realistic decision scenario. These results shed light on a representational tension in transformer models that must learn what a number is from text input.

URL: https://openreview.net/forum?id=dnXulFHKbw
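
Editorial sketch: the abstract above describes similarity judgments that blend Levenshtein edit distance over digit strings with a log-scale numerical distance. The snippet below illustrates that kind of blend. The abstract does not define its Log-Linear distance, so the |log x - log y| form, the mixing weight `w`, and all function names are assumptions made here for illustration, not the authors' model.

```python
import math

def levenshtein(s, t):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def log_distance(x, y):
    """Assumed log-scale numerical distance: |log x - log y| for positive x, y."""
    return abs(math.log(x) - math.log(y))

def blended_distance(x, y, w=0.5):
    """Hypothetical convex blend of string-like and number-like distance."""
    return w * levenshtein(str(x), str(y)) + (1 - w) * log_distance(x, y)

for x, y in [(911, 912), (99, 100), (10, 1000)]:
    print((x, y), levenshtein(str(x), str(y)),
          round(log_distance(x, y), 3), round(blended_distance(x, y), 3))
```

The pair (99, 100) is the interesting case: the two integers are numerically adjacent, yet every digit position differs, so the string view and the number view disagree sharply.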

---

Title: Unlocking The Power Of Layer-By-Layer Training And Fine-Tuning

Authors: Liron Gelbard, Shay Landis, David Yannai, Assaf Touboul

Abstract: Layer-wise (LW) and segmented training reduce memory by restricting gradient propagation, but often suffer convergence degradation. We propose \emph{Segmented Propagation (SegProp)}, which keeps a small, trainable \emph{global head} (final layers + task head) active on the loss path throughout training, while updating only the current segment plus this shared head at each stage. This induces depth-wise gradient sparsity and reduces peak activation/optimizer footprint. Empirically, SegProp substantially closes the LW vs. End-to-End (E2E) gap on ResNet-18/50 for CIFAR-10 and achieves competitive performance under harder ImageNet-scale training with ViT, quantifying a clear accuracy--time--memory frontier as global-head depth and segmentation granularity vary. We further provide a system-level feasibility study on LLaMA-70B with 8$\times$40\,GiB GPUs, showing that SegProp enables larger feasible batches than FSDP with CPU offload and characterizing the resulting compute--memory trade-off via a detailed FLOPs analysis. Finally, we show that, in the evaluated 7--12B fine-tuning setups, SegProp matches or nearly matches end-to-end fine-tuning across downstream evaluations.

URL: https://openreview.net/forum?id=p5ObETPuTi

---

Title: Theoretical Foundations of Continual Learning via Drift-Plus-Penalty

Authors: Nazreen Shah, Govinda Arya, Bharath B N, Ranjitha Prasad

Abstract: In many real-world settings, data streams are inherently nonstationary and arrive sequentially, necessitating learning systems to adapt continuously without repeatedly retraining from scratch. Continual learning (CL) addresses this setting by seeking to incorporate new tasks while preventing catastrophic forgetting, whereby updates for recent data induce performance degradation on previously acquired knowledge. We introduce a control-theoretic perspective on CL that explicitly regulates the temporal evolution of forgetting, framing adaptation to new tasks as a controlled process subject to long-term stability constraints. We focus on replay-based CL settings in which a finite memory buffer preserves representative samples from prior tasks, allowing forgetting to be explicitly regulated. We propose COntinual Learning with Drift-Plus-Penalty (\texttt{COLD}), a novel continual learning framework based on the Drift-Plus-Penalty (DPP) principle from stochastic optimization. To facilitate theoretical and empirical analysis, we also consider an oracle variant, \texttt{COLD-ORACLE}, which serves as a reference benchmark. At each task, \texttt{COLD} and \texttt{COLD-ORACLE} minimize the instantaneous penalty corresponding to the current task loss while simultaneously maintaining a virtual queue that explicitly tracks deviations from long-term stability on previously learned tasks, hence capturing the stability–plasticity trade-off as a regulated dynamical process. We establish stability and convergence guarantees that characterize this trade-off, as governed by a tunable control parameter. Empirical results on standard benchmark datasets show that the proposed framework consistently achieves superior accuracy compared to a wide range of state-of-the-art CL baselines, while exhibiting competitive and tunable forgetting behavior that reflects the explicit regulation of the stability–plasticity trade-off through virtual queues and the DPP objective.

URL: https://openreview.net/forum?id=QhxNMdhhBy

---

Title: TropNNC: Structured Neural Network Compression Using Tropical Geometry

Authors: Konstantinos Fotopoulos, Petros Maragos, Panagiotis Misiakos

Abstract: We present TropNNC, a framework for compressing neural networks with linear and convolutional layers and ReLU-type activations using tropical geometry. By representing a network’s output as a tropical rational function, TropNNC enables structured compression via reduction of the corresponding tropical polynomials. Our method identifies redundancy via similarity and improves upon the geometric approximation of previous work by adaptively selecting the weights of retained neurons. We relate it to SVD and spectral clustering, and provide insights into network compression beyond the specific setting considered. We provide the tightest known theoretical compression bound, and the first successful application of tropical geometry to convolutional layers. TropNNC requires access only to network weights -- no training data -- and achieves competitive performance on MNIST, CIFAR, and ImageNet, matching strong baselines such as ThiNet and CUP.

URL: https://openreview.net/forum?id=u7DRq1icmY

---

Title: Dynamic guessing for Hamiltonian Monte Carlo with embedded numerical root-finding

Authors: Teddy Groves, Nicholas Luke Cowie, Lars Keld Nielsen

Abstract: It is possible to fit Bayesian statistical models whose parameters satisfy analytically intractable algebraic conditions by embedding a differentiable numerical root-finder inside a gradient-based sampling algorithm like Hamiltonian Monte Carlo. This technique has enabled important scientific breakthroughs, but is limited by the high computational cost of computing and differentiating large numbers of numerical solutions. We show that dynamically varying the starting guess within a Hamiltonian trajectory can improve performance. To choose a good guess we propose two heuristics: guess-previous reuses the previous solution as the guess and guess-implicit extrapolates the previous solution using implicit differentiation. We benchmark these heuristics on a range of representative models. We also present a JAX-based Python package providing easy access to a performant sampler augmented with dynamic guessing.

URL: https://openreview.net/forum?id=z4PfNDNAcN
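
Editorial sketch: the two heuristics named in the abstract above can be illustrated on a toy scalar problem. The code solves f(x, theta) = x^3 + x - theta = 0 with Newton's method for a sequence of nearby theta values and counts Newton steps under three starting-guess strategies: a fixed guess, the previous solution, and the previous solution extrapolated to first order via the implicit function theorem (dx/dtheta = -f_theta / f_x). The toy equation, the theta grid, and all names are assumptions for illustration; no Hamiltonian Monte Carlo is run here, and this is plain Python rather than the authors' JAX package.

```python
def f(x, theta):
    """Toy algebraic condition f(x, theta) = 0, solved numerically for x."""
    return x ** 3 + x - theta

def df_dx(x, theta):
    return 3 * x ** 2 + 1

def df_dtheta(x, theta):
    return -1.0

def newton(theta, guess, tol=1e-12, max_iter=100):
    """Return (root, number of Newton steps taken from the given guess)."""
    x = guess
    for steps in range(max_iter):
        if abs(f(x, theta)) < tol:
            return x, steps
        x -= f(x, theta) / df_dx(x, theta)
    raise RuntimeError("Newton did not converge")

thetas = [1.0 + 0.1 * k for k in range(21)]  # stand-in for points along a trajectory
totals = {"fixed": 0, "previous": 0, "implicit": 0}

for strategy in totals:
    x_prev, theta_prev = None, None
    for theta in thetas:
        if strategy == "fixed" or x_prev is None:
            guess = 0.0  # same cold start every time
        elif strategy == "previous":
            guess = x_prev  # reuse the last solution
        else:
            # First-order extrapolation of the last solution along theta.
            dx_dtheta = -df_dtheta(x_prev, theta_prev) / df_dx(x_prev, theta_prev)
            guess = x_prev + dx_dtheta * (theta - theta_prev)
        x_prev, steps = newton(theta, guess)
        theta_prev = theta
        totals[strategy] += steps

print(totals)  # total Newton steps per strategy
```

On this toy problem both dynamic strategies need fewer Newton steps than the cold start; whether that carries over to a given model depends on how far consecutive parameter values move.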

---

Title: One Rank at a Time: Cascading Error Dynamics in Sequential Learning

Authors: Mahtab Alizadeh Vandchali, Fangshuo Liao, Anastasios Kyrillidis

Abstract: Sequential learning -- where complex tasks are broken down into simpler, hierarchical components -- has emerged as a paradigm in AI.
This paper views sequential learning through the lens of low-rank linear regression, focusing specifically on how errors propagate when learning rank-1 subspaces sequentially. We present an analysis framework that decomposes the learning process into a series of rank-1 estimation problems, where each subsequent estimation depends on the accuracy of previous steps. Our aim is explanatory rather than comparative: we analyze error propagation and derive compute allocation guidance without claiming superiority over joint or one-shot training. Our contribution is a characterization of the error propagation in this sequential process, establishing bounds on how errors -- e.g., due to limited computational budgets and finite precision -- affect the overall model accuracy. We prove that these errors compound in predictable ways, with implications for both algorithmic design and stability guarantees. Code is available at: https://github.com/MahiAV/ORAT.

URL: https://openreview.net/forum?id=EG7XJANxhX

---

Title: On the Convergence of Adam-Type Algorithm for Bilevel Optimization under Unbounded Smoothness

Authors: Xiaochuan Gong, Jie Hao, Mingrui Liu

Abstract: Adam has become one of the most popular optimizers for training modern deep neural networks, such as transformers. However, its applicability is largely restricted to single-level optimization problems. In this paper, we aim to extend vanilla Adam to tackle bilevel optimization problems, which have important applications in machine learning, such as meta-learning. In particular, we study stochastic bilevel optimization problems where the lower-level function is strongly convex and the upper-level objective is nonconvex with potentially unbounded smoothness. This unbounded smooth objective function covers a broad class of neural networks, including transformers, which may exhibit non-Lipschitz gradients. In this work, we introduce AdamBO, a single-loop Adam-type method that achieves $\widetilde{O}(\epsilon^{-4})$ oracle complexity to find $\epsilon$-stationary points, where the oracle calls involve stochastic gradient or Hessian/Jacobian-vector product evaluations. The key to our analysis is a novel randomness decoupling lemma that provides refined control over the lower-level variable. We conduct extensive experiments on various machine learning tasks involving bilevel formulations with recurrent neural networks (RNNs) and transformers, demonstrating the effectiveness of our proposed Adam-type algorithm.

URL: https://openreview.net/forum?id=cPnmtVnhk4

---

Title: Replicability is Asymptotically Free in Multi-armed Bandits

Authors: Junpei Komiyama, Shinji Ito, Yuichi Yoshida, Souta Koshino

Abstract: We consider a replicable stochastic multi-armed bandit algorithm that ensures, with high probability, that the algorithm's sequence of actions is not affected by the randomness inherent in the dataset. Replicability allows third parties to reproduce published findings and assists the original researcher in applying standard statistical tests. We observe that existing algorithms require $O(K^2/\rho^2)$ times more regret than nonreplicable algorithms, where $K$ is the number of arms and $\rho$ is the level of nonreplication. However, we demonstrate that this additional cost is unnecessary when the time horizon $T$ is sufficiently large for a given $K, \rho$, provided that the magnitude of the confidence bounds is chosen carefully. Therefore, for large $T$, our algorithm requires a factor of $K^2/\rho^2$ less exploration than existing algorithms. To ensure the replicability of the proposed algorithms, we incorporate randomness into their decision-making processes. We propose a principled approach to limiting the probability of nonreplication. This approach elucidates the steps that existing research has implicitly followed. Furthermore, we derive the first lower bound for the two-armed replicable bandit problem, which implies the optimality of the proposed algorithms up to a $\log\log T$ factor for the two-armed case.

URL: https://openreview.net/forum?id=E8rmbq8BYP

---

Title: NEUTAG: Graph Transformer for Attributed Graphs

Authors: Shubham Gupta, Sayan Ranu, Srikanta Bedathur

Abstract: Graph Transformers (\textsc{GT}) have demonstrated their superiority in graph classification tasks, but their performance in node classification settings remains below par. They are designed for either homophilic or heterophilic graphs and show poor scalability to million-sized graphs. In this paper, we address these limitations for node classification tasks by designing a model that utilizes a special feature encoding that transforms the input graph separating nodes and features, which enables the flow of information not only from the local neighborhood of a node but also from distant nodes, via their connections through shared feature nodes. We theoretically demonstrate that this design allows each node to exchange information with all nodes in the graph, effectively mimicking all-node-pair message passing without requiring dense attention between all node pairs. This enables scalability for large attributed graphs when the number of features is substantially smaller than the number of nodes. We further analyze the universal approximation ability of the proposed transformer. Finally, we demonstrate the effectiveness of the proposed method on diverse sets of large-scale graphs, including the homophilic \& the heterophilic varieties.

URL: https://openreview.net/forum?id=kQrIrYvbbw

---

New submissions
===============

Title: Dynamically Scaled Activation Steering

Abstract: Activation steering has emerged as a powerful method for guiding the behavior of generative models towards desired outcomes such as toxicity mitigation. However, most existing methods apply interventions uniformly across all inputs, degrading model performance when steering is unnecessary. We introduce Dynamically Scaled Activation Steering (DSAS), a method-agnostic steering framework that decouples when to steer from how to steer. DSAS adaptively modulates the strength of existing steering transformations across layers and inputs, intervening strongly only when undesired behavior is detected. At generation time, DSAS computes context-dependent scaling factors that selectively adjust the strength of any steering method. We also show how DSAS can be jointly optimized end-to-end together with the steering function. When combined with existing steering methods, DSAS consistently improves the Pareto front with respect to steering alone, achieving a better trade-off between toxicity mitigation and utility preservation. We further demonstrate DSAS’s generality by applying it to a text-to-image diffusion model, showing how adaptive steering allows the modulation of specific concepts. Finally, DSAS introduces minimal computational overhead while improving interpretability, pinpointing which tokens require steering and by how much.

URL: https://openreview.net/forum?id=EXKv1KMNtR

---

Title: Objective-Behavior Alignment: Diagnostics for MORL Policy Selection

Abstract: Real-world decision-making often requires optimizing multiple competing objectives simultaneously. In reinforcement learning (RL), this is typically addressed by combining reward signals into a single scalar objective via a scalarization function, which can be fragile: small changes in the weights can induce drastically different policies. Multi-objective reinforcement learning (MORL) instead produces sets of policies that explicitly represent trade-offs between objectives. However, these policies are typically presented to the decision maker only through their value vectors, which can obscure substantial behavioral variation: policies that induce distinct trajectories may appear indistinguishable when evaluated solely by expected returns. We propose an exploratory diagnostic workflow that automatically highlights behavioral variation along the Pareto front that objective values alone do not reveal, providing both quantitative and visual tools to support policy inspection. We validate our approach on simple grid examples and scale it to continuous control benchmarks, demonstrating that it remains effective as problem complexity increases.

URL: https://openreview.net/forum?id=hfnMLNCCYz

---

Title: Mitigating Social Desirability Bias in Random Silicon Sampling

Abstract: Large Language Models (LLMs) are increasingly used to simulate population responses, a method known as ``Silicon Sampling''.
However, responses to socially sensitive questions frequently exhibit social desirability bias, diverging from real human data toward socially desirable answers. Existing studies on social desirability bias in LLM-based sampling remain limited. In this work, we investigate whether minimal, psychologically grounded prompt wording can mitigate this bias and improve alignment between silicon and human samples.
We conducted a study using data from the American National Election Study (ANES) on three LLMs from two model families: the open-source Llama-3.1 series and GPT-4.1-mini.
We first replicate a baseline silicon sampling study, confirming the persistent social desirability bias.
We then test four prompt-based mitigation methods: \emph{reformulated} (neutral, third-person phrasing), \emph{reverse-coded} (semantic inversion), and two meta-instructions, \emph{priming} and \emph{preamble}, respectively encouraging analytics and sincerity. Alignment with ANES is evaluated using Jensen-Shannon Divergence with bootstrap confidence intervals.
Our results demonstrate that reformulated prompts most effectively improve alignment by reducing distribution concentration on socially desirable answers and achieving distributions closer to ANES. Reverse-coding produced mixed results across eligible items, while the priming and preamble meta-instructions encouraged response uniformity and showed no systematic benefit for bias mitigation.
We further show that reformulation remains effective under temporal shifts in the survey population and transfers to a different survey instrument and to populations outside the U.S.
Our findings validate the efficacy of prompt-based framing controls in mitigating inherent social desirability bias in LLMs, providing a practical path toward more representative silicon samples.

URL: https://openreview.net/forum?id=1njhuA8B3r
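
Editorial sketch: the abstract above evaluates alignment using Jensen-Shannon Divergence with bootstrap confidence intervals. The snippet below shows one common way to compute such an interval: resample the simulated responses with replacement, recompute the divergence against a fixed reference distribution, and take percentiles. The response data, the three-option item, the percentile method, and all function names are made-up assumptions; nothing here comes from ANES or from the authors' pipeline.

```python
import math
import random

def jsd(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i = 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def to_distribution(responses, n_options):
    """Turn a list of option indices into relative frequencies."""
    counts = [0] * n_options
    for r in responses:
        counts[r] += 1
    return [c / len(responses) for c in counts]

def bootstrap_jsd_ci(silicon, human_dist, n_options, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the JSD, resampling only the silicon sample."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        resample = [rng.choice(silicon) for _ in silicon]
        stats.append(jsd(to_distribution(resample, n_options), human_dist))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

# Made-up example: a three-option item where the simulated sample is
# concentrated on option 2 relative to the reference distribution.
human_dist = [0.2, 0.3, 0.5]
silicon = [0] * 10 + [1] * 20 + [2] * 70
point = jsd(to_distribution(silicon, 3), human_dist)
lo, hi = bootstrap_jsd_ci(silicon, human_dist, 3)
print(f"JSD = {point:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

This sketch treats the human distribution as fixed; resampling both samples would be an equally reasonable design if the reference data are themselves a sample.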

---

Title: SHED Light on Segmentation for Dense Prediction

Abstract: Dense prediction infers per-pixel values from a single image and underlies 3D perception and robotics. Real-world scenes are highly structured, yet most models predict each pixel independently, producing blurred boundaries and geometrically inconsistent surfaces. We propose SHED, an encoder-decoder that builds an explicit hierarchy of image segments and then decodes it. The encoder groups pixels into progressively coarser segment tokens, and the decoder inverts this grouping, unpooling coarse tokens back to fine ones through the learned assignments rather than generic feature upsampling. The hierarchy is trained end-to-end from the dense prediction objective alone, without any segmentation labels. On monocular depth, SHED sharpens occlusion boundaries and intra-object coherence, transfers strongly from synthetic to real domains, and uses less computation than the comparable DPT baseline. The learned hierarchy also organizes features by scene layout, improving semantic alignment and 3D reconstruction. Applying the same mechanism to optical flow improves a strong baseline and surpasses a recent state-of-the-art method, indicating that decoding a learned segment hierarchy is a general principle for dense prediction.

URL: https://openreview.net/forum?id=gsRlLeHiiI

---

Title: When Consensus Is Not Correctness: Diversity Collapse and Manufactured Overconfidence in Multi-Agent LLM Debate

Abstract: Multi-agent large language model (LLM) debate is widely believed to improve answers, and agreement is routinely read as evidence of trustworthiness. We show that debate transforms agreement from evidence into an outcome: agreement is endogenous to the interaction that produces it. A variance account, tied to measured diversity collapse, makes this precise. As agents read one another, the inter-agent correlation rises toward one. That same correlation controls both the panel's error and the disagreement an operator reads as confidence, driving them in opposite directions: the ensemble stops averaging out error exactly as it stops looking uncertain. Three consequences follow. Apparent confidence saturates independently of error. The central empirical signature is not the identity G = C - A = R - (1 - C), but the collapse-induced flattening of the confidence shortfall 1 - C: terminal confidence has 17x smaller variance than accuracy, and one shared shortfall predicts per-condition gaps out of sample. Consistent with this signature, the induced gap-residual regression is affine on the primary model, with slope 0.82 and R² = 0.96, and remains monotone with sub-unit slopes across model-family probes. Whether debate is benign is then a race between error correction and confidence inflation, governed by role design and task headroom. We introduce Calibrated Multi-Agent Debate, a certification-first framework with two conditional levers, Prevent and Detect, and a split-conformal certificate, Certify. With exchangeable labeled calibration, Certify controls set coverage under collapse, at the cost of larger sets or abstention on hard cases, while agreement-based stopping commits confident errors at 18-47% miscoverage. Matched self-critique and verdict-injection controls separate interaction-driven amplification from baseline model overconfidence.

URL: https://openreview.net/forum?id=lWCLnGrHhH
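
Editorial sketch: the abstract above mentions a split-conformal certificate that controls set coverage given exchangeable labeled calibration data. The snippet below shows the generic split-conformal recipe for classification, using 1 - p(true label) as the nonconformity score. It is not the authors' Certify procedure; the calibration scores, label probabilities, and function names are hypothetical.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: fall back to the full label set
    return sorted(cal_scores)[k - 1]

def prediction_set(label_probs, threshold):
    """Keep every label whose score 1 - p(label) does not exceed the threshold."""
    return {y for y, p in label_probs.items() if 1 - p <= threshold}

# Hypothetical calibration scores: 1 - p(true label) on held-out labeled items.
cal_scores = [0.05, 0.12, 0.2, 0.31, 0.4, 0.45, 0.52, 0.6, 0.75, 0.9]
qhat = conformal_threshold(cal_scores, alpha=0.2)

confident = prediction_set({"A": 0.7, "B": 0.2, "C": 0.1}, qhat)
uncertain = prediction_set({"A": 0.4, "B": 0.35, "C": 0.25}, qhat)
print(qhat, confident, uncertain)
```

Under exchangeability of calibration and test items, sets built this way contain the true label with probability at least 1 - alpha. The price is visible in the second example: when the probabilities are flat, the set grows instead of committing to one answer.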

---

Title: Revisiting Generalization Measures Beyond IID: How Image Corruption and Perturbation Affect Robustness of Generalization Measures

Abstract: Predicting generalization from quantities available before target-test evaluation remains a central challenge in deep learning. The systematic benchmark of Jiang et al. (2020) evaluated many generalization measures, but it focused on independent and identically distributed (IID) settings. We revisit this problem for image classifiers evaluated under controlled corruptions and perturbations. Our study uses CIFAR-10-C/P, where the label space and task remain fixed while the input images are degraded or perturbed. This setting also allows us to revisit the robustness concerns raised by Dziugaite et al. (2020), who showed that the apparent reliability of generalization measures can depend strongly on experimental conditions. Our experiments show that the usefulness of generalization measures is strongly regime-dependent. When the selector must commit to a measure family before target-test evaluation, Calibration & Confidence provides the most favorable family-level downside profile in our CIFAR-10-C/P protocol, achieving the lowest normalized-regret point estimate and the highest top-20% hit rate among non-oracle families. Optimization-based measures, Information Criteria, and Sharpness-based measures provide additional regime-dependent signals in correlation or local-reliability analyses. Together, these findings suggest that model selection should not rely only on measures favored by IID evaluation. Instead, generalization measures should be treated as regime-dependent ranking signals and validated on a target-like proxy for the expected corruption or perturbation setting.

URL: https://openreview.net/forum?id=X4RoujAYnY

---

Title: GPTNT: Benchmarking Real-Time Collaboration Between Multimodal Agents on Keep Talking And Nobody Explodes

Abstract: Multimodal models are increasingly deployed to solve tasks collaboratively with humans or other artificial agents. While existing benchmarks show that they possess the fundamental capabilities, the various conditions that coincide when collaborating—time pressure, information asymmetry, and imperfect communication—have traditionally been studied in isolation.

To address this gap, we introduce GPTNT, a benchmark built on the cooperative video game Keep Talking and Nobody Explodes, in which two agents must coordinate to defuse procedurally generated bomb puzzles against a live countdown. One agent has access to the bomb but not the instructions for defusing it; the other holds the instructions but cannot see or manipulate the bomb. Neither agent can succeed alone: the task requires contributions from both, and is solvable only through effective, efficient communication. We remove turn-taking proxies or simplifications, instead requiring agents to act asynchronously and communicate in real time.

GPTNT is designed to separate collaboration from relying on memorised solutions: the instruction manual, the partner, or both, can optionally be withheld to isolate what a model derives in the moment from what it already knows. We demonstrate that GPTNT poses a considerable challenge to the state-of-the-art: not one of the closed- and open-source models we test defuses a single bomb in real time, a bar that human players clear. In a range of controlled experiments, we explore where capabilities break down, identifying critical weaknesses in state tracking, efficient acting within the time budget, handling ambiguity, and error recovery.

We release GPTNT as a means for testing the collaborative performance that current benchmarks leave unmeasured. Since it runs on the real game, GPTNT benefits from procedural generation and inherits a living modding community: as models improve, the benchmark can be evolved to remain challenging, rather than being solved once and retired.

URL: https://openreview.net/forum?id=BXJlDltbtq

---

Title: FORGE: Forward-Only Test-Time Adaptation for Integer-Only Vision Models on Microcontrollers

Abstract: Vision models deployed on microcontrollers (MCUs) are quantized to integer-only arithmetic, which removes the ability to run backpropagation: the standard tool for adapting a model to the distribution shift (sensor noise, blur, lighting) it meets in the field. Existing forward-only test-time adaptation (TTA) methods either run only on server- or edge-GPU-class models (not true microcontroller integer execution), or require the batch-normalization (BN) layers that integer deployment fuses away. We present a forward-only TTA method that operates on deployed, BN-folded, integer-only convolutional networks. The key observation is that fusing BN into the preceding convolution, a mandatory step for integer inference, destroys the statistics that normalization-based adaptation relies on. We restore adaptation by re-normalizing each folded convolution’s per-channel output to its clean training statistics, using only forward-pass estimates. The method (i) recovers most of gradient-based TENT’s accuracy gain (+20.9 vs. +24.9 points) and matches forward-only BN adaptation, while being the only method that runs on a folded integer-only model; (ii) needs to adapt only 3 of 21 layers (selected without seeing the test corruptions) to recover 93% of the benefit; (iii) survives single-sample streaming with a batch-size-scaled momentum; and (iv) generalizes across three datasets (up to 200 classes) and two architectures. We validate true integer-only execution and deploy on an ESP32-S3, where, measured with a Nordic PPK2 power profiler, adaptation costs only 8.3 mJ (6.8% of inference energy) and 21.9 ms on the deployed SIMD-optimized model: forward-only adaptation is cheap on a real microcontroller.

URL: https://openreview.net/forum?id=A45I5p25dd

---

Title: EpiLoop: A Diagnostic Protocol for Measuring Epistemic Lock-in in Iterative Search Agents

Abstract: Iterative search agents repeatedly retrieve evidence, update context, and decide whether to keep searching or to answer. Existing evaluations emphasize final-answer accuracy or retrieval quality, but the agent's evolving epistemic state remains implicit. We study \emph{epistemic lock-in}: a reliability failure in which an early hypothesis shapes later queries, evidence selection, confidence estimates, and stopping decisions, so that additional retrieval reinforces rather than revises the answer. We introduce a diagnostic protocol that measures lock-in through observable trajectory proxies, with Hypothesis Change Rate (HCR) as the primary measure of hypothesis instability. We also present EpiLoop, a lightweight inference-time actionability probe that tracks hypothesis, confidence, source coverage, closure judgment, and refutation intent. In a controlled HoVer-derived study, EpiLoop matches equal-budget forced extra search in final accuracy (75.0\% vs. 75.0\%, paired bootstrap 95\% CI [-0.100, +0.100]) while producing lower HCR (0.083 vs. 0.121). Cross-model checks and external validation slices on FEVER, SciFact, and VitaminC show the same qualitative pattern: HCR decreases, while accuracy effects remain dataset- and parser-dependent. These results do not show that EpiLoop is more accurate than extra search. Rather, they show that explicit epistemic-state tracking exposes and reduces a trajectory-level instability that final-answer accuracy alone misses.

URL: https://openreview.net/forum?id=hMptycsA60

---

Title: AesthetiX-RAG: Causally-Grounded Emotion Recognition and Explanation in Paintings via Artist–Style Knowledge and Faithful Visual Evidence

Abstract: Art provides a visual medium for emotional expression. In paintings, such expression is conveyed through compositional structure, symbolic elements, and stylistic features. However, existing computational methods for understanding artwork often leverage semantic content and low-level visual features. Consequently, these methods may provide a limited representation of emotional expression embedded in stylistic and compositional features. In this work, we present AesthetiX-RAG, a causally grounded retrieval-augmented framework for emotion recognition and explanation in paintings. The proposed framework employs an Artist–Style–Motif–Emotion (ASME) graph to model relationships among artists, stylistic traditions, symbolic motifs, and emotional expression. The artist–style priors derived from ASME are projected into control tokens and fused with visual representations through MultiHead Attention. The fused representation is used to predict the emotion label. Finally, the retrieval-augmented generator combines the emotion label with faithful visual evidence and retrieved artist–style knowledge to generate a grounded natural-language explanation for the predicted emotion. We also introduce a new dataset AesthetiX-5K, to support emotion recognition and explanation in paintings. The dataset contains 5116 paintings comprising 27 artistic styles, 23 artists, and 10 genres, with each sample annotated with an emotion label and a human-written rationale. Detailed experimental analysis on AesthetiX-5K and existing art-emotion datasets validates the effectiveness of the proposed framework. The code and dataset will be made publicly available.

URL: https://openreview.net/forum?id=eFVnTq7w8v

---

Title: MCIR: A Feature Dependence-Aware Explainability Method with Reliability Guarantees

Abstract: As modern machine learning models are deployed in high-stakes, data-rich environments, the interactions among features have grown more intricate and less amenable to traditional interpretation. Many explanation methods fail when features are strongly dependent. In the presence of multicollinearity or near-duplicate predictors, existing value attribution tools such as SHAP, LIME, HSIC, MI/CMI, and SAGE often distribute importance across redundant features, obscuring which variables represent "important and unique information." This may lead to unstable rankings, jeopardising importance scores, and usually results in a high computational cost. Recent correlation-aware approaches, such as CIR or BlockCIR, offer partial improvements, but still struggle to fully separate redundancy from unique contributions at the feature level. To address this, we propose the Mutual Correlation Impact Ratio Method (MCIR-M), a dependence-aware global feature-importance procedure that quantifies the unique information contributed by each feature beyond its correlated neighbours. MCIR-M introduces the score Mutual Correlation Impact Ratio (MCIR) that conditions each feature on a small set of its most correlated neighbours and computes a normalized ratio of conditional information having a value range, which is comparable across tasks, and collapses to zero when a feature is redundant, enabling clear redundancy detection. In addition to MCIR, we introduce a lightweight estimation procedure for computing MCIR scores using only a fraction of the available data while preserving the attribution behaviour of the full model. Across a synthetic household-energy dataset and the real UCI HAR benchmark, MCIR yields more stable and dependence-aware rankings than SHAP (independent and conditional), SAGE, HSIC, MI-based scores, and correlation-aware baselines such as CIR or BlockCIR. Lightweight explanations reduce runtime manifold and preserve over 95% top-feature agreement in the synthetic benchmark setting while maintaining moderate overlap on the more challenging HAR dataset. These results demonstrate that MCIR-M provides a practical and scalable solution for global explanation in settings with strong feature dependence.

URL: https://openreview.net/forum?id=nRQhWIzaM5

---

Title: Zoom-Zero: Coarse-to-Fine Video Understanding with Token-Selective Optimization

Abstract: Grounded video question answering (GVQA) aims to localize relevant temporal segments in videos and generate accurate answers to a given question; however, large video-language models (LVLMs) exhibit limited temporal awareness. Although existing approaches based on Group Relative Policy Optimization (GRPO) attempt to improve temporal grounding, they still struggle to faithfully ground their answers in the relevant video evidence, leading to temporal mislocalization and hallucinations. In this work, we present \textbf{Zoom-Zero}, a coarse-to-fine framework that first localizes query-relevant segments and then temporally zooms into the most salient frames for finer-grained visual verification. Our method addresses the limits of GRPO for the GVQA task with \textit{two distinct contributions} beyond prior work: \textbf{(i)} frame saliency self-verification, which validates the fidelity of temporal grounding predictions via fine-grained visual checks on the grounded frames; \textbf{(ii)} token-selective credit assignment, which attributes credit to the tokens responsible for temporal localization or answer generation, mitigating GRPO’s issue in handling multi-faceted reward signals. Our proposed method advances grounded video question answering, improving temporal grounding by 5.2\% on NExT-GQA and 4.6\% on ReXTime, while also enhancing average answer accuracy by 2.4\%. Additionally, the coarse-to-fine zoom-in during inference further benefits long-form video understanding by preserving critical visual details without compromising global context, yielding an average improvement of 6.4\% on long-video benchmarks. Our code will be publicly available.

URL: https://openreview.net/forum?id=5dOMrHbkj5

---

Title: Tags for DAGs: Graph Refinement with Meta-Informed Relations

Abstract: Causal discovery has shifted from data-centric methods to hybrid strategies that integrate semantic knowledge from experts or large language models (LLMs). Such external information is vital for identifying causal structures beyond the Markov Equivalence Class (MEC), which data alone cannot resolve. However, expert availability is often limited, and LLMs frequently misidentify causal directions in specialized domains. To overcome such shortcomings, we propose a tag-based approach that leverages semantically meaningful labels while deriving causal directionality directly from data. Using variable-level tag assignments from available sources (e.g., LLMs), our tags for DAGs method learns from identifiable data structures to extract higher-level causal relations. These are then used to orient undirected edges, enabling causal discovery to move beyond the MEC without reliance on fallible external knowledge.

URL: https://openreview.net/forum?id=a399gBzXic

---

Title: Below the Reliability Floor: Recovering True Success from Judge-Gated Loops

Abstract: LLM judges are increasingly placed inside an agent's loop, scoring the agent's own attempts and re-prompting until one passes. We show this quietly corrupts measurement: retry-until-PASS is optional stopping against a noisy classifier—it keeps drawing until the judge slips—so the reported pass rate is an upward-biased estimator of true success. We make this exact. The cap-$K$ gate is a binary classifier with closed-form sensitivity/specificity, and its bias is governed by one coefficient, the gate's Youden index $J$: as $J \to 0$ the gated rate becomes uninformative about $\pi$, so recovery must fall back to gold labels and no estimator beats the gold-only mean. Across $44$ capable-agent loops on GSM8K, MATH, and code with objective ground truth (no authored weakness; a separate terse-agent stress set is excluded here) the inflation is systematic (median slip $+0.16$; worst on code, where the judge cannot run the candidate, a true $0.74$ inflated to a reported $0.98$) and obeys a closed-form law predicting the slip from per-attempt statistics (pooled $r=0.95$; errors-in-variables slope $0.765\,[0.70,0.84]$, excluding $1$). To recover true success we benchmark Rogan–Gladen against prediction-powered inference: PPI++ dominates in aggregate (mean recovery MAE $0.050$ vs. $0.149$, and $0.081$ vs. $0.241$ as gold becomes scarce; the advantage concentrates in the high-bias regime, while per-gate differences on balanced gates are within noise), because it escapes the $1/J^2$ variance that makes the classical correction fragile. 
Beyond verifiable gold, we measure recovery on public, human-labeled non-verifiable gates—response safety and summary quality—where PPI++ recovers the true rate to mean MAE $0.043$ versus naive $0.165$ ($\sim 4\times$ in aggregate, up to $>10\times$ on the most-biased gates); and—in the motivating regime, a non-verifiable safety judge inside a real retry loop ($n=400$, against a pre-registered 3-model strong-LLM panel—a disclosed proxy, human-anchored at raw $0.90$ agreement)—a lenient gate ships $6.8\%$ truly-unsafe responses (95% CI $[0.05,0.10]$) while the calibrated correction recovers the panel safe-rate $\sim 3.5\times$ more accurately than naive. The deliverable is a recipe: report a PPI++ estimate alongside $J$ as a reliability/identifiability diagnostic, measure at $K=1$, and—via a label-free drift detector (ROC-AUC $0.80$)—de-bias only when calibration transfers. We release all code and content-free data.

URL: https://openreview.net/forum?id=J2Yg9vJcYb
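
The inflation described in this abstract can be illustrated with a small closed-form sketch. This is not the paper's code: it assumes a deliberately simplified model in which attempts are independent, each attempt truly succeeds with probability `pi`, and the judge has a fixed per-attempt sensitivity and specificity. All function names are hypothetical. The sketch shows the per-attempt pass rate, the pass rate reported by a retry-until-PASS gate capped at `k` attempts, and the classical Rogan–Gladen correction applied at `k = 1`, whose denominator is the Youden index $J$.

```python
def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def per_attempt_pass_rate(pi: float, sensitivity: float, specificity: float) -> float:
    """Probability that the judge says PASS on a single attempt.

    Assumed model: true successes pass with probability `sensitivity`,
    true failures pass with probability `1 - specificity`.
    """
    return pi * sensitivity + (1.0 - pi) * (1.0 - specificity)


def reported_pass_rate(pi: float, sensitivity: float, specificity: float, k: int) -> float:
    """Pass rate reported by a retry-until-PASS gate capped at k attempts.

    Under the independence assumption, the gate reports PASS unless
    all k attempts are rejected by the judge.
    """
    q = per_attempt_pass_rate(pi, sensitivity, specificity)
    return 1.0 - (1.0 - q) ** k


def rogan_gladen(observed_rate: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen estimate of the true rate from an observed pass rate.

    The estimate divides by J, so it becomes unstable as J approaches 0.
    The result is clipped to [0, 1].
    """
    j = youden_j(sensitivity, specificity)
    if abs(j) < 1e-12:
        raise ValueError("Youden index is ~0: the judge is uninformative")
    estimate = (observed_rate + specificity - 1.0) / j
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    pi, se, sp = 0.6, 0.9, 0.8  # illustrative values, not from the paper
    for k in (1, 3, 5):
        print(k, reported_pass_rate(pi, se, sp, k))
    single = per_attempt_pass_rate(pi, se, sp)
    print("corrected at k=1:", rogan_gladen(single, se, sp))
```

In this toy model the reported rate rises with `k` even though the per-attempt success probability is unchanged, which is the optional-stopping effect the abstract describes; the correction is applied at `k = 1`, in line with the recipe's advice to measure there.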

---

Title: Agentic Reasoning for Large Language Models: A Survey

Abstract: Reasoning is a fundamental cognitive process underlying inference, problem-solving, and decision-making. While large language models (LLMs) demonstrate strong reasoning capabilities in closed-world settings, exemplified by standard benchmarks in mathematics and code, they struggle in open-ended and dynamic environments. The emergence of agentic reasoning marks a paradigm shift, bridging thought and action by reframing LLMs as autonomous agents that plan, act, and learn through continual interaction. In this survey, we provide a systematic roadmap by organizing agentic reasoning along three complementary dimensions. First, we characterize environmental dynamics through three layers: foundational agentic reasoning establishes core single-agent capabilities, including planning, tool use, and search, that operate in stable environments; self-evolving agentic reasoning examines how agents refine these capabilities through feedback, memory, and adaptation in evolving settings; and collective multi-agent reasoning extends intelligence to collaborative scenarios where multiple agents coordinate roles, share knowledge, and pursue shared goals. Across all layers, we analyze system constraints and optimization settings by distinguishing in-context reasoning, which scales test-time interaction through structured orchestration and adaptive workflow design, from post-training reasoning, which optimizes behaviors through reinforcement learning and supervised fine-tuning. We further review and contextualize agentic reasoning frameworks in real-world applications and benchmarks spanning science, robotics, healthcare, autonomous research, and math, illustrating how different reasoning mechanisms are instantiated and evaluated across domains. 
This survey synthesizes agentic reasoning methods into a unified roadmap that bridges thoughts and actions, offering actionable guidance for agentic systems across environmental dynamics, optimization settings, and agent interaction settings. Finally, we outline open challenges and future directions, situating how agentic reasoning has developed while identifying what remains ahead: personalization, long-horizon interaction, world modeling, scalable multi-agent training, and governance frameworks for real-world deployment.

URL: https://openreview.net/forum?id=2HIxxXIq2u

---

Title: Multi-User Dueling Bandits: A Fair Approach using Nash Social Welfare

Abstract: Learning from human preference data is becoming a useful tool, from fine-tuning large language models to training reinforcement learning agents. However, in most scenarios, the model is trained on the average preference of all human evaluators, which, under large variations of preferences, can be unfair to minority groups. In this work, we consider fairness in dueling bandits, a standard framework for online learning from preference data. We assume that each user has a (potentially distinct) Condorcet winner, which is an arm preferred to every other arm. Using these user-specific Condorcet winners as reference points, we evaluate and score arms according to their performance relative to the corresponding winner. To promote fairness across heterogeneous users, we adopt the well-established Nash Social Welfare objective, which maximizes the product of user utilities, thereby inherently penalizing inequality and preventing the marginalization of any single user. Within this framework, we construct a hard instance to establish a regret lower bound of $\Omega(T^{2/3}\min(K,D)^\frac{1}{3})$ for a time horizon $T$, $K$ arms, and $D$ users, which, to the best of our knowledge, is the first result quantifying the cost of fairness in dueling bandits with heterogeneous preferences. We then present the Fair-Explore-Then-Commit and Fair-$\epsilon$-Greedy algorithms with a Condorcet winner identification phase. We further derive their regret upper bounds that match the lower-bound dependence on $T$ up to logarithmic factors.

URL: https://openreview.net/forum?id=131jUEYdMT
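
To make the fairness argument in this abstract concrete, here is a minimal sketch comparing an average-utility objective with a Nash Social Welfare objective. The utility values and function names are hypothetical, and the paper's actual utility definition (scores relative to each user's Condorcet winner) is not reproduced here. The sketch uses the geometric mean, which is a monotone transform of the product of utilities and so ranks arms in the same order.

```python
import math
from typing import Sequence


def average_welfare(utilities: Sequence[float]) -> float:
    """Arithmetic mean of per-user utilities."""
    if not utilities:
        raise ValueError("need at least one user")
    return sum(utilities) / len(utilities)


def nash_social_welfare(utilities: Sequence[float]) -> float:
    """Geometric mean of non-negative per-user utilities.

    Ranks options identically to the raw product of utilities,
    but stays on the same scale as a single utility.
    """
    if not utilities:
        raise ValueError("need at least one user")
    if any(u < 0 for u in utilities):
        raise ValueError("utilities must be non-negative")
    if any(u == 0 for u in utilities):
        return 0.0  # one user with zero utility zeroes the product
    return math.exp(sum(math.log(u) for u in utilities) / len(utilities))


if __name__ == "__main__":
    # Hypothetical utilities for three users under two arms.
    arm_a = [0.9, 0.9, 0.1]  # strong for two users, poor for the third
    arm_b = [0.6, 0.6, 0.6]  # moderate for everyone
    print("average:", average_welfare(arm_a), average_welfare(arm_b))
    print("NSW:    ", nash_social_welfare(arm_a), nash_social_welfare(arm_b))
```

With these illustrative numbers the average prefers the first arm while the Nash Social Welfare objective prefers the second, showing how the product form penalizes leaving any single user with low utility.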

---

Title: Minimally Invasive Machine Unlearning via Posterior Control

Abstract: Approximate Machine Unlearning (MU) methods typically forget specific data by modifying model parameters or learning data-dependent augmentations. However, each unlearning request requires a dedicated optimization process, resulting in high computational overhead and cumulative degradation of model performance over time. Recent approaches have proposed manipulating a small subset of neural activations as a more targeted alternative, yet these methods still rely on on-demand searches for relevant parameters and remain computationally expensive. We propose Minimally Invasive Machine Unlearning (MIMU), a posterior control–based MU framework that enables unlearning without inference-time optimization. MIMU introduces a parameterized control module that maps unlearning requests to binary masking policies via forward propagation. The resulting masks identify the most influential parameters associated with the forget set, allowing their predictive influence to be effectively removed without additional optimization when an unlearning request is issued. Experimental results across multiple benchmarks demonstrate that MIMU achieves competitive unlearning performance while preserving the generalization ability and prediction patterns of the original model, offering a favorable trade-off between unlearning efficiency and effectiveness.

URL: https://openreview.net/forum?id=bVc04v8dV7

---

Title: MeGA-MP: Metric Graph Advection Message Passing

Abstract: Many real-world systems are organized as networks where spatio-temporal dynamics unfold along connections and not discretely between nodes. Examples include utility networks such as water distribution systems or gas networks, electrical grids, and traffic flow networks. Such systems are naturally modeled as metric graphs, where edges correspond to one-dimensional Euclidean subspaces connected at vertices. Metric graphs are independent of an underlying global Euclidean space, limiting direct application of typical PINNs and operator-learning methods. Especially transport dynamics like advection require a methodology able to capture antisymmetric and long-range dependencies on graphs, which is itself a challenge. We propose a novel physics-informed message passing operator that encodes linear advection on metric graphs as an inductive bias. In the purely advective setting, the operator provably recovers the exact dynamics up to a theoretically derived discretization error without any training. Combined with trainable components like MLPs, our message passing operator extends to realistic advection-reaction dynamics in water distribution systems, where we achieve superior performance compared to baselines and zero-shot generalization across different graph topologies.

URL: https://openreview.net/forum?id=2aZPFrKYYb

---

Title: Expanding SPHERE-JEPA: A Family of Statistical Regularizers for the Hypersphere

Abstract: In Self-Supervised Learning (SSL), preventing representation collapse by explicitly enforcing a uniform distribution on the unit hypersphere has proven to be effective.
However, current frameworks typically rely on \textit{sliced} statistical regularizers such as SIGReg (used in LeJEPA) and SUSReg (used in SPHERE-JEPA), which approximate this continuous objective via Monte Carlo sampling along random 1D directions.
This stochasticity injects projection variance into the training gradients, destabilizing optimization, and hindering convergence.
In this work, we first show that analytically integrating out these random projections natively yields a deterministic Maximum Mean Discrepancy (MMD), bypassing the variance of sliced methods.
Motivated by this equivalence, we formulate full-dimensional objectives for MMD, Kernel Stein Discrepancy (KSD), and Kullback–Leibler (KL) divergence directly on the sphere to enforce a uniform distribution.
To prevent spatial bias, we equip these tests with rotationally invariant kernels constructed via spectral theory, systematically evaluating two canonical families: smooth exponential decay (Heat) and strict frequency cutoff (Bandlimited) filters.
Empirically, removing projection-induced noise results in more stable optimization, faster convergence, and consistent improvements over stochastic sliced regularizers on ImageNet and Galaxy10.
Furthermore, we reveal that the choice of the statistical test shapes the geometry of the learned latent space: MMD and KSD favor locally clustered organization suitable for object-centric domains, whereas the continuous KDE-based KL divergence promotes fine-grained instance separation, yielding the strongest results on unclustered procedural texture retrieval.

URL: https://openreview.net/forum?id=h5YCYmTB3S

---

Title: Uncertainty-Aware Model-Agnostic Dynamic Feature Selection

Abstract: Dynamic feature selection (DFS) acquires features sequentially under a budget, making it attractive for cost-sensitive decisions. Existing methods train a predictor specifically for sequential acquisition and evaluate it mainly through accuracy--budget curves. This approach leaves two questions open that are significant for deployment: how to know whether the decision under partial observation is truly reliable, and how to deploy DFS in settings where the predictor is already validated or in production. We argue that a pre-trained model can be reused if the acquisition policy explicitly accounts for the subset-dependent uncertainty that partial observation creates. Our central claim is that a native DFS predictor is not always necessary, and that while some decisions with little information are inherently uncertain, it is possible to search for the ones that are more reliable for each budget. We first show that DFS introduces uncertainty sources absent from static prediction: adapting a model across feature subsets induces subset-dependent epistemic uncertainty, imputing missing features biases aleatoric estimates, and predictive entropy can fall even under a poor policy. We then propose an uncertainty-aware, model-agnostic framework that adapts pre-trained neural, tree-based, and rule-based classifiers through efficient subset reparametrisation, and penalises epistemically unstable acquisitions using auxiliary predictors. Experiments show the framework is competitive with and sometimes better than state-of-the-art greedy and reinforcement-learning DFS baselines on tabular and image data. Beyond accuracy, they also show that calibration does not necessarily improve as more features are acquired, and that incorporating epistemic uncertainty provides a useful reliability-control signal for partial-observation decisions.

URL: https://openreview.net/forum?id=KlzRVlM7ol

---

Title: Ricci-Filtration: Boosting Retrieval-Augmented Generation Reranking for Question-Answering Tasks with Discrete Ricci Flow

Abstract: Ricci flow is a curvature-guided diffusion process that deforms space by shrinking regions of high positive curvature and expanding those with negative curvature. Similarly, discrete Ricci flow on weighted graphs modifies edge weights by shrinking edges with positive Ricci curvature and stretching those with negative Ricci curvature, effectively increasing the separation between clusters. Inspired by these two cornerstone works, we propose a geometry-based RAG reranker enhancement procedure called Ricci-Filtration. By modeling the input query and initial retrieved chunks as a network, where the input query and chunks serve as nodes and embedding-based pairwise relations define an initial graph, Ricci-Filtration leverages discrete curvature and Ricci flow to evaluate the structural importance of each chunk with respect to the user query. The system first filters the initial chunks based on their geometric curvature relative to the query; then, a reranker processes the remaining chunks to enhance generative performance. We provide a stylized theoretical analysis showing that normalized discrete Ricci flow can separate edge types on idealized community graphs, offering support for the post-flow filtering mechanism while not implying guarantees on arbitrary embedding-derived retrieval graphs. Experiments across QA benchmarks show that Ricci-Filtration improves several settings, especially SQuADv2 and selected MultiHop-RAG query types, while also revealing limitations on harder connected multi-hop reasoning tasks. Ablation studies characterize sensitivity to graph-construction thresholds, flow iterations, embeddings, rerankers, and a simple K-means filtering baseline.

URL: https://openreview.net/forum?id=iCebN53mSX

---

Title: Information-Tight Value-Loss Guarantees for Test-Time Committees in Cooperative MARL

Abstract: Cooperative multi-agent reinforcement learning (MARL) deployments increasingly spend test-time compute through committees of policy checkpoints, seeds, or ensemble advisors that vote on each agent's action. We study how to certify the team value-loss of such a frozen agreement-gated committee controller relative to a fixed reference policy $\pi^{\mathrm{ref}}$, using only deployment-time observable information. This is a certification problem for a frozen controller, not policy learning. We first show that per-agent marginal certification is invalid: its under-estimation compounds linearly with team size and disappears at $n = 1$, so the obstruction is genuinely multi-agent. A sequential counterexample then shows that a reference-prefix telescoping bound can strictly under-estimate the true loss; validity requires a joint occupancy-weighted certificate. Our main result is a range-aware information characterization. The finite-horizon return range supplies an information-independent ceiling $R_{\max} = H \Delta_r$, while deployment observables induce a chain of information terms $C_0 \ge C_1 \ge C_2$ over three nested information sets $I_0 \preceq I_1 \preceq I_2$. The unconditional guarantee is the chain $L \le R_{\max} \wedge C_2$, $R_{\max} \wedge C_2 \le R_{\max} \wedge C_1$, and $R_{\max} \wedge C_1 \le R_{\max} \wedge C_0$, where $L = J(\pi^{\mathrm{ref}}) - J(\pi^{\mathrm{ctrl}}_N)$ is the team value-loss. In the clean endorsement regime ($\eta = 0$), we establish profile-relative optimality over an explicit constructive witness class, together with pointwise sharpness of the pre-cap coordinate-local terms over all admissible unit laws. The carrier uses only the failure probability $g$, so it is agnostic to the committee's internal dependence structure and covers arbitrarily correlated advisors; logging the executed fallback action identity is what moves the worst-action certificate $C_1$ to the tighter logged-fallback certificate $C_2$. 
We then turn $C_2$ into a fresh-rollout, distribution-free $1 - \delta$ certificate with an explicit conservative value-bound construction, and a matching rare-unit lower bound. Exact cooperative Markov games verify validity and tightness against dynamic-programming truth, a conservative rollout-bridge experiment demonstrates valid certification under conservative rollout value bounds, and a tabular over-dispersion experiment confirms that a binomial plug-in under-covers on correlated committees while the dependence-agnostic certificate stays valid.

URL: https://openreview.net/forum?id=73a8SuLKvM

---

Title: GICA: The Gap-Index Compositional Arm Framework for Sample-Efficient Test-Time Scaling

Abstract: Test-time scaling (TTS) improves the reasoning capabilities of large language models (LLMs) by generating multiple candidate reasoning paths and using a verifier to select among them. Process reward models (PRMs), which score each intermediate step rather than only the final answer, yield stronger downstream accuracy but at a higher cost. Recently, PRMs that scale at test-time by generating long verification CoTs have been found to be more accurate at verification, but with a prohibitive cost that scales with both the number of paths and their length (number of steps), limiting scalability precisely where TTS is most beneficial. We recast reasoning-based process-level verification as a sample-efficient adaptive selection problem. We propose GICA (Gap-Index Compositional Arm framework), a bandit-based framework that exploits the compositional structure of reasoning paths to share information across related steps and identify the top-$K$ candidates. We establish theoretical correctness and a fixed-confidence sample-complexity bound, and validate GICA through synthetic experiments and in a TTS setup employing an end-to-end TTS pipeline across three mathematical reasoning benchmarks. We experiment with two open-weight math LLMs serving as generators and two LLMs as process-level, reasoning-based verifiers. GICA matches the accuracy of exhaustive process-level verification while substantially reducing verifier calls (by 4.2 $\times$) and inference runtime (by 4.3 $\times$), making fine-grained step-level supervision practical at scale. We open-source our code and data to facilitate future research: https://anonymous.4open.science/r/GICA-1B57.

URL: https://openreview.net/forum?id=zlyn0moogg

---

Title: Modular Diffusion Models for Structured Visual Recognition

Abstract: Traditional supervised methods for structured visual recognition tasks -- such as object detection, segmentation, and scene graph generation -- often produce deterministic, fixed outputs, limiting their ability to capture the inherent uncertainty in complex visual scenes. As a consequence, such point estimates are unable to capture the prediction uncertainty (or multimodality) intrinsic to these problems, often arising from natural ambiguities (e.g., ambiguity in the size of partially occluded objects, local ambiguity of the exact segmentation boundary, etc.) as well as noise and sparsity of training data. To address this limitation, we present Modular Diffusion Models (MDMs), a simple and novel framework that learns a distribution over structured outputs for a given input image. MDMs decompose the diffusion process into distinct, task-specific modules, each focused on capturing a different aspect of the structured information space, such as object categories, spatial locations, and inter-object relationships. This modular design allows each component to be learned independently, with seamless integration at inference without additional training. Furthermore, the modularity of MDMs enables the diffusion process to easily operate over the heterogeneous output space common in many structured learning tasks (e.g., continuous bounding boxes and discrete class labels). Experimental results over three distinct structured tasks -- object detection, instance segmentation, and scene graph generation -- highlight the benefits of our proposed framework.

URL: https://openreview.net/forum?id=Na3p1RFEj1

---

Title: SOnET: Towards Open-vocabulary EEG-to-text Decoding via Neural Signature Representation Learning

Abstract: Decoding text from non-invasive brain signals such as electroencephalogram (EEG) has demonstrated potential as a promising brain-computer interface (BCI) application, especially for individuals with communication difficulties. However, key challenges remain in EEG representation learning, particularly differences in individual neural signatures, hindering the development of generalizable EEG-to-text translation systems. We propose SOnET, a novel open-vocabulary EEG-to-text decoding framework that addresses the long-standing challenge of individual neural signature variability by disentangling the neural signatures of the subjects in a split-latent space using a contrastive loss within an autoencoder. Subsequently, the learned signature embeddings from the autoencoder are fused with the EEG embeddings via cross-attention to enable EEG-to-text decoding that accounts for individual neural signature variability. Additionally, we employ a perplexity loss that adaptively scales the gradient updates based on prediction uncertainty during training. Experimental evaluation shows that our proposed model achieves a 15.3% relative improvement in BLEU-1 and a 17.3% relative improvement in ROUGE-1 F1-score over the strongest baseline on three reading tasks from the ZuCo EEG-text dataset. Moreover, on the ROAMM dataset, SOnET yields a 13.6% relative gain in BLEU-1 and a 2.4% improvement in ROUGE-1 F1-score. Detailed ablation highlights the contribution of each component of our proposed model. Our code can be accessed at https://anonymous.4open.science/r/SOnET-C2F8/.

URL: https://openreview.net/forum?id=unNevjbUfK

---

Title: A Survey on Agentic Security: Applications, Threats and Defenses

Abstract: LLM-based agents are now used throughout cybersecurity. While these agents facilitate powerful and autonomous security applications, their autonomy opens up new attack surfaces, and the security community is actively building defenses to secure them. Yet the literature on this subject has grown quickly and unevenly. Existing surveys treat applications, threats, and defenses in isolation, leaving no unified account of how an agent's capabilities, vulnerabilities, and countermeasures interconnect. In this work we present the first holistic survey of the agentic security landscape, structuring the field around the fundamental pillars of Applications, Threats and Defenses. We provide a comprehensive taxonomy of over 260 papers, explaining how agents are used in downstream cybersecurity applications, inherent threats to agentic systems, and countermeasures designed to protect them. In addition, we provide detailed pillar-specific and cross-cutting analyses that show the security-lifecycle coverage of agentic applications, comparison between red-teaming and blue-teaming agents, and the adversarial use of red-teaming applications. On the threat side, we analyze the entry points and agent-loop stages that attacks target, their specificity to the agentic setting, and the threat models they assume. On the defense side, we analyze the prevailing defense strategies, their cost and security trade-offs, and where in the agent lifecycle they are deployed. We further map which defenses cover which attack classes and chart trends in agent architecture, backbone model usage, data modality coverage, and the growth of attack and defense research over time. Taken together, these findings indicate that agentic systems are structurally fragile by default and that securing them will require defenses that span the full agent lifecycle rather than single-layer fixes.

URL: https://openreview.net/forum?id=Od4ZnAUv9T

---

Title: Detecting LLM Memorization through Input Perturbation Analysis

Abstract: Detecting memorization in LLMs is essential for assessing privacy risks, intellectual property exposure, and the reliability of benchmark evaluations. Yet existing detection methods are constrained by two practical limitations. They either require access to the training corpus to verify verbatim reproduction, or rely on logits that are sometimes unavailable in commercial black-box deployments. We introduce PEARL, a black-box framework that audits memorization based essentially on a model’s input and output behavior. PEARL operationalizes the input perturbation sensitivity hypothesis (PSH): memorized instances occupy narrow attractors in input space and degrade sharply under semantically preserving perturbations, while generalized instances remain stable. Building on this principle, PEARL produces an instance-level memorization score through a calibrated comparison between the perturbation neighborhoods of known and unknown samples, requiring neither training data nor logit access. We evaluate PEARL on the Pythia model suite, with sizes ranging from 70M to 2.8B parameters, and find that its detection performance scales monotonically with model capacity, reaching AUC 0.81 on Pythia-2.8B and substantially outperforming the gray-box ACR baseline (AUC ≈0.59), the black-box CDD baseline (AUC ≈0.57), and the gray-box membership inference upper bound (AUC ≈0.67). We further show that PEARL and membership inference attack methods are complementary. They agree on fewer than half of their detections, and together identify 80.7% of true members, a 33% relative gain over the strongest individual method. This establishes PEARL as a practical, training-data-free auditing tool that captures generative memorization missed by existing approaches.

URL: https://openreview.net/forum?id=ECCl8HVVfj

---

Title: Higher-order Diffusion Sampling via Chebyshev Interpolation and Gauss–Seidel Iterations

Abstract: Higher-order ODE solvers have shown strong empirical promise for accelerating diffusion models through the probability flow ODE, but rigorous non-asymptotic guarantees for such acceleration remain limited. In this paper, we develop a Chebyshev--Gauss--Seidel higher-order sampler and establish a non-asymptotic convergence guarantee that allows the approximation order to grow logarithmically with the number of outer iterations. In the exact-score setting, up to logarithmic factors, the proposed sampler requires at most
\[
d^{1+o_T(1)}\varepsilon^{-1/K_1}
\]
score functions to approximate the target distribution on $\mathbb{R}^d$ within total variation distance $\varepsilon$, where $o_T(1)\to 0$ as $T\to\infty$ and $K_1>0$ is a sufficiently large constant. The analysis assumes only a polynomial second-moment bound on the target distribution, thereby relaxing the bounded-support condition imposed in existing higher-order theory. Moreover, the guarantee is robust to score and Jacobian estimation errors and does not require higher-order smoothness assumptions on the score estimates. Numerical experiments on anisotropic Gaussian mixture benchmarks support the predicted improvement in the accuracy--cost tradeoff under finite score-evaluation budgets.

URL: https://openreview.net/forum?id=WitVn5rsI2

---

Title: Cluster LOCO: Feature Importance for Interpreting Clusters

Abstract: Clustering is widely used for exploratory analysis and scientific discovery, driving insights from market segmentation to biological data analysis, but its outputs can be difficult to interpret, audit, and reproduce as modern datasets become increasingly large and complex. Reliable use of clustering requires understanding which features drive the discovered structure, yet feature-level explanations for clustering remain scarce compared with methods in supervised learning. Furthermore, existing clustering feature importance scores are often tied to specific algorithms and data assumptions. To address these challenges, we propose Cluster LOCO (Leave-One-Covariate-Out), a family of model-agnostic feature importance scores for clustering. Cluster LOCO is built on feature occlusion and clustering generalizability, defined as whether cluster labels learned on one subset of the data can be accurately predicted on held-out samples. For any chosen clustering algorithm, Cluster LOCO quantifies a feature’s importance by measuring how much its removal degrades generalizability. We first introduce Cluster LOCO-Split, which relies on data splitting, and then extend it to Cluster LOCO-MP, a minipatch ensemble-based version designed for large-scale data. Across synthetic simulations and an application to cell-type discovery in single-cell transcriptomics, we show that Cluster LOCO more reliably recovers informative features than existing clustering feature importance methods.

URL: https://openreview.net/forum?id=QfKlFWtoAW

---

Title: Tackling GNARLy Problems: Graph Neural Algorithmic Reasoning Reimagined through Reinforcement Learning

Abstract: Neural algorithmic reasoning (NAR) is a paradigm that trains neural networks to execute classic algorithms by supervised learning. Despite its successes, important limitations remain: inability to construct valid solutions without post-processing and to reason about multiple correct ones, poor performance on combinatorial NP-hard problems, and inapplicability to problems for which strong algorithms are not yet known. To address these limitations, we reframe the problem of learning algorithm trajectories as a Markov decision process, which imposes structure on the solution construction procedure and unlocks the powerful tools of imitation and reinforcement learning (RL). We propose the GNARL framework, encompassing the methodology to translate problem formulations from NAR to RL and a learning architecture suitable for a wide range of graph-based problems. We achieve very high graph accuracy results on several CLRS-30 problems, performance matching or exceeding much narrower NAR approaches for NP-hard problems and, remarkably, applicability even when lacking an expert algorithm.

URL: https://openreview.net/forum?id=3aeMfeKlKh

---

Title: FAUSAL: Fluctuation Aware Uncertainty Sampling for Active Learning on Class-Imbalanced Medical Images

Abstract: Medical image classification from imbalanced datasets remains challenging due to labeled data scarcity and annotation costs, limiting deep learning models. Active learning (AL) addresses this by strategically selecting informative samples for annotation, achieving optimal performance at minimal cost. However, traditional uncertainty sampling methods often overlook minority classes and persistently ambiguous cases, both critical for model performance. We propose Fluctuation-Aware Uncertainty Sampling for Active Learning (FAUSAL), a sampling strategy extending instantaneous predictive uncertainty by incorporating temporal fluctuations in predictions across AL iterations. FAUSAL measures per-sample prediction instability over multiple model updates, prioritizing samples exhibiting persistent ambiguity, particularly from underrepresented classes and decision boundary regions. We validate FAUSAL on two imbalanced medical imaging benchmarks: HAM-10000 (skin lesion classification) and RecoPlasmodium-V1 (malaria development stage recognition). FAUSAL outperforms standard training and state-of-the-art AL baselines, achieving respectively 92% and 97% of fully supervised performance using only 22.67% and 4.81% of available training data. Notably, minority class recognition substantially improves at low annotation budgets, demonstrating superior class imbalance handling. Ablation studies confirm temporal fluctuation’s critical contribution alongside instantaneous uncertainty. Thus, FAUSAL establishes itself as a robust AL strategy for annotation-efficient medical image classification.

URL: https://openreview.net/forum?id=JGkPKG1JLn

---

Title: Mini-PatchTST: A Lightweight Patch-Based Time-Series Transformer for Metro Passenger Flow Forecasting

Abstract: Efficient urban transportation systems are essential for the quality of life in modern cities. Accurate public transport demand forecasting plays a key role in optimizing system operations and resource allocation. However, traditional approaches often struggle with the complex temporal patterns present in urban transit data. Recent advances in deep learning have demonstrated strong performance in time series forecasting.
This study introduces Mini-PatchTST, a lightweight variant of PatchTST, designed for metro passenger demand forecasting with reduced complexity. To support this work, a novel station-level hourly metro passenger flow dataset was constructed using open data from the Istanbul Metropolitan Municipality. The proposed model is benchmarked against several strong forecasting methods, including deep architectures such as PatchTST and LSTM, as well as machine learning models such as Light Gradient Boosting Machine (LightGBM) and Extreme Gradient Boosting (XGBoost).
Experiments on the Istanbul Metro Dataset show that Mini-PatchTST reduces RMSE by 5–12% against gradient boosting models and by 4% against PatchTST despite its smaller size. Its generalization ability is confirmed on the Hangzhou Metro dataset, where it reduces RMSE by 17–64% over LSTM baselines and outperforms the state-of-the-art TPA-LSTM with up to 22% MAE and over 14% RMSE improvements in transfer and intermediate station settings.
The results demonstrate that Mini-PatchTST offers a robust, scalable, and efficient forecasting solution.

URL: https://openreview.net/forum?id=1zdvxtkcvX

---

Title: Provable Privacy Attacks on Trained Shallow Neural Networks

Abstract: We study what provable privacy attacks can be shown for trained 2-layer ReLU neural networks, focusing on two types of attacks: membership inference and data reconstruction. We prove that theoretical results on the implicit bias of 2-layer neural networks can be used to provably identify with high probability whether a given point was used in the training set in a high-dimensional setting, and can also be used to construct a set of which at least a constant fraction are training points in a univariate setting. To the best of our knowledge, our work is the first to show provable vulnerabilities in this implicit-bias-driven setting.

URL: https://openreview.net/forum?id=6lCkCCw2ds

---

Title: Cross-Modal Knowledge Transfer for Scalable Text-Driven Multimodal Prompt Learning

Abstract: Integrating prompt tuning with multimodal learning enhances generalization across downstream tasks. However, current approaches rely extensively on large-scale modality-specific labeled data (for instance, image, video, or audio) or specialize in single-modality adaptation. Moreover, embeddings produced by encoders trained on different datasets with different architectures reside in incompatible, semantically orthogonal spaces. To address these issues, we propose T2n-Modal, a scalable framework that enables a universal representation model supporting unlimited modalities using only text data. We also establish theoretical guarantees with upper bounds on cross-modal transfer under incompatible spaces, and learnable bidirectional projection with orthogonal regularization. Our method integrates three key components: modality prompt pools, text construction mechanisms, and modality-aligned text encoders derived from pre-trained multimodal large models. This framework enables seamless extension to new modalities by augmenting prompt pools and corresponding text encoders. To ensure coherent learning across modalities, T2n-Modal employs intra- and inter-modal learning strategies, which preserve fine-grained category distinctions within modalities while enforcing semantic alignment between them. Leveraging its scalable architecture and pre-trained encoders, T2n-Modal efficiently generalizes to novel modalities without requiring labeled data. Specifically, despite using no modality-specific supervision, our method achieves state-of-the-art performance on diverse benchmarks including image, audio and video classification tasks.

URL: https://openreview.net/forum?id=qBR3Pq9AVr

---

Title: TriSP: Tri-Signal Structured Pruning for Large Language Models

Abstract: Large language models (LLMs) achieve strong performance across diverse tasks but their deployment is constrained by the memory and compute cost of their parameters. Structured pruning addresses this by removing entire structures such as attention heads and Multi-Layer Perceptron (MLP) neurons to produce smaller dense models that run efficiently on standard hardware. However, existing methods rely on either gradient-based importance estimation, which is memory-prohibitive, or activation-based statistical proxies, which do not directly measure the effect of removal on the loss. Furthermore, the interaction between the importance criterion and the post-pruning recovery strategy has not been systematically studied. We propose TriSP (Tri-Signal Structured Pruning), an importance metric that combines weight magnitude scaled by activation norm with first-order gradient sensitivity via a geometric mean, producing a channel-level score that captures both structural and loss-sensitivity signals. Combined with adaptive per-layer budget allocation and low-rank adaptation (LoRA) recovery, TriSP achieves the lowest perplexity and highest zero-shot accuracy across all tested configurations, reaching 6.80 WikiText-2 perplexity at 20\% pruning on LLaMA-7B. Inference throughput improves by 82\% at 50\% pruning, while still maintaining competitive performance.

URL: https://openreview.net/forum?id=QCHKIV5nS2
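
Editor's sketch (not from the paper): the TriSP abstract above describes a channel-level score formed as a geometric mean of an activation-scaled weight magnitude and a first-order gradient sensitivity. The snippet below illustrates that idea on a toy layer. The function name, the per-channel summation, and all numbers are assumptions made for illustration; the authors' exact aggregation may differ.

```python
import math

def trisp_like_scores(weights, act_norms, grads):
    """Hypothetical per-output-channel importance score.

    weights[i][j] : weight from input feature j to output channel i
    act_norms[j]  : norm of input activation j over a calibration set
    grads[i][j]   : loss gradient with respect to weights[i][j]
    """
    scores = []
    for w_row, g_row in zip(weights, grads):
        # Signal A: weight magnitude scaled by the input activation norm.
        magnitude = sum(abs(w) * a for w, a in zip(w_row, act_norms))
        # Signal B: first-order (Taylor) estimate of the loss change on removal.
        sensitivity = sum(abs(w * g) for w, g in zip(w_row, g_row))
        # Geometric mean: a channel scores high only if both signals are high.
        scores.append(math.sqrt(magnitude * sensitivity))
    return scores

# Toy layer with two output channels and two input features (made-up values).
weights = [[0.5, -1.0], [0.1, 0.2]]
act_norms = [2.0, 1.0]
grads = [[0.4, -0.1], [0.0, 0.05]]

scores = trisp_like_scores(weights, act_norms, grads)
# Channels sorted from least to most important, i.e. the pruning order.
prune_order = sorted(range(len(scores)), key=scores.__getitem__)
```

In this toy case channel 1 is pruned first, since both its magnitude and its sensitivity are small. A geometric mean, unlike an arithmetic one, drives the score toward zero whenever either signal is near zero.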

---

Title: Generalizing Beyond Suboptimality: Offline Reinforcement Learning Learns Effective Scheduling through Random Solutions

Abstract: Online reinforcement learning (RL) approaches have demonstrated strong performance on Job Shop Scheduling (JSP) and Flexible JSP (FJSP) problems by learning scheduling policies through direct interaction with simulated environments. However, these methods often require extensive training interactions, limiting their sample efficiency and practical applicability.

Motivated by this challenge, we introduce Conservative Discrete Quantile Actor-Critic (CDQAC), an offline RL algorithm that learns effective scheduling policies directly from static, suboptimal datasets. CDQAC couples a quantile-based critic with delayed policy updates to estimate the return distribution of machine–operation pairs. Extensive experiments on JSP and FJSP benchmarks demonstrate that CDQAC consistently outperforms the data-generating heuristics, surpasses state-of-the-art offline and online RL baselines, and is highly sample efficient, requiring only 1 to 5% of the original dataset to learn high-quality policies. Our analysis suggests that, in scheduling, offline RL performance is governed mainly by state-action coverage rather than the quality of individual trajectories. Scheduling couples a dense reward aligned with the makespan objective with equal-length trajectories across heuristics, enabling effective learning from a broad range of behaviors. Consistent with this observation, datasets generated by a simple random heuristic with broader coverage let CDQAC outperform policies trained on datasets produced by stronger heuristics such as Genetic Algorithms.

URL: https://openreview.net/forum?id=4vVhm4817Q

---

Title: Distributionally Robust Bayesian Optimization: From Single to Multiple Objectives

Abstract: In many real-world applications, systems are typically expensive to evaluate and influenced by contextual variables whose distributions may shift between training and deployment. While robust Bayesian optimization methods have been proposed for black-box functions under such conditions, most of them focus solely on single-objective settings. In practice, however, systems often need to be optimized across multiple criteria simultaneously, which is challenging since the same environment may affect different objectives in distinct ways. Although robustness against the contextual uncertainty has been investigated for single-objective problems, its extension to multi-objective optimization (MOO) problems remains limited, with existing works primarily addressing only input noise—a special case of the contextual uncertainty.
To bridge this gap, in this work, we propose the first Multi-objective Bayesian Optimization (MOBO) method for the general $\varphi$-divergence Distributionally Robust Optimization (DRO) problem with shared contexts, aiming to obtain robust efficient solutions. Furthermore, a provable regret bound is provided, which is the first sublinear regret bound without requiring a decreasing radius of the DRO uncertainty set, even in comparison to existing works in the single-objective setting. Moreover, we provide numerical experiments to validate our theory and the empirical effectiveness of our proposed algorithms.

URL: https://openreview.net/forum?id=RyIikTYJBd

---

Title: Automated Discovery of Actual Cause Judgments in Complex Environments

Abstract: Many problems in artificial intelligence, such as explanation, blame attribution, and assessment of harm, can be formally represented within the framework of actual causality, which describes the causation of particular observed events that have already occurred. However, existing definitions of actual cause often identify many events as actual causes in large, complex environments, even though many of them rarely influence the outcome. By contrast, humans are able to make highly specific causal judgments about a few specific causes for a particular outcome. Prior work in both actual causality and psychology has characterized this as an issue of normality, which assigns actual cause based on the "normative" or "common" outcomes. However, current approaches, which often require defining hierarchies of normality, do not scale well to complex and continuous-valued environments. This paper introduces invariant-set actual-cause judgments (IAJ), a novel framework that formalizes established intuitions of normality by using context-specific independencies to identify the set of actual cause judgments: a subset of causal variables that are characteristic of producing the outcome. Inspired by actual cause, IAJ captures intuitions about the key variables that describe why an outcome occurs. We then present joint optimization for minimum actual cause judgment (JOA), a practical algorithm that learns from observational data to infer approximated IAJs. We show empirically that JOA identifies actual cause judgments with significantly higher accuracy than existing methods across a set of complex, continuous-valued environments.

URL: https://openreview.net/forum?id=AYtdGwuDci

---

Title: Routing Beats Latency

Abstract: Deploying deep neural networks under hard latency constraints requires trading accuracy against speed, especially when input difficulty varies widely in practice. Although high-capacity Transformers such as DeiT-S-16 offer good robustness, their inference cost is prohibitive for edge deployment. Lightweight CNNs such as SqueezeNet, on the other hand, achieve high throughput but lose accuracy under severe blur and noise. Our phase-based study indicates that heuristic complexity measures, such as an entropy-based difficulty predictor ($R^2 = 0.112$) and complexity scores on CIFAR-10 derived from ICNet-9600, are not useful routing cues because of their low variance and poor predictive structure.

To address these shortcomings, we propose an Adaptive Mixture of Experts (MoE) framework that performs train-dense, route-sparse inference with a lightweight SqueezeNet-Light Gater. The Gater operates on inputs downsampled to $96 \times 96$ and uses a Gumbel-Softmax straight-through estimator, adding only 0.82 ms of overhead. A latency-aware composite loss with a trade-off parameter $\lambda$ lets the system jointly optimise accuracy and predicted inference cost.

Experiments on the Intel Image Classification dataset (augmented with controlled Gaussian blur and noise) indicate that the system finds a dynamic Pareto frontier. With $\lambda=0.7$, the router sends 98\% of clean images to the fastest expert and switches to 90\% DeiT routing under harsh corruption, reflecting human-interpretable patterns of difficulty. It reaches 82.71\% accuracy at an average latency of 5.39 ms, a latency reduction of approximately $29\%$ relative to DeiT-S-16 (7.56 ms), with only a small drop in robustness across corruption levels. Our results show that latency-aware differentiable routing is more efficient and more adaptive than both static CNNs and Transformers in terms of the accuracy-latency Pareto frontier.

URL: https://openreview.net/forum?id=ACUpdwBIxi
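
Editor's sketch (not from the paper): the latency figures quoted in the abstract above (5.39 ms on average versus 7.56 ms for DeiT-S-16) can be checked with a few lines of arithmetic. The snippet also shows one plausible form of a latency-aware composite loss, namely task loss plus $\lambda$ times expected latency. That form, and the function names, are assumptions for illustration; the abstract does not give the exact loss.

```python
def expected_latency(route_probs, expert_latencies_ms):
    """Expected latency when the router picks expert k with probability route_probs[k]."""
    return sum(p * t for p, t in zip(route_probs, expert_latencies_ms))

def composite_loss(task_loss, route_probs, expert_latencies_ms, lam):
    """Assumed form: accuracy term plus lam times the predicted inference cost."""
    return task_loss + lam * expected_latency(route_probs, expert_latencies_ms)

def latency_reduction(baseline_ms, routed_ms):
    """Fraction of the baseline latency that routing saves."""
    return (baseline_ms - routed_ms) / baseline_ms

# Figures taken from the abstract.
reduction = latency_reduction(7.56, 5.39)  # about 0.287, i.e. roughly 29% lower latency
throughput_ratio = 7.56 / 5.39             # about 1.40x images per unit time
```

The same pair of numbers reads as a roughly 29% latency reduction or a roughly 1.4x throughput gain, depending on which ratio is reported.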

---

Title: Residual-Controlled Multiplier Learning for Stochastic Constrained Decision-Making

Abstract: Stochastic constrained decision-making requires optimizing performance objectives while enforcing statistical requirements such as safety or fairness. However, standard primal--dual methods struggle to update multipliers robustly under stochastic mini-batch feedback, as the noise of mini-batch gradients and constraint estimates can be directly accumulated into the multiplier memory. To address this issue, we propose Residual-Controlled Multiplier Learning (RCML), which reformulates multiplier updating as projected-pressure feedback. The central idea is to decompose the projected multiplier into an effective pressure signal for primal descent and a pressure-memory residual for finite-gain multiplier tracking. To handle heterogeneous and noisy observations, we further augment this residual-integral backbone with modular stochastic stabilization components. For the convex-affine backbone, we establish finite-gain convergence, derive a stochastic residual bound under mini-batch feedback, and show that the residual feedback law admits a local KKT-residual interpretation near regular KKT points of nonconvex problems. Experiments across optimization, allocation, and fair-ranking tasks show that RCML improves feasibility control and multiplier stability while maintaining competitive objective performance. Code is released at https://anonymous.4open.science/r/RCML-3114/.

URL: https://openreview.net/forum?id=v3bj7jXaxJ

---

Title: Attention-Based Feature Online Conformal Prediction for Time Series

Abstract: Online conformal prediction (OCP) wraps around any pre-trained predictor to produce prediction sets with coverage guarantees that hold irrespective of temporal dependencies or distribution shifts. However, standard OCP faces two key limitations: it operates in the output space using simple nonconformity (NC) scores, and it treats all historical observations uniformly when estimating quantiles. This paper introduces attention-based feature OCP (AFOCP), which addresses both limitations through two key innovations. First, AFOCP operates in the feature space of pre-trained neural networks, leveraging learned representations to construct more compact prediction sets by concentrating on task-relevant information while suppressing nuisance variation. Second, AFOCP incorporates a multi-head attention mechanism that adaptively weights historical observations based on their relevance to the current test point, effectively handling non-stationarity and distribution shifts. We provide theoretical guarantees showing that AFOCP maintains long-term coverage while achieving smaller long-term time-averaged prediction sets than standard OCP under mild regularity conditions. Extensive experiments on synthetic and real-world time series datasets demonstrate that AFOCP consistently reduces the prediction interval lengths by as much as $88\%$ relative to OCP and yields shorter intervals than the online counterparts of representative offline CP designs for time series, while maintaining target coverage levels, validating the benefits of both feature-space calibration and attention-based adaptive weighting.

URL: https://openreview.net/forum?id=oSX8XCekQN
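
Editor's sketch (not from the paper): for readers unfamiliar with the baseline that the abstract above builds on, the snippet below shows a standard quantile-tracking form of online conformal prediction in the output space. It is not AFOCP: there is no feature-space score and no attention weighting. The step size, the starting threshold, and the synthetic nonconformity scores are all assumptions.

```python
import random

def online_conformal(scores, alpha=0.1, eta=0.05, q0=0.5):
    """Track a threshold q so that the long-run miscoverage rate approaches alpha.

    scores : nonconformity scores, revealed one at a time
    Returns the threshold used at each step and the 0/1 miscoverage indicators.
    """
    q = q0
    thresholds, errors = [], []
    for s in scores:
        thresholds.append(q)
        err = 1 if s > q else 0   # 1 means the prediction set missed the truth
        errors.append(err)
        q += eta * (err - alpha)  # widen after a miss, shrink slowly otherwise
    return thresholds, errors

# Synthetic scores in [0, 1] with a distribution shift halfway through.
rng = random.Random(0)
scores = [rng.random() * (1.0 if t < 2500 else 0.5) for t in range(5000)]

thresholds, errors = online_conformal(scores)
coverage = 1 - sum(errors) / len(errors)  # long-run empirical coverage
```

Because the update only reacts to observed misses, the long-run coverage stays close to the 90% target even across the shift. Every past observation carries equal weight here, which is the limitation the abstract says AFOCP addresses.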

---

Title: Cross-Lingual Speech Emotion Recognition with Self-Supervised Models: A Confound-Controlled Comparison

Abstract: Speech emotion recognition across languages remains difficult because emotional cues interact with speaker identity, language, and recording conditions. emotion2vec, an emotion-specialized self-supervised learning (SSL) speech model, reports large gains over general-purpose SSL encoders such as HuBERT and WavLM across multiple non-English languages. This paper re-examines that claim under a confound-controlled cross-lingual evaluation. We compare HuBERT, WavLM, and emotion2vec on five emotional speech corpora spanning German, English, Mandarin, and Bangla, with an additional external test on Thai. Across speaker-independent probing, matched per-dataset evaluation, non-EmoBox generalization, and zero-shot/few-shot cross-lingual transfer, the general-purpose encoders consistently match or outperform emotion2vec. In speaker-independent four-class evaluation, emotion2vec ranks last on all five corpora, with significant gaps on four. Under cross-lingual transfer across 12 source-target language pairs, emotion2vec trails general-purpose SSL models by 12 to 18 percentage points in zero-shot transfer and remains about 10 points behind in the few-shot setting with 100 target-language examples per source-target pair. On Thai, a language outside the EmoBox fine-tuning distribution, the fine-tuned emotion2vec variant performs worse than both general SSL models and its own non-fine-tuned base version. We further show that the gap is not explained by speaker identity: after iterative null-space projection removes speaker-discriminative directions, HuBERT and WavLM remain ahead. These results suggest that emotion2vec, the emotion-specialized model whose published cross-lingual per-language results we re-examine, does not transfer across languages as reliably as general-purpose SSL. 
In practice, HuBERT and WavLM remain strong defaults, reaching 0.78 to 0.93 in-language accuracy on the four-class task and about 0.77 cross-lingual accuracy in the few-shot setting with only 100 target-language labels per source-target pair. Code is available at https://anonymous.4open.science/r/ser-cross-ling-pub-536B

URL: https://openreview.net/forum?id=RBUnKFHagf

---

Title: Language Modeling with Hyperspherical Flows

Abstract: Discrete Diffusion Language Models have progressed rapidly as an alternative to autoregressive (AR) models, motivated by their parallel generation abilities. However, for tractability, discrete diffusion models sample from a factorized distribution, which is less expressive than AR. Recent Flow Language Models (FLMs) apply continuous flows to language, transporting noise to data with a deterministic ODE that avoids factorized sampling.
FLMs operate on one-hot vectors whose dimension scales with the vocabulary size, making FLMs costly to train. Moreover, since all distinct one-hot embeddings are equidistant in $\ell_2$, adding Gaussian noise does not have a clear semantic interpretation (unlike images, where Gaussian noise progressively degrades structure).
We introduce $\mathbb{S}$-FLM, a latent FLM in the hypersphere. $\mathbb{S}$-FLM generates sequences by rotating vectors in $\mathbb{S}^{d-1}$ along a velocity field learned with cross-entropy, avoiding the overhead of materializing one-hot vectors.
Previous FLMs match AR in Generative Perplexity (Gen. PPL), but samples with high likelihood are not necessarily correct in verifiable domains such as math and code. $\mathbb{S}$-FLM substantially improves continuous flow language models in large-vocabulary reasoning, increasing the accuracy on GSM8K from less than 1% for prior continuous FLMs to 12–18% depending on decoding. $\mathbb{S}$-FLM matches masked diffusion (such as MDLM and Duo) under standard-temperature sampling ($T=1$), while a gap remains under optimized low-temperature ($T=0.1$) decoding.

URL: https://openreview.net/forum?id=jBh2WNssnL

---

Title: RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

Abstract: Standard reinforcement learning (RL) for large language model (LLM) agents primarily optimizes extrinsic task rewards, often favoring isolated task completion over continual adaptation. This paradigm can cause premature convergence to suboptimal policies and leaves useful experience only implicitly encoded in model parameters, limiting its retrieval and reuse for future decisions. We introduce RetroAgent, an online RL framework that trains agents to master interactive environments not merely by solving tasks, but by evolving across episodes. Inspired by human retrospective self-improvement, RetroAgent augments extrinsic rewards with hindsight-generated dual intrinsic feedback: (1) Intrinsic Numerical Feedback, which rewards beneficial exploration by measuring incremental subtask progress relative to prior attempts; and (2) Intrinsic Language Feedback, which distills successes and failures into reusable textual lessons for explicit experience reuse. To leverage these lessons effectively, we propose Similarity & Utility-Aware Upper Confidence Bound (SimUtil-UCB), a retrieval strategy that balances semantic relevance, historical utility, and exploration. Across four challenging agentic benchmarks, RetroAgent achieves new state-of-the-art performance, outperforming GRPO by +18.3% on ALFWorld, +15.4% on WebShop, +27.1% on Sokoban, and +8.9% on MineSweeper, while demonstrating strong test-time adaptation and out-of-distribution generalization.

URL: https://openreview.net/forum?id=tzxP9H3f3O

---

Title: Failing the Test of Time: Do Text and Time-Series Benchmarks Measure Time-Series Understanding?

Abstract: Recent advancements in foundation models, particularly Large Language Models (LLMs), have provided new opportunities for time-series analysis, where language is a natural medium for the parallel information accompanying time-series, and substantial pioneering research has recently begun to jointly model time-series and text. While these recent works are of fundamental importance, we observed a somewhat surprising phenomenon: on widely used benchmarks, the evaluated LLMs can often achieve high accuracy without actually consulting the time-series, raising essential concerns about whether the existing benchmarks for joint time-series and language analysis actually measure the intended model capabilities. We provide comprehensive studies on a range of widely deployed LLMs, across publicly available benchmarks, and under different input conditions: original input, masked conditions, noisy corruption, and modified time-series values. We leverage Bloom's-taxonomy analysis to reveal the relationship with benchmark complexity. Our study, together with complementary analyses, provides comprehensive evidence and insights on the potential misalignment between the existing benchmarks and their objectives. We hope this work helps redirect research toward more rigorous assessment.

URL: https://openreview.net/forum?id=kgrslrMVr1

---

Title: Not All Code Is Equal: A Data-Centric Study of Code Complexity and LLM Reasoning

Abstract: Large Language Models (LLMs) increasingly exhibit strong reasoning abilities, often attributed to their capacity to generate chain-of-thought-style intermediate reasoning.
Recent work suggests that exposure to code can further enhance these skills, but existing studies largely treat code as a generic training signal, leaving open the question of which properties of code actually contribute to improved reasoning.
To address this gap, we study the structural complexity of code, which captures control flow and compositional structure that may shape how models internalise multi-step reasoning during fine-tuning.
We examine two complementary settings: solution-driven complexity, where structural complexity varies across multiple solutions to the same problem, and problem-driven complexity, where structural complexity reflects variation in the underlying tasks.
Using cyclomatic complexity and logical lines of code to construct controlled fine-tuning datasets, we evaluate a range of open-weight LLMs on diverse reasoning benchmarks.
Our findings show that although code can improve reasoning, its usefulness is substantially shaped by structural properties.
In 83% of experiments, restricting fine-tuning data to a specific structural complexity range outperforms training on structurally diverse code, pointing to a data-centric path for improving reasoning beyond scaling.

URL: https://openreview.net/forum?id=HCz4ROIhUL
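
Editor's sketch (not from the paper): the abstract above uses cyclomatic complexity and logical lines of code to bucket fine-tuning data. The snippet below gives a simplified approximation of both metrics for Python source, using only the standard `ast` module. The counting rules are assumptions and will not exactly match any particular tool or the authors' pipeline.

```python
import ast

# Node types that each add one decision point (a simplified rule set).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity: 1 plus the number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths.
            complexity += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # One for the implicit loop, plus one per `if` filter.
            complexity += 1 + len(node.ifs)
    return complexity

def logical_lines(source):
    """Approximate logical lines of code as the number of statement nodes."""
    return sum(isinstance(n, ast.stmt) for n in ast.walk(ast.parse(source)))

SAMPLE = '''
def first_positive_even(xs):
    for x in xs:
        if x > 0 and x % 2 == 0:
            return x
    return None
'''

cc = cyclomatic_complexity(SAMPLE)  # 1 + for + if + one `and` = 4
lloc = logical_lines(SAMPLE)        # def, for, if, two returns = 5
```

Scores like these could be used to group code samples into complexity ranges before fine-tuning, in the spirit of the controlled datasets the abstract describes.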

---

Title: On Measuring Localization of Shortcuts in Deep Networks

Abstract: Shortcuts, spurious rules that perform well during training but fail to generalize, present a major challenge to the reliability of deep networks (Geirhos et al., 2020). To design principled shortcut-mitigation methods, it is crucial to understand how shortcuts affect feature representations. In this work, we investigate the layer-wise localization of shortcuts in deep models. We propose a novel experiment design that quantifies the layer-wise contribution to accuracy degradation caused by a shortcut. Our method introduces a shortcut-inducing data skew into the training process and counterfactually compares training on clean and skewed datasets using suitable shortcut-learning metrics. We employ our method to study vision classification shortcuts across the CIFAR-10, Waterbirds, and CelebA datasets and the VGG, ResNet, DeiT, and ConvNeXt architectures. We find that shortcut learning is not localized in specific layers but distributed throughout the network. Different network parts play different roles in this process: earlier layers predominantly encode spurious features, while later layers predominantly forget core features (i.e., features that are predictive on clean data). We analyze the differences in localization and describe their principal axes of variation. Finally, we investigate layer-wise training interventions and find that our localization metrics are predictive of their success.

URL: https://openreview.net/forum?id=53nSsrMKU4

---

Title: Evaluating the evidence trace of NeurIPS 2025 contributions with agentic code auditing

Abstract: Code serves as the primary evidence behind computational publications, yet a detailed review of an unfamiliar codebase imposes a commonly prohibitive time burden on volunteer reviewers. Consequently, self-reported reproducibility checklists at major machine learning venues face little empirical verification, leaving code as a significant blind spot in peer review. To bridge this gap, we introduce AuditOwl, an autonomous, verification-centric LLM pipeline designed to make code auditing feasible for authors pre-submission and reviewers post-submission. With this framework, we conduct an audit of 100 randomly sampled empirical papers from the NeurIPS 2025 main track. For each paper, an LLM agent inspects the repository and evaluates its scientific claims against the evidence in the underlying code. Following an independent adversarial verification pass to maximize precision, the system raises 605 discrepancies (averaging 6.1 per paper). It forms a graded evidence trail: findings are quote-anchored, cite specific code locations, and about half are backed by executable verification checks that the agent implements. On a manually validated subset of findings, we measure an error rate of 6.2%. Our approach reveals a steep reproducibility funnel: only 87% of papers in our sample release code at all, and for just 9% all analyzed published findings trace cleanly to the repository. Discrepancies we find are heavily dominated by incompleteness of code and mismatches between what the paper describes and what the code does. We also detect a prevalence of technical bugs and serious methodological issues. Operating at reasonable cost, agentic code auditing can augment human peer review and help to make computational science more reproducible. Code available at: https://github.com/anonym0wl/AuditOwl.

URL: https://openreview.net/forum?id=xjCJY8Xxcm

---

Title: When Do Causal Fairness Constraints Work? Reproducing and Stress-Testing Long-Term Fair Reinforcement Learning

Abstract: We study the reproducibility of A Causal Lens for Learning Long-Term Fair Policies by Lear & Zhang (2025), which introduces qualification gain disparity (QGD) as a long-term fairness objective in sequential decision-making and proposes causality-aware PPO variants (PPO-C and PPO-Cb) to reduce it. Building on the authors' official implementation, we replicate their core experiments in a bank-lending task and test whether the reported disparity reductions, causal decomposition trends, and utility–fairness trade-offs hold. Our results largely confirm the original findings: PPO-C and PPO-Cb consistently reduce QGD relative to standard PPO and fairness-aware baselines, with the causal decomposition suggesting that these reductions mainly come from making the learned policy's direct treatment of groups more similar, rather than from changes in the environment's transition dynamics. However, we find that utility preservation is weaker than originally reported in some settings. We further extend the evaluation along three axes: strongly imbalanced population ratios, a K-group extension (where K > 2) based on Qualification Gain Variance (QGV), and a structurally different infectious-disease environment. These extensions show that the K-group objective is highly sensitive to the fairness coefficient: untuned penalties can collapse utility, while moderate values recover useful trade-offs. We also show that group-level causal decomposition remains diagnostically useful, with reductions in QGV arising mainly through the direct policy component while structural sources of disparity are offset by indirect dynamics rather than eliminated. Overall, we support most of the original claims while clarifying when causal long-term fairness objectives remain effective and stable.

URL: https://openreview.net/forum?id=CJhKrSFvVQ

---

Title: Priors in LLM Routing: From Initialization to Regularization

Abstract: LLM routing is commonly treated as a data-driven utility estimation problem, where routing decisions are learned primarily from historical interactions. In this work, we argue that such a likelihood-dominant view is fundamentally incomplete. The key challenge in LLM routing is not only how to fit utility from data, but also how to make stable decisions when available evidence is insufficient or unreliable. In these regimes, prior information is indispensable: it initializes utility estimation when observations are scarce, and regularizes it when feedback is noisy or incomplete. Based on this insight, we propose BayesRouter, a lightweight posterior utility inference framework that integrates semantic priors from standardized model descriptions with empirical evidence from retrieved historical interactions via closed-form Bayesian fusion. Experiments on multiple public benchmarks demonstrate that BayesRouter achieves a better accuracy–cost trade-off than strong routing baselines, improves performance in low-data settings, and exhibits stronger robustness under noisy and missing supervision. Our results show that priors are not merely auxiliary in LLM routing, but a key ingredient for stable utility estimation.

URL: https://openreview.net/forum?id=K3xNTJOM1j
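
To make "closed-form Bayesian fusion" concrete, here is a minimal sketch using a Beta-Bernoulli model, one standard conjugate pair. It assumes a prior success score per model (standing in for the semantic prior) and success/failure counts from retrieved interactions. The names `BetaBelief`, `prior_from_score`, `fuse`, and `route`, the prior strength, and the linear cost penalty are all hypothetical; the abstract does not state which likelihood or utility BayesRouter actually uses.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief over a model's success probability."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def prior_from_score(score: float, strength: float) -> BetaBelief:
    """Turn a prior success score in (0, 1) into a Beta prior.

    `strength` acts as a pseudo-observation count (an assumed knob).
    """
    return BetaBelief(score * strength, (1.0 - score) * strength)


def fuse(prior: BetaBelief, successes: int, failures: int) -> BetaBelief:
    """Conjugate update: add observed counts to the prior pseudo-counts."""
    return BetaBelief(prior.alpha + successes, prior.beta + failures)


def route(posteriors: dict, costs: dict, cost_weight: float) -> str:
    """Pick the model with the best posterior mean minus a cost penalty."""
    return max(posteriors, key=lambda m: posteriors[m].mean - cost_weight * costs[m])


# Hypothetical models: "large" has a strong prior but weak observed evidence.
prior_large = prior_from_score(0.8, strength=10.0)
prior_small = prior_from_score(0.5, strength=10.0)

no_data = fuse(prior_large, successes=0, failures=0)     # falls back to the prior
noisy_data = fuse(prior_large, successes=2, failures=8)  # pulled toward the evidence
small_post = fuse(prior_small, successes=9, failures=1)

choice = route(
    {"large": noisy_data, "small": small_post},
    costs={"large": 1.0, "small": 0.2},
    cost_weight=0.1,
)
```

With no observations the posterior mean equals the prior score, which is the initialization role described in the abstract; with few or noisy observations the prior pseudo-counts damp the update, which is the regularization role.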

---

Title: Repairing Reward Functions with Feedback to Mitigate Reward Hacking

Abstract: Human-designed reward functions for reinforcement learning (RL) agents are frequently misaligned with the human's true, unobservable objectives, and thus act only as proxies. Optimizing for a misspecified proxy reward function often induces reward hacking, resulting in a policy misaligned with the human's true objectives. An alternative is to perform RL from human feedback, which involves learning a reward function from scratch by collecting human preferences over pairs of trajectories. However, building such datasets is costly. To address the limitations of both approaches, we propose Preference-Based Reward Repair (PBRR): an automated iterative framework that repairs a human-specified proxy reward function by learning an additive, transition-dependent correction term from preferences. A manually specified reward function can yield policies that are highly suboptimal under the ground-truth objective, yet corrections on only a few transitions may suffice to recover optimal performance. To identify and correct for those transitions, PBRR uses a targeted exploration strategy and a new preference-learning objective. We prove that, in tabular domains, PBRR has a cumulative regret that matches, up to constants, that of prior preference-based RL methods. In addition, on a suite of reward-hacking benchmarks, PBRR consistently outperforms baselines that learn a reward function from scratch from preferences or modify the proxy reward function using other approaches, requiring substantially fewer preferences to learn high-performing policies.

URL: https://openreview.net/forum?id=zs15WK9rn5

---

Title: Conformal Candidate Certification for Offline Model-Based Optimization

Abstract: Offline model-based optimization (MBO) proposes candidates by optimizing a surrogate trained on a fixed historical dataset. Because candidates are deliberately out-of-distribution, surrogate rankings are least reliable exactly where the optimizer is most aggressive. Existing methods address this issue by regularizing the surrogate or proposal mechanism, but they do not provide a per-candidate statistical certificate that a proposed design meets a user-specified performance threshold. We propose Conformal Candidate Certification (CCC), a post-hoc wrapper that attaches a calibrated one-sided lower bound to each candidate and advances only those whose bound exceeds the target. The key challenge is covariate shift: calibration data follow the historical distribution, while candidates follow the proposal distribution. We show that entropy-regularized surrogate maximization induces a Gibbs-tilted proposal distribution, allowing the same surrogate that drives optimization to supply importance weights for weighted conformal prediction, without a separate density-ratio estimation step. Under oracle weights and strict data splitting, CCC satisfies finite-sample marginal lower-bound validity. Experiments on a synthetic stress test and the superconductivity dataset of Hamidieh (2018) ($n=17{,}011$ compounds) show selective certification with empirical coverage at or above the nominal level, a $9.5$ K gain in mean certified critical temperature over the naive surrogate rule, and a reduction in the false-acceptance rate by more than half.

URL: https://openreview.net/forum?id=bEkbRn9BcZ
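
The certification step in the abstract above can be sketched with generic weighted split conformal prediction. The sketch assumes the importance weight of a point is proportional to `exp(surrogate(x) / temperature)` (one reading of the Gibbs tilt) and uses the surrogate's overestimation `surrogate(x) - y` as the nonconformity score. The function names, the score, and the toy data are assumptions for illustration, not the authors' implementation.

```python
import math


def certified_lower_bound(cal_x, cal_y, surrogate, x_new, alpha, temperature):
    """One-sided lower bound on y at x_new via weighted split conformal.

    Weights are assumed proportional to exp(surrogate(x) / temperature).
    Returns -inf when the calibration mass cannot reach level 1 - alpha.
    """
    preds = [surrogate(x) for x in cal_x]
    pred_new = surrogate(x_new)

    # Subtract the max before exponentiating for numerical stability.
    shift = max(preds + [pred_new])
    weights = [math.exp((p - shift) / temperature) for p in preds]
    w_new = math.exp((pred_new - shift) / temperature)
    total = sum(weights) + w_new  # the test point carries its own mass

    # Score = how much the surrogate overestimates the true value.
    scored = sorted(zip((p - y for p, y in zip(preds, cal_y)), weights))

    cumulative = 0.0
    for score, weight in scored:
        cumulative += weight / total
        if cumulative >= 1.0 - alpha:
            return pred_new - score
    return -math.inf


def certify(bound: float, threshold: float) -> bool:
    """Advance a candidate only if its lower bound exceeds the target."""
    return bound > threshold


# Toy data: identity surrogate that overestimates by 0.1, 0.2, ..., 1.0.
cal_x = [float(k) for k in range(10)]
cal_y = [x - 0.1 * (k + 1) for k, x in enumerate(cal_x)]
identity = lambda x: x

# A very large temperature makes the weights nearly uniform.
bound = certified_lower_bound(cal_x, cal_y, identity, 5.0, alpha=0.2, temperature=1e9)
too_strict = certified_lower_bound(cal_x, cal_y, identity, 5.0, alpha=0.05, temperature=1e9)
```

With nearly uniform weights this reduces to ordinary split conformal prediction. A small `temperature` concentrates weight on calibration points the surrogate rates highly, and when the candidate's own weight dominates, the bound degrades to `-inf` so nothing is certified.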

---

Title: Reproducing "The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models"

Abstract: The widespread usage of large language models (LLMs) has raised concerns about both fairness and privacy. However, prior methods that improve fairness in LLMs often weaken privacy, while those that enhance privacy can reduce fairness. The paper "The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models" by Qian et al. (2025) introduces SPIN as a training-free method that suppresses fairness-privacy coupled neurons to mitigate this trade-off. In this work, we reproduce and evaluate the results of Qian et al. (2025) across three instruction-tuned LLMs. Consistent with the original paper, we observe that SPIN generally improves both fairness and privacy while preserving general capabilities and remaining effective under malicious and limited-data settings. We extend the original study by evaluating SPIN under more data-scarce conditions, testing a new neuron localization method, and examining SPIN on the safety-helpfulness trade-off, finding mixed generalization across models. Our results support the effectiveness of SPIN under various conditions, but indicate that its transfer to other alignment trade-offs is inconsistent, suggesting that coupled-neuron behaviour is context-dependent and highlighting the need to better understand internal representations in LLMs.

URL: https://openreview.net/forum?id=7nKvpj17oI

---

Title: AIRS-Bench: A Suite of Tasks for Frontier AI Research Science Agents

Abstract: LLM agents hold significant promise for advancing scientific research. To accelerate this progress, we introduce AIRS-Bench (the AI Research Science Benchmark), a suite of 20 tasks sourced from state-of-the-art machine learning papers. These tasks span diverse domains, including language modeling, mathematics, bioinformatics, and time series forecasting. AIRS-Bench tasks assess agentic capabilities over the full research lifecycle—including idea generation, experiment analysis and iterative refinement—without providing baseline code. The AIRS-Bench task format is versatile, enabling easy integration of new tasks and rigorous comparison across different agentic frameworks. We establish baselines using frontier models paired with both sequential and parallel scaffolds. Our results show that agents exceed human SOTA in seven tasks but fail to match it in thirteen others. Even when agents surpass human benchmarks, they do not reach the theoretical performance ceiling for the underlying tasks. These findings indicate that AIRS-Bench is far from saturated and offers substantial room for improvement. We open-source the AIRS-Bench task definitions and evaluation code to catalyze further development in autonomous scientific research.

URL: https://openreview.net/forum?id=ELg8kRK9Wd

---

Title: DiffStyleTS: Diffusion Model for Style Transfer in Time Series

Abstract: Style transfer combines the content of one signal with the style of another. It supports applications such as data augmentation and scenario simulation, helping machine learning models generalize in data-scarce domains. While style transfer is well developed in vision and language, methods for time series remain limited. We introduce DiffStyleTS, a diffusion-based framework for univariate time series style transfer. DiffStyleTS separates a time series into content and style representations with complementary convolutional encoders and recombines them through an attention-based denoising diffusion process trained without paired style-transfer examples. At inference, the encoders extract content and style from two distinct series, enabling conditional generation of diverse samples that preserve the trajectory of one input while adopting the local dynamics of the other. We demonstrate qualitatively and quantitatively that DiffStyleTS achieves effective style transfer across multiple time-series domains. We further validate its real-world utility by showing that DiffStyleTS-based augmentation improves anomaly detection in a low-data chemical-process setting.

URL: https://openreview.net/forum?id=Segv4nJuZb

---

Title: GRAIL: A Benchmark for GRaph ActIve Learning on Longitudinal Sensing Data

Abstract: Modern embedded and wearable sensing systems continuously collect high-dimensional data for health and behavior monitoring, yet obtaining high-quality labels remains costly and resource-intensive. Graph-based Active Learning (Graph-AL) offers an efficient solution by exploiting network relationships among sensors or users to prioritize the most informative data for labeling, reducing energy use and user burden. However, existing Graph-AL evaluations rely on one-shot experiments over static citation-style graphs and overlook the temporal variability, repeated-querying dynamics, and per-user sampling burden that characterize real-world longitudinal sensing deployments. To bridge this gap, we introduce GRAIL, a benchmarking framework that evaluates Graph-AL strategies on longitudinal sensing data, where node features and labels evolve daily on a fixed underlying social graph. GRAIL is architected to also handle evolving graph topologies, but our two contributed real-world datasets (SNAPSHOT and Friends-and-Family) exercise the longitudinal-features, static-topology setting due to the scarcity of public datasets with both rich sensing features and evolving social ties. GRAIL introduces metrics to assess sustained effectiveness, sampling diversity, and per-user query burden, allowing systematic comparison of AL methods under realistic conditions. Experiments across 11 AL strategies on real-world data reveal clear trade-offs between predictive performance and sensing cost, identify hybrid strategies (AGE, GraphPartFar) as Pareto-optimal across two markedly different topologies, and quantify the conditions under which classical Graph-AL methods fail to outperform random sampling.

URL: https://openreview.net/forum?id=Jx0AABdq6n

---

Title: Can Safety Emerge from Weak Supervision? A Systematic Analysis of Small Language Models

Abstract: Safety alignment is critical for deploying large language models (LLMs) in real-world applications, yet most existing approaches rely on large human-annotated datasets and static red-teaming benchmarks that are costly, difficult to scale, and slow to adapt to evolving model behaviors. Moreover, overly conservative safety mechanisms can reduce model usefulness by rejecting sensitive but legitimate queries. We introduce Self-MOA (Self Multi-Objective Alignment), a fully automated framework for aligning small language models using weak supervision from automated evaluator models. Self-MOA operates as a closed loop that dynamically generates model-specific red-team prompts, constructs preference data from model-generated responses, and aligns models via multi-objective preference optimization to jointly optimize for safety and helpfulness. Across multiple small language models and safety benchmarks, Self-MOA achieves a 12.41% improvement in safety while preserving helpfulness, using up to 11 times less training data than human-supervised alignment baselines. These results demonstrate that adaptive, automated alignment can reduce the dependence on static, human-curated safety pipelines in resource-constrained settings.

URL: https://openreview.net/forum?id=T3ytSTxbG1

---

Title: A Differentiable Alignment Framework for Sequence-to-Sequence Modeling via Optimal Transport

Abstract: State-of-the-art end-to-end (E2E) ASR systems, such as the Connectionist Temporal Classification (CTC) and transducer-based models, suffer from peaky behavior and alignment inaccuracies.
In this paper, we propose a novel differentiable alignment framework based on one-dimensional optimal transport, enabling the model to learn a single alignment and perform ASR in an E2E manner.
We introduce a pseudo-metric, called Sequence Optimal Transport Distance (SOTD), over the sequence space and discuss its theoretical properties.
Based on the SOTD, we propose Optimal Temporal Transport Classification (OTTC) loss for ASR and contrast its behavior with CTC.
Experimental results on the TIMIT, AMI, and LibriSpeech datasets show that our method considerably improves alignment performance compared to CTC and the more recently proposed Consistency-Regularized CTC, though with a trade-off in ASR performance.
We believe this work opens new avenues for seq2seq alignment research, providing a solid foundation for further exploration and development within the community.

URL: https://openreview.net/forum?id=IgfxD1xBFB
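
For readers unfamiliar with why one-dimensional optimal transport yields a single alignment: when both supports are ordered, the monotone coupling (built greedily in the style of the north-west corner rule) is optimal for convex ground costs, and it never crosses. The sketch below builds that coupling between per-frame and per-label masses. It is a generic illustration with hypothetical names and toy masses, not the SOTD or the OTTC loss from the paper.

```python
def monotone_coupling(frame_mass, label_mass, tol=1e-12):
    """Greedy monotone transport plan between two ordered 1D distributions.

    Both inputs are non-negative masses that sum to the same total.
    Returns a list of (frame_index, label_index, transported_mass).
    """
    plan = []
    i = j = 0
    remaining_frame = frame_mass[0]
    remaining_label = label_mass[0]
    while i < len(frame_mass) and j < len(label_mass):
        moved = min(remaining_frame, remaining_label)
        if moved > tol:
            plan.append((i, j, moved))
        remaining_frame -= moved
        remaining_label -= moved
        # Advance whichever side has been exhausted (possibly both).
        if remaining_frame <= tol:
            i += 1
            remaining_frame = frame_mass[i] if i < len(frame_mass) else 0.0
        if remaining_label <= tol:
            j += 1
            remaining_label = label_mass[j] if j < len(label_mass) else 0.0
    return plan


# Two frames carrying equal mass, aligned to three labels.
plan = monotone_coupling([0.5, 0.5], [0.25, 0.25, 0.5])
```

In this toy case the first frame is split across the first two labels and the second frame maps entirely to the third label, so the index pairs only ever move forward, which is what makes the plan readable as an alignment.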

---

Title: A Bi-Objective Framework for Scene Text Super-Resolution: Balancing Legibility and Fidelity

Abstract: Image super-resolution (SR) can improve many vision systems, from surveillance and autonomy to document analysis and retail analytics, because recovering high-frequency details, especially scene text, enables reliable downstream perception. Scene text, i.e., text embedded in natural images such as signs, product labels, and storefronts, often carries task-critical information; when characters are blurred or hallucinated, optical character recognition (OCR) and subsequent decisions can fail even if the rest of the image appears sharp. Yet previous SR research has often been tuned to distortion metrics such as PSNR/SSIM or perceptual and image-quality metrics such as LPIPS, MANIQA, CLIP-IQA, and MUSIQ, none of which directly measure character-level correctness. Furthermore, studies that do address text SR often rely on constrained text-centric benchmarks, leaving text restoration in complex natural scenes underexplored. As a result, scene text is effectively treated as generic texture. For SR to be effective in practical deployments, it is therefore essential to explicitly optimize for both text legibility and perceptual quality. We present GLYPH-SR, a vision–language-guided diffusion framework that aims to achieve both objectives jointly. GLYPH-SR uses a Text-SR Fusion ControlNet (TS-ControlNet) guided by OCR-derived text and layout cues, together with a ping-pong scheduler that alternates between text-centric and scene-centric guidance. To enable targeted text restoration, we train TS-ControlNet on a synthetic corpus while keeping the main SR branch frozen. Across SVT, SCUT-CTW1500, and CUTE80 at $\times4$ and $\times8$, GLYPH-SR improves OCR $\mathrm{F}_1$ by up to 15.18 percentage points over diffusion/GAN baselines (SVT ×8, OpenOCR) while maintaining competitive MANIQA, CLIP-IQA, and MUSIQ. GLYPH-SR is designed to better balance the two objectives, text readability and visual realism, producing outputs that are both perceptually plausible and more text-legible. We provide code, pretrained models, the synthetic corpus with generation scripts, and an evaluation suite to support reproducibility.

URL: https://openreview.net/forum?id=9CnYnByhvB

---

Title: Optimal Transport Feature Alignment for Cross-Domain Cell Image Classification

Abstract: Classifying single-cell images across datasets collected under varying imaging conditions remains a fundamental challenge in computational cytology. Differences in microscope hardware, illumination, staining, and media introduce substantial domain shifts, leading to poor generalization of models trained on a single dataset, even when underlying cell types are shared. To address this, we propose a robust pipeline that integrates (i) multi-encoder feature extraction using foundation models, (ii) unsupervised identification of shared cell types between datasets, (iii) alignment of embedding spaces using graph-regularized optimal transport (OT), and (iv) final classification via ensemble-based voting across encoders. By explicitly filtering unmatched classes and transporting only comparable subsets, our method mitigates the impact of non-overlapping cell types and structural misalignment. Experimental results across multiple real-world cell image datasets demonstrate improved accuracy (+20% on average over all datasets considered in the paper when adding class filtering and OT), robustness to domain shift, and reliable performance in both fully unsupervised and low-label regimes, without requiring fine-tuning or retraining on the target domain.

URL: https://openreview.net/forum?id=pCNMqbXqlU

---

Title: Mathematical Theory of Collinearity Effects on Machine Learning Variable Importance Measures

Abstract: In many machine learning problems, understanding variable importance is a central concern. Two common approaches are Permute-and-Predict (PaP), which randomly permutes a feature in a validation set, and Leave-One-Covariate-Out (LOCO), which retrains models after permuting a training feature. Both methods deem a variable important if predictions with the original data substantially outperform those with permutations. In linear regression, empirical studies have linked PaP to regression coefficients and LOCO to $t$-statistics, but a formal theory has been lacking. We derive closed-form expressions for both measures, expressed using square-root transformations. PaP is shown to be proportional to the coefficient and predictor variability: $\text{PaP}_i = |\hat{\beta}_i| \sqrt{2\operatorname{Var}(\mathbf{x}^v_i)}$, while LOCO is proportional to the coefficient but dampened by collinearity (captured by $\Delta$): $\text{LOCO}_i = |\hat{\beta}_i| (1 -\Delta)\sqrt{1 + c}$. These derivations explain why PaP is largely unaffected by multicollinearity, whereas LOCO is highly sensitive to it. Monte Carlo simulations confirm these findings across varying levels of collinearity. Although derived for linear regression, we also show that these results provide reasonable approximations for models like Random Forests. Overall, this work establishes a theoretical basis for two widely used importance measures, helping analysts understand how they are affected by the true coefficients, dimension, and covariance structure. This work bridges empirical evidence and theory, enhancing the interpretability and application of variable importance measures.

URL: https://openreview.net/forum?id=rdCPIjY3Dr
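
The PaP expression quoted in the abstract above can be checked numerically in the simplest setting. For a fixed linear model, permuting feature $i$ changes each prediction by $\hat{\beta}_i (x_i - x_{\pi(i)})$, and the expected squared difference between a value and a randomly permuted value is $2\operatorname{Var}(x_i)$. The sketch assumes PaP is read as the root-mean-square change in predictions; the paper's exact definition (for instance, based on the change in validation error) may differ, and the function names and simulated data are hypothetical.

```python
import math
import random
import statistics


def pap_empirical(beta_i, x_i, rng):
    """RMS change in a linear model's predictions after permuting feature i."""
    permuted = x_i[:]
    rng.shuffle(permuted)
    squared = [(beta_i * (a - b)) ** 2 for a, b in zip(x_i, permuted)]
    return math.sqrt(sum(squared) / len(squared))


def pap_closed_form(beta_i, x_i):
    """Closed form from the abstract: |beta_i| * sqrt(2 * Var(x_i))."""
    return abs(beta_i) * math.sqrt(2.0 * statistics.pvariance(x_i))


rng = random.Random(0)
x = [rng.gauss(0.0, 3.0) for _ in range(20000)]  # simulated validation feature
beta = -1.5                                      # assumed fitted coefficient

empirical = pap_empirical(beta, x, rng)
predicted = pap_closed_form(beta, x)
relative_gap = abs(empirical - predicted) / predicted
```

Nothing in this calculation depends on the other predictors, which is consistent with the abstract's statement that PaP is largely unaffected by multicollinearity once the coefficient is given.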

---

Title: Controllable Attribute-Guided Image Generation with Causal Modeling

Abstract: Attribute-guided generative models enable explicit control over image content through labeled attributes. However, they often struggle to disentangle individual attributes and mitigate undesired correlations among them. For example, adding eyeglasses may unintentionally alter a person's perceived age, as eyeglasses are often correlated with older individuals in the training data. In this work, we propose a novel attribute-guided generative framework designed to address these challenges. Our method learns a mask-based representation for each attribute label, encouraging disentanglement by limiting each attribute’s influence to a small subset of representation dimensions while preserving the information necessary to represent the corresponding label. For attributes that exhibit inherent dependencies, we further introduce a causal conditioning strategy that explicitly models their causal relationships, enabling more faithful and controllable attribute manipulation. Extensive experiments on a wide range of datasets demonstrate the effectiveness of our framework in enhancing attribute-level controllability.

URL: https://openreview.net/forum?id=S3VzsH9VGs

---

Title: PRIVET: PRoximIty leakage detection Via Extreme value Theory

Abstract: Deep generative models are often trained on sensitive data, such as genetic sequences, health data, or more broadly, any copyrighted, licensed or protected content. This raises critical concerns around privacy-preserving synthetic data, and more specifically around privacy leakage, an issue closely tied to overfitting. Existing proximity-based methods almost exclusively rely on global criteria to estimate the risk of privacy failure associated with a model, offering only quantitative, non-interpretable insights. The absence of rigorous evaluation methods for data privacy at the sample level may hinder the practical deployment of synthetic data in real-world applications. Using extreme value statistics on nearest-neighbor distances, we propose PRIVET, a generic sample-based, modality-agnostic algorithm that assigns an individual privacy leak score to each synthetic sample. We empirically demonstrate that PRIVET reliably detects instances of memorization and more subtle forms of privacy leakage across diverse data modalities, including settings with very high dimensionality, limited sample sizes (such as genetic data), and even underfitting regimes. We compare our method to existing approaches under controlled settings and show its advantage in providing both dataset-level and sample-level assessments through qualitative and quantitative outputs. Additionally, our analysis reveals limitations of existing computer vision embeddings in yielding perceptually meaningful distances when identifying near-duplicate samples.

URL: https://openreview.net/forum?id=j2JKuv13xh

---

Title: Revisiting RegressionMSR: A Reproducibility Study

Abstract: Regression-adjusted Monte Carlo estimators for Shapley values and probabilistic values combine surrogate modeling with maximum-sample-reuse estimation to reduce the cost of feature-attribution computation. We present a reproducibility study of RegressionMSR in the interventional feature-attribution setting, focusing on artifact fidelity, practical algorithmic choices, and empirical robustness. We first identify implementation-level differences between the estimator displayed in the paper and the released artifact: the code computes separated conditional means over coalitions containing and excluding each feature, rather than the displayed all-row signed residual average, and in the large-$n$ with-replacement sampler, it stores a size-level denominator rather than the paper's full coalition-level sampling probability. We characterize these conventions algebraically and verify them with synthetic diagnostics. Holding the released denominator fixed, the all-row versus separated-means discrepancy is measurable but has limited effect on final attribution error in our tested settings. We then test the authors' practical single-surrogate simplification, finding that it substantially reduces runtime while maintaining and often slightly improving accuracy. Next, we qualitatively reproduce the headline Shapley benchmark: TreeMSR remains the lowest-error estimator on average, although rank-stability metrics give a complementary ranking. Finally, we extend the original Gaussian-noise experiment to Laplacian and coalition-size-dependent heteroskedastic noise, finding that TreeMSR's low-noise advantage persists across other low-noise regimes. Overall, our results support RegressionMSR's empirical value while clarifying implementation details needed for reproducible use.

URL: https://openreview.net/forum?id=AzOk6sdMRZ

---

Title: How Now Oblong Cow: the importance of anisotropy in benchmarking intrinsic dimension estimators

Abstract: The manifold hypothesis suggests that data lies on manifolds with smaller intrinsic dimension (ID) than their ambient dimension. However, there is no empirical agreement on the estimates for ID from different estimators for realistic datasets. Thus, it is important to test ID estimation methods with targeted stressors. In this work, we consider the role of anisotropy. To this end, we propose HomID, a collection of homogeneous spaces with anisotropic embedding, for benchmarking ID estimation methods. We observe that methods that perform well on standard benchmarks systematically degrade on HomID under identical resource allocation. We further observe that anisotropic distortion of such benchmarks also results in performance degradation. Finally, we analyze two particular ID estimators and reveal that anisotropy induces such failure modes by causing shifts from the assumed distributions.

URL: https://openreview.net/forum?id=aBJAPDsNvm

---

Title: Learning Provably Correct Synchronous Crash Fault Tolerant Distributed Protocols With Minimal Human Knowledge

Abstract: Provably correct distributed protocols, which are a critical component of modern distributed systems, are highly challenging to design and have often required decades of human effort. These protocols allow multiple agents to coordinate to come to a common agreement in an environment with uncertainty and failures. As a starting point, this work focuses on synchronous Crash Fault Tolerance (CFT) protocols, a widely used family of distributed protocols, in a bounded setting with a small number of agents. We formulate protocol design as a search problem over strategies in a game with imperfect information, and the desired correctness conditions are specified in Satisfiability Modulo Theories (SMT). However, standard methods for solving multi-agent games fail to learn correct protocols in this setting, even when the number of agents is small. We propose a learning framework, GGMS, which integrates a specialized variant of Monte Carlo Tree Search with a transformer-based action encoder, a global depth-first search to break out of local minima, and repeated feedback from a model checker. Protocols output by GGMS are verified correct via exhaustive model checking for all executions within the bounded setting. We further prove that, under certain assumptions, the search process is complete: if a correct protocol exists, GGMS will eventually find it. In experiments, we show that GGMS can learn correct protocols for larger settings than existing methods.

URL: https://openreview.net/forum?id=KBL0M8tTkM

---

Title: Steering LLMs? Actually, Sparse Autoencoders can outperform simple baselines

Abstract: Sparse Autoencoders (SAEs) have been seen as a promising avenue for exploring the internals of Large Language Models (LLMs) and for steering model output generation. When AxBench, a model steering benchmark, was introduced in Wu et al. (2025), SAEs did not seem to live up to their original hype due to poor steering performance relative to a set of simple baselines. This work serves as a partial rebuttal for Sparse Autoencoders and suggests that the results of Wu et al. (2025) did not do them full justice. We find that Sparse Autoencoders can, in fact, perform nearly on par with the reference LoRA performance on the AxBench benchmark when features are selected and labelled with our supervised pipeline. We also find that our pipeline selects features that are surprisingly causal of their identified labels when using only its interpretability-based components. Lastly, we present evidence that high sparsity (low ℓ0) may not be crucial for successful steering based on interpretability, in contrast to the earlier findings in Wang et al. (2025).

URL: https://openreview.net/forum?id=7wZSpvtCbb

---

Title: STDP as Probabilistic Attribution: An Exact-Balance Continuous Kernel for Normalized Temporal Credit Assignment

Abstract: We introduce a unified continuous kernel for spike-timing-dependent plasticity (STDP) that connects local spike-timing updates to normalized probabilistic attribution in convergent circuits. Classical phenomenological STDP models describe long-term potentiation (LTP; the strengthening of synapses when a presynaptic spike precedes a postsynaptic spike) and long-term depression (LTD; the weakening of synapses in the reverse order) using piecewise timing windows, while standard simulator implementations commonly realize such rules with local traces. Our contribution is therefore not a claim of asymptotic speedup over trace-based STDP, but a single differentiable trace-interaction kernel whose induced learning window can be analyzed in closed form. The model represents presynaptic and postsynaptic events by dimensionless exponentially decaying traces and defines synaptic change by their cooperative and competitive interaction. For an isolated pre–post spike pair, we derive the closed-form STDP window and prove that the total integrated potentiation and depression areas are exactly balanced for all positive decay rates. We further summarize parameter sweeps and component ablations showing how the two decay rates tune window morphology and why both the multiplicative gating term and competitive difference term are required for a biphasic timing-sensitive window. A fit to the classical data of Bi and Poo (1998) gives $R^2=0.63$ and reveals a narrow near-coincident positive-update regime for small post-before-pre lags. This regime is a structural consequence of fitting a continuous kernel with mismatched decay rates; however, the raw observations in the corresponding interval are positive, so we treat it as a data-consistent near-synchronous attribution hypothesis rather than a confirmed biological mechanism or a mere fitting artifact. Forcing the zero crossing to $\Delta t = 0$ substantially worsens the fit (Appendix D). 
At the network level, we show that when the additive kernel is combined with multiplicative afferent normalization, the mean-field dynamics reduce to a delta-rule-like update whose fixed point is a normalized event-rate target, $w_i^* = \nu_i q_i/\sum_j \nu_j q_j$. Under a strict causal-window approximation, $q_i=P(\mathrm{Post}\mid \mathrm{Pre}_i)$, and under the corresponding partition assumptions this target can be interpreted as posterior attribution, $P(\mathrm{Pre}_i\mid \mathrm{Post})$. Without those assumptions, the fixed point should be read as normalized conditional event-rate attribution. Simulations confirm convergence in sparse regimes, document progressive degradation under dense firing, and show that the proposed kernel outperforms classical STDP baselines, including variants matched for area balance, trace mode, step size, and temporal footprint, under identical normalization. An iso-rate control confirms the network tracks attribution probability independent of firing rate, a decorrelated control confirms it tracks the product $\nu_i q_i$, and a heterogeneous-delay condition confirms robustness to non-uniform causal timing.

URL: https://openreview.net/forum?id=OneDKSGRKA
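
The fixed point quoted in the abstract, $w_i^* = \nu_i q_i/\sum_j \nu_j q_j$, can be illustrated with a few lines of Python. This is only a sketch of the normalization itself, not the authors' simulation code: the function name and the three-afferent numbers are hypothetical, and reading $q_i$ as $P(\mathrm{Post}\mid \mathrm{Pre}_i)$ relies on the causal-window approximation the abstract describes.

```python
def normalized_attribution(rates, cond_probs):
    """Return w_i* = nu_i * q_i / sum_j nu_j * q_j for each afferent i."""
    products = [nu * q for nu, q in zip(rates, cond_probs)]
    total = sum(products)
    if total <= 0:
        raise ValueError("at least one nu_i * q_i must be positive")
    return [p / total for p in products]


# Hypothetical values for three afferents (not taken from the paper).
rates = [5.0, 5.0, 10.0]      # nu_i: presynaptic event rates
cond_probs = [0.2, 0.4, 0.1]  # q_i: read here as P(Post | Pre_i)

weights = normalized_attribution(rates, cond_probs)
print(weights)
```

With these inputs the first and third afferents receive equal weight even though their rates differ, because the target depends on the product $\nu_i q_i$ rather than on either factor alone.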

---

Title: CountLoop: Training-Free High-Instance Image Generation via Iterative Agent Guidance

Abstract: Diffusion models excel at photorealistic synthesis but struggle with precise object counts, especially in high-density settings. We introduce COUNTLOOP, a training-free framework that achieves precise instance control through iterative, structured feedback. Our method alternates between synthesis and evaluation: a VLM-based planner generates structured scene layouts, while a VLM-based critic provides explicit feedback on object counts, spatial arrangements, and visual quality to refine the layout iteratively. Instance-driven attention masking and cumulative attention composition further prevent semantic leakage, ensuring clear object separation even in densely occluded scenes. Evaluations on COCO-Count, T2I-CompBench, and two newly introduced high-instance benchmarks show that COUNTLOOP reduces counting error by up to 57% and achieves the highest or comparable spatial quality scores across all benchmarks, while maintaining photorealism.

URL: https://openreview.net/forum?id=2JxXGhpCP4

---

Title: STC-ViT: Spatio Temporal Continuous Vision Transformer for Medium-range Global Weather Forecasting

Abstract: Operational Numerical Weather Prediction (NWP) systems rely on computationally expensive physics-based models. Recently, transformer models have shown remarkable potential in weather forecasting, achieving state-of-the-art results. However, traditional transformers discretize spatio-temporal dimensions, limiting their ability to model continuous dynamical weather processes. Moreover, their reliance on increased depth to capture complex dependencies results in higher computational cost and parameter redundancy. We address these issues with STC-ViT, a Spatio-Temporal Continuous Vision Transformer for weather forecasting. STC-ViT integrates a Fourier Neural Operator (FNO) for global spatial operators with a transformer-parameterized Neural ODE for continuous-time dynamics, yielding a space–time continuous model of weather forecasting. Our proposed method achieves competitive forecasting performance even with a shallow, single-layer transformer encoder, and scales further with depth as shown in our analysis (Section \ref{sec:scale}). STC-ViT generates complete forecast trajectories with an inference speed of only 0.125 seconds and achieves strong medium-range forecasting skill on $1.5^\circ$ WeatherBench 2 compared to state-of-the-art data-driven and NWP models trained on higher-resolution data, with lower data and compute costs. We also provide a detailed empirical analysis of the model's performance with respect to denser time grids, higher-accuracy ODE solvers, and deeper transformer stacks. Our code is available at https://anonymous.4open.science/r/STCViT-CC8B.

URL: https://openreview.net/forum?id=yTeXmoeZ79

---

Title: ToolGuard: Red-Teaming Small Language Model Tool Calling on Consumer Hardware

Abstract: Small language models (SLMs) are increasingly deployed for tool calling on edge devices and in agentic systems, yet their safety under adversarial conditions remains unstudied. Unlike text generation, tool calling creates a unique attack surface: a single malicious tool call can trigger irreversible real-world actions such as unauthorized financial transfers, data exfiltration, or privilege escalation. We present ToolGuard, to our knowledge the first systematic study of adversarial robustness in SLM tool calling. We contribute: (1) a taxonomy of five attack categories targeting tool-calling SLMs: parameter injection, tool substitution, privilege escalation, data exfiltration, and chain attacks; (2) ToolAttackBench, a benchmark of 50 adversarial prompts across 17 tool schemas in 5 domains; (3) an empirical red-team evaluation of four SLM families (1B–3B parameters) over 10 independent runs per prompt, revealing that capable SLMs exhibit attack success rates (ASR) of 47–52%, with a capability floor for tool-call vulnerability near 1–1.7B parameters; and (4) a runtime defense that enforces declarative security policies on completed tool calls, reducing mean ASR by 76% (from 48.9% to 11.7%) on the full benchmark and 78% (from 49.3% to 10.9%) on a held-out test set (Section 7.2), with 0% false positive rate on simulated canonical benign outputs (n=41; per-model FPR on actual model outputs not measured; 95% Wilson CI: [0%, 8.6%]) and sub-5ms latency overhead. We find that tool substitution is the most dangerous attack category (65.4% mean ASR) but is effectively neutralized by intent verification (reduced to 5.0%), while data exfiltration proves the hardest to defend (44.3% → 31.7%) due to its reliance on semantically valid tool calls. An adaptive evaluation with 8 policy-aware attacks confirms that defense effectiveness drops against knowledgeable adversaries (19.0% defended ASR vs. 10.9% on held-out attacks), underscoring the need for learned defense components.

URL: https://openreview.net/forum?id=0ct01Da0Ff
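
The headline figures in this abstract can be reproduced with simple arithmetic. The sketch below recomputes the relative ASR reductions and the Wilson score interval; it assumes the reported 0% false positive rate means 0 false positives out of the 41 benign outputs and uses z = 1.96 for the 95% level. The function names are illustrative and are not taken from the paper.

```python
import math


def relative_reduction(before, after):
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# ASR values (in percent) as quoted in the abstract.
full_benchmark = relative_reduction(48.9, 11.7)
held_out = relative_reduction(49.3, 10.9)

# Assumption: 0 false positives observed among n=41 benign outputs.
fpr_low, fpr_high = wilson_interval(0, 41)

print(f"full benchmark reduction: {full_benchmark:.1%}")
print(f"held-out reduction:       {held_out:.1%}")
print(f"FPR 95% Wilson CI:        [{fpr_low:.1%}, {fpr_high:.1%}]")
```

Note that a zero observed rate still leaves an upper bound of several percent at this sample size, which is why the abstract reports the interval alongside the point estimate.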

---

Title: A Survey of Post-Training in Time Series Foundation Models

Abstract: Time series foundation models (TSFMs) have emerged as general-purpose models for time series analysis, but pretraining alone is often insufficient for reliable downstream deployment. Bridging this gap requires further intervention to handle domain shift, task heterogeneity, limited supervision, and computational constraints, which motivates post-training as a broad class of methods to adapt, augment, compose, calibrate, or specialize pretrained TSFMs for downstream tasks. In this survey, we organize TSFM post-training methods by their locus of intervention in the prediction pipeline, yielding five categories: parameter adaptation, context augmentation, model composition, output processing and uncertainty control, and compression and specialization. For each category, we summarize representative methods, trace their development, and discuss their limitations. We further identify future directions toward controlled adaptation, reliable context construction, uncertainty-aware model composition, calibrated output processing, and deployment-aware specialization. Overall, by providing a comprehensive view of the emerging TSFM post-training landscape, this survey aims to support future research to navigate the design space between a pretrained TSFM and its reliable downstream deployment.

URL: https://openreview.net/forum?id=LzPNfS7gWh

---

Title: The Ensemble Inverse Problem: Applications and Methods

Abstract: We introduce a new multivariate statistical problem that we refer to as the Ensemble Inverse Problem (EIP). The aim of EIP is to invert for an ensemble that is distributed according to the pushforward of a prior under a forward process. In high energy physics (HEP), this is related to a widely known problem called unfolding, which aims to reconstruct the true physics distribution of quantities, such as momentum and angle, from measurements that are distorted by detector effects. In recent applications, the EIP also arises in full waveform inversion (FWI) and inverse imaging with unknown priors. We propose non-iterative inference-time methods that construct posterior samplers based on a new class of conditional generative models, which we call ensemble inverse generative models. For the posterior modeling, these models additionally use the ensemble information contained in the observation set on top of single measurements. Unlike existing methods, our proposed methods avoid explicit and iterative use of the forward model at inference time via training across several sets of truth-observation pairs that are consistent with the same forward model, but originate from a wide range of priors. We empirically demonstrate that this training procedure implicitly encodes the likelihood model. The use of ensemble information helps posterior inference and enables generalization to unseen priors. We benchmark the proposed method on several synthetic and real datasets in inverse imaging, HEP, and FWI.

URL: https://openreview.net/forum?id=XG3BcxSex0

---

Title: Training-Free Modality-Agnostic Concept Sliders: Fine-Grained Control via Diffusion Models of Images, Audio, and Video

Abstract: Diffusion models have become state-of-the-art generative models for images, audio, and video, yet enabling fine-grained controllable generation, i.e., continuously steering specific concepts without disturbing unrelated content, remains challenging. Concept Sliders (CS) offer a promising direction by discovering semantic directions through textual contrasts, but they require per-concept training and architecture-specific fine-tuning (e.g., LoRA), limiting scalability to new modalities. In this work, we introduce a simple yet effective approach that is fully training-free and modality-agnostic, achieved by partially estimating the CS formula during inference. To support modality-agnostic evaluation, we extend the CS benchmark to include both video and audio, establishing the first suite for fine-grained concept generation control with multiple modalities. We further propose three evaluation properties along with new metrics to improve evaluation quality. Finally, we identify an open problem of scale selection and non-linear traversals and introduce a two-stage procedure that automatically detects saturation points and reparameterizes traversal for perceptually uniform, semantically meaningful edits. Extensive experiments demonstrate that our method enables plug-and-play, training-free concept control across modalities, improves over existing baselines, and establishes new tools for principled controllable generation.

URL: https://openreview.net/forum?id=m2hpORnHhF

---

Title: Towards Well-Calibrated Active Learning

Abstract: We study well-calibrated Active Learning (AL), i.e., the problem of actively learning a classifier with a low calibration error. One of the most popular Acquisition Functions (AFs) in pool-based AL is querying by the model's uncertainty. However, we recognize that an uncalibrated uncertainty model on the unlabeled pool may significantly affect AF effectiveness, leading to high calibration error and sub-optimal generalization on unseen data. Deep Neural Networks (DNNs) make the problem even worse, as model uncertainty from DNNs is usually uncalibrated. Therefore, we propose a new AF, Calibration Priority Sampling, which estimates calibration errors and queries the samples with the highest calibration error before leveraging DNN uncertainty. Specifically, we utilize a kernel calibration error estimator under covariate shift and formally show that AL with this AF eventually leads to a bounded calibration error on the unlabeled pool and unseen test data. Empirically, our proposed method surpasses other AF baselines by achieving lower calibration and generalization error across pool-based AL settings.

URL: https://openreview.net/forum?id=PBKSZm9dQv

---

Title: Multi-axis Analysis of Image Manipulation Localization

Abstract: Advanced image editing software enables easy creation of highly convincing image manipulations, which has been made even more accessible in recent years due to advances in generative AI. Manipulated images, while often harmless, could spread misinformation, create false narratives, and influence people’s opinions on important issues. Despite this growing threat, there is limited research on detecting advanced manipulations across different visual domains. Thus, we introduce Analysis Under Domain-shifts, qualIty, Type, and Size (AUDITS), a comprehensive benchmark designed for studying axes of analysis in image manipulation detection. AUDITS comprises over 530K images from two distinct sources (user and news photos). We curate our dataset to support analysis across multiple axes using recent diffusion-based inpaintings, spanning a diverse range of manipulation types and sizes. We conduct experiments under different types of domain shift to evaluate the robustness of existing image manipulation detection methods. Our goal is to drive further research in this area by offering new insights that would help develop more reliable and generalizable image manipulation detection methods.

URL: https://openreview.net/forum?id=IYQSnqOkiH

---

Title: MV-RAG: Retrieval Augmented Multiview Diffusion

Abstract: Text-to-3D generation approaches have advanced significantly, producing high-quality and 3D-consistent outputs. However, they often fail to produce out-of-domain (OOD) or rare concepts, yielding inconsistent or inaccurate results. To this end, we propose MV-RAG, a novel text-to-3D pipeline that first retrieves relevant 2D images from a large in-the-wild 2D database and then conditions a multiview diffusion model on these images to synthesize consistent and accurate multiview outputs. Training such a retrieval-conditioned model is achieved via a novel hybrid strategy bridging structured multiview data and diverse 2D image collections. This involves training on multiview data using augmented conditioning views that simulate retrieval variance for view-specific reconstruction, alongside training on sets of retrieved real-world 2D images using a distinctive held-out view objective: the model predicts the held-out view from other views to infer 3D consistency from 2D data. We also introduce a prior-guided fusion mechanism that dynamically balances retrieval signals with the model's prior. To facilitate OOD evaluation, we introduce a new collection of challenging OOD prompts. Experiments against state-of-the-art text-to-3D, image-to-3D, and personalization baselines show that our approach significantly improves 3D consistency, photorealism, and text adherence for OOD/rare concepts, while maintaining competitive performance on standard benchmarks.

URL: https://openreview.net/forum?id=l6KwL5kK4l

---

Title: GUARD: Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics for Large Language Models

Abstract: As Large Language Models (LLMs) become increasingly integral to various domains, their potential to generate harmful responses has prompted significant societal and regulatory concerns. In response, governments have issued ethics guidelines to promote the development of trustworthy AI. However, these guidelines are typically high-level demands for developers and testers, leaving a gap in translating them into actionable testing questions to verify LLM compliance.

To address this challenge, we introduce GUARD (Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics), a testing method designed to operationalize guidelines into specific guideline-violating questions that assess LLM adherence. To implement this, GUARD uses automated generation of guideline-violating questions based on government-issued guidelines, thereby testing whether responses comply with these guidelines.
When responses directly violate guidelines, GUARD reports inconsistencies. Furthermore, for responses that do not directly violate guidelines, GUARD integrates the concept of "jailbreaks" into its diagnostics, named GUARD-JD, which creates scenarios that provoke unethical or guideline-violating responses, effectively identifying potential scenarios that could bypass built-in safety mechanisms. Our method culminates in a compliance report, delineating the extent of adherence and highlighting any violations.

We empirically validated the effectiveness of GUARD on eight LLMs, including Vicuna-13B, LongChat-7B, Llama2-7B, Llama-3-8B, GPT-3.5, GPT-4, GPT-4o, and Claude-3.7, by testing compliance under three government-issued guidelines and conducting jailbreak diagnostics. Additionally, GUARD-JD can transfer jailbreak diagnostics to vision-language models (MiniGPT-v2 and Gemini-1.5), demonstrating its usage in promoting reliable LLM-based applications.

URL: https://openreview.net/forum?id=ubIcFJcQlQ

---

Title: Fairness-Aware Checkpoint Screening for Neural Models via Multi-Task Learning and Monte Carlo Dropout

Abstract: Machine learning models deployed in high-stakes domains often exhibit trade-offs between predictive performance and group fairness, and identifying models that navigate this trade-off remains challenging in practice. We present a neural in-processing framework that combines multi-task learning and Monte Carlo (MC) dropout to support uncertainty-aware checkpoint selection for fairness-aware prediction. Our approach jointly predicts a primary target and a protected attribute using a shared representation, then evaluates saved training checkpoints using predictive performance and a group-fairness objective based on disparate impact ratio. We use MC dropout to characterize checkpoint-level predictive variability and perform Pareto-based screening over fairness–performance trade-offs on a validation set, enabling selection of candidate checkpoints that better balance these competing objectives. We evaluate the approach on three datasets: ADULT, MIMIC-III, and SNAPSHOT, and compare against standard fairness baselines including reweighing, adversarial reweighted learning, and FairRF where applicable. Across these settings, the proposed selection strategy often identifies checkpoints with improved demographic-parity trade-offs relative to baseline models, while maintaining competitive predictive performance. We further provide qualitative saliency-map analyses to illustrate how feature emphasis may shift across selected checkpoints. Our results suggest that uncertainty-aware checkpoint screening can serve as a practical mechanism for navigating fairness–performance trade-offs in neural prediction pipelines. We discuss limitations, including dependence on neural architectures with MC dropout and the current focus on a demographic-parity-style fairness criterion.

URL: https://openreview.net/forum?id=QRMgNWfi5e
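
Two ingredients named in this abstract, the disparate impact ratio and Pareto-based screening of checkpoints, can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' pipeline: the MC dropout step is omitted, each checkpoint is summarized by a hypothetical (performance, fairness) pair where higher is better on both axes, and all names and numbers are invented for the example.

```python
def disparate_impact(preds, groups, protected=1):
    """Ratio of positive-prediction rates: protected group over the rest."""
    prot = [p for p, g in zip(preds, groups) if g == protected]
    rest = [p for p, g in zip(preds, groups) if g != protected]
    return (sum(prot) / len(prot)) / (sum(rest) / len(rest))


def pareto_front(checkpoints):
    """Names of checkpoints not dominated on (performance, fairness).

    Assumes both scores are 'higher is better'.
    """
    front = []
    for name, perf, fair in checkpoints:
        dominated = any(
            p2 >= perf and f2 >= fair and (p2 > perf or f2 > fair)
            for _, p2, f2 in checkpoints
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical binary predictions and group labels (1 = protected group).
preds = [1, 0, 0, 0, 1, 1, 0, 1]
groups = [1, 1, 1, 1, 0, 0, 0, 0]
di = disparate_impact(preds, groups)

# Hypothetical validation scores per saved checkpoint.
checkpoints = [
    ("ckpt_a", 0.85, 0.70),
    ("ckpt_b", 0.83, 0.90),
    ("ckpt_c", 0.80, 0.85),
    ("ckpt_d", 0.86, 0.60),
]
front = pareto_front(checkpoints)
print(di, front)
```

Here `ckpt_c` is screened out because `ckpt_b` is at least as good on both axes, while the remaining three checkpoints each represent a different trade-off.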

---
