J2C certifications
==================
J2C Certification: BiSSL: Enhancing the Alignment Between Self-Supervised Pretraining and Downstream Fine-Tuning via Bilevel Optimization
Gustav Wagner Zakarias, Lars Kai Hansen, Zheng-Hua Tan
https://openreview.net/forum?id=GQAGlqOpyA
---
J2C Certification: The Internal Growth Function: A More General PAC Framework for Scenario Decision Making
Guillaume O Berger, Raphael Jungers
https://openreview.net/forum?id=HqPKJSAkrp
---
J2C Certification: Segmentation From Attention: Training-Free Layer Selection and One-Shot Tuning for Segmentation in VLMs
Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal, James J. Little
https://openreview.net/forum?id=a5lAwubXro
---
Accepted papers
===============
Title: Weakly-Supervised Disentangled Representation Learning via Filter-Based Adaptive Swapping
Authors: Zhenyu Zong, Qidi Wang, Simon Yu, Hongpeng Cao, Yanbing Mao, Han Zhao, Lui Sha, Huajie Shao
Abstract: Disentangled representation learning (DRL) aims to uncover semantically meaningful latent factors from observed data, thereby improving both interpretability and generalization of machine learning (ML) models. Despite remarkable progress, unsupervised DRL cannot achieve complete disentanglement without inductive biases or supervision. To address this challenge, existing approaches either rely on full supervision, which demands extensive manual labeling, or weak supervision, which involves complex training strategies that often result in unstable training. To overcome these limitations, we propose Filter-VAE, a weakly supervised variational autoencoder (VAE) that introduces a filter-based adaptive swapping strategy to learn stable and meaningful disentangled representations. Specifically, a relevance filter removes semantically meaningless latent factors, while an adaptive swapping filter exchanges those latent factors that have reached stability. With these two filters, Filter-VAE adaptively swaps only stable and semantically aligned latent factors, leading to robust and meaningful representations. We evaluate Filter-VAE on three standard benchmarks and a traffic sign dataset we created, on two downstream tasks: disentanglement and adversarial robustness. Experimental results demonstrate that Filter-VAE achieves strong disentanglement performance with reduced supervision and delivers remarkable robustness against diverse adversarial attacks and corruptions. The code is released at https://github.com/ZY-Zong/Filter-VAE.git.
URL: https://openreview.net/forum?id=K69rKKozZU
---
Title: Graph Coarsening using Game Theoretic Approach
Authors: Sonali Raj, Manoj Kumar, Sumit Kumar, Ruchir Gupta, Amit Kumar Jaiswal
Abstract: Graph coarsening is a method for reducing the size of an original graph while preserving its structural and feature-related properties. In graph machine learning, it is often employed as a preprocessing step to improve efficiency and scalability when handling large graph datasets. In this study, we address the challenge of coarsening an original graph into a coarsened graph that retains these characteristics. We propose a Cooperative-Based Graph Coarsening (CGC) algorithm, which leverages cooperative game theory as a framework for combinatorial optimization, aiming to minimize the total Dirichlet energy of the graph through localized optimizations. We prove that the proposed coarsening game is a potential game, which guarantees convergence to a stable coarsened graph. Tests on real-world datasets demonstrate that the CGC algorithm surpasses prior state-of-the-art techniques in terms of coarsened graph accuracy and achieves reduced time complexity. These results highlight the potential of game-theoretic approaches in the advancement of graph coarsening techniques.
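For reference, the Dirichlet energy that the coarsening game minimizes can be computed from the graph Laplacian; the sketch below is our own illustrative NumPy code, not the authors' CGC implementation, and assumes a symmetric adjacency matrix adj and node features x.

import numpy as np

def dirichlet_energy(adj, x):
    # Total Dirichlet energy: sum over edges (i, j) of a_ij * ||x_i - x_j||^2,
    # computed via the combinatorial Laplacian L = D - A.
    deg = np.diag(adj.sum(axis=1))
    L = deg - adj
    return float(np.trace(x.T @ L @ x))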
URL: https://openreview.net/forum?id=5vLBjQJCln
---
Title: Pave Your Own Path: Graph Gradual Domain Adaptation on Fused Gromov-Wasserstein Geodesics
Authors: Zhichen Zeng, Ruizhong Qiu, Wenxuan Bao, Tianxin Wei, Xiao Lin, Yuchen Yan, Tarek F. Abdelzaher, Jiawei Han, Hanghang Tong
Abstract: Graph neural networks, despite their impressive performance, are highly vulnerable to distribution shifts on graphs.
Existing graph domain adaptation (graph DA) methods often implicitly assume a mild shift between source and target graphs, limiting their applicability to real-world scenarios with large shifts.
Gradual domain adaptation (GDA) has emerged as a promising approach for addressing large shifts by gradually adapting the source model to the target domain via a path of unlabeled intermediate domains.
Existing GDA methods exclusively focus on independent and identically distributed (IID) data with a predefined path, leaving their extension to non-IID graphs without a given path an open challenge.
To bridge this gap, we present Gadget, the first GDA framework for non-IID graph data.
First (theoretical foundation), the Fused Gromov-Wasserstein (FGW) distance is adopted as the domain discrepancy for non-IID graphs, based on which we derive an error bound for node-, edge- and graph-level tasks, showing that the target domain error is proportional to the length of the path.
Second (optimal path), guided by the error bound, we identify the FGW geodesic as the optimal path, which can be efficiently generated by our proposed algorithm.
The generated path can be seamlessly integrated with existing graph DA methods to handle large shifts on graphs, improving state-of-the-art graph DA methods by up to 6.8% in accuracy on real-world datasets.
URL: https://openreview.net/forum?id=dTPBqTKGPs
---
Title: Clus-UCB: A Near-Optimal Algorithm for Clustered Bandits
Authors: Aakash Gore, Prasanna Chaporkar
Abstract: We study a stochastic multi-armed bandit setting where arms are partitioned into known clusters, such that the parameters of arms within a cluster differ by at most a known threshold. While the clustering structure is known a priori, the arm parameters are unknown. We derive an asymptotic lower bound on the regret that improves upon the classical bound of Lai & Robbins (1985). We then propose Clus-UCB, an efficient algorithm that closely matches this lower bound asymptotically by exploiting the clustering structure and introducing a new index to evaluate an arm, which depends on other arms within the cluster. In this way, arms share information among each other. We present simulation results of our algorithm and compare its performance against KL-UCB and other well-known algorithms for bandits with dependent arms. We discuss the robustness of the proposed algorithm under misspecified prior information, address some limitations of this work, and conclude by outlining possible directions for future research.
URL: https://openreview.net/forum?id=QDMvPO9WJT
---
Title: MetaSeal: Defending Against Image Attribution Forgery Through Content-Dependent Cryptographic Watermarks
Authors: Tong Zhou, Ruyi Ding, Gaowen Liu, Charles Fleming, Ramana Rao Kompella, Yunsi Fei, Xiaolin Xu, Shaolei Ren
Abstract: The rapid growth of digital and AI-generated images has amplified the need for secure and verifiable methods of image attribution. While digital watermarking offers more robust protection than metadata-based approaches—which can be easily stripped—current watermarking techniques remain vulnerable to forgery, creating risks of misattribution that can damage the reputations of AI model developers and the rights of digital artists. These vulnerabilities arise from two key issues: (1) content-agnostic watermarks, which, once learned or leaked, can be transferred across images to fake attribution, and (2) reliance on detector-based verification, which is unreliable since detectors can be tricked. We present MetaSeal, a novel framework for content-dependent watermarking with cryptographic security guarantees to safeguard image attribution. Our design provides (1) forgery resistance, preventing unauthorized replication and enforcing cryptographic verification; (2) robust, self-contained protection, embedding attribution directly into images while maintaining resilience against benign transformations; and (3) evidence of tampering, making malicious alterations visually detectable. Experiments demonstrate that MetaSeal effectively mitigates forgery attempts and applies to both natural and AI-generated images, establishing a new standard for secure image attribution.
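To illustrate why content dependence blocks transfer-based forgery, consider a toy tag that binds a keyed signature to the image digest. HMAC here is only a stand-in for the paper's cryptographic construction, and the tag is returned rather than embedded in the image as MetaSeal does.

import hashlib, hmac

def content_dependent_tag(image_bytes: bytes, creator_key: bytes) -> bytes:
    # The tag is a function of the image content, so copying it onto a different
    # image fails verification; only the key holder can produce a valid tag.
    digest = hashlib.sha256(image_bytes).digest()
    return hmac.new(creator_key, digest, hashlib.sha256).digest()

def verify(image_bytes: bytes, tag: bytes, creator_key: bytes) -> bool:
    return hmac.compare_digest(tag, content_dependent_tag(image_bytes, creator_key))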
URL: https://openreview.net/forum?id=8i3ErmCfdJ
---
Title: Diversity Sampling Regularization for Multi-Domain Generalization
Authors: Lakpa Tamang, Mohamed Reda Bouadjenek, Sunil Aryal, Richard Dazeley
Abstract: Domain Generalization (DG) seeks to create models that can successfully generalize to new, unseen target domains without the need for target domain data during training. Traditional approaches often rely on data augmentation or feature mixing techniques, such as MixUp; however, these methods may fall short in capturing the essential diversity within the feature space, resulting in limited robustness against domain shifts. In this research, we revisit the importance of diversity in DG tasks and propose a simple yet effective method to improve DG performance through diversity-sampling regularization. Specifically, we calculate entropy values for input data to assess their prediction uncertainty, and use these values to guide sampling through a Determinantal Point Process (DPP), which prioritizes selecting data subsets with high diversity. By incorporating DPP-based diversity sampling as a regularization strategy, our framework enhances the standard Empirical Risk Minimization (ERM) objective, promoting the learning of domain-agnostic features without relying on explicit data augmentation. We empirically validate the effectiveness of our method on standard DG benchmarks, including PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet, and through extensive experiments show that it consistently improves generalization to unseen domains and outperforms widely used baselines and state-of-the-art methods without relying on any task-specific heuristics.
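A minimal sketch of entropy-guided diverse subset selection in the spirit of the abstract; the quality-diversity L-kernel and the greedy MAP step below are our own illustrative choices, not the authors' implementation.

import numpy as np

def entropy_dpp_select(probs, feats, k):
    # probs: (N, C) softmax outputs; feats: (N, D) penultimate features.
    eps = 1e-12
    quality = -np.sum(probs * np.log(probs + eps), axis=1)        # predictive entropy per sample
    feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + eps)
    L = quality[:, None] * (feats @ feats.T) * quality[None, :]   # quality-diversity L-kernel
    L += 1e-6 * np.eye(len(L))                                    # small ridge for stability
    selected = []
    for _ in range(k):                                            # greedy MAP approximation
        remaining = [i for i in range(len(L)) if i not in selected]
        gains = [np.linalg.slogdet(L[np.ix_(selected + [i], selected + [i])])[1]
                 for i in remaining]
        selected.append(remaining[int(np.argmax(gains))])
    return selected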
URL: https://openreview.net/forum?id=nXqMt7X2RX
---
Title: The Speed-up Factor: A Quantitative Multi-Iteration Active Learning Performance Metric
Authors: Hannes Kath, Thiago S. Gouvêa, Daniel Sonntag
Abstract: Machine learning models excel with abundant annotated data, but annotation is often costly and time-intensive.
Active learning (AL) aims to improve the performance-to-annotation ratio by using query methods (QMs) to iteratively select the most informative samples.
While AL research focuses mainly on QM development, the evaluation of this iterative process lacks appropriate performance metrics.
This work reviews eight years of AL evaluation literature and formally introduces the speed-up factor, a quantitative multi-iteration QM performance metric that indicates the fraction of samples needed to match random sampling performance.
Using four datasets from diverse domains and seven QMs of various types, we empirically evaluate the speed-up factor and compare it with state-of-the-art AL performance metrics.
The results confirm the assumptions underlying the speed-up factor, demonstrate its accuracy in capturing the described fraction, and reveal its superior stability across iterations.
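The exact definition of the metric is given in the paper; the sketch below only conveys the idea under our own simplifying assumptions (monotone learning curves, linear interpolation, and averaging over operating points): for each random-sampling budget, find the fraction of samples the query method needs to reach the same accuracy.

import numpy as np

def speed_up_factor(budgets, acc_qm, acc_random):
    # budgets: annotation budgets; acc_qm / acc_random: accuracies of the query
    # method and of random sampling at those budgets (assumed monotone increasing).
    factors = []
    for n, a in zip(budgets, acc_random):
        if a > acc_qm[-1]:
            continue                                  # QM never reaches this accuracy within budget
        n_needed = np.interp(a, acc_qm, budgets)      # samples the QM needs for accuracy a
        factors.append(n_needed / n)
    return float(np.mean(factors))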
URL: https://openreview.net/forum?id=q6hRb6fETo
---
Title: BiSSL: Enhancing the Alignment Between Self-Supervised Pretraining and Downstream Fine-Tuning via Bilevel Optimization
Authors: Gustav Wagner Zakarias, Lars Kai Hansen, Zheng-Hua Tan
Abstract: Models initialized from self-supervised pretraining may suffer from poor alignment with downstream tasks, limiting the extent to which subsequent fine-tuning can adapt relevant representations acquired during the pretraining phase. To mitigate this, we introduce BiSSL, a novel bilevel training framework that enhances the alignment of self-supervised pretrained models with downstream tasks by explicitly incorporating both the pretext and downstream tasks into a preparatory training stage prior to fine-tuning. BiSSL solves a bilevel optimization problem in which the lower level adheres to the self-supervised pretext task, while the upper level encourages the lower-level backbone to align with the downstream objective. The bilevel structure facilitates enhanced information sharing between the tasks, ultimately yielding a backbone model that is more aligned with the downstream task, providing a better initialization for subsequent fine-tuning. We propose a general training algorithm for BiSSL that is compatible with a broad range of pretext and downstream tasks. We demonstrate that our proposed framework significantly improves accuracy on the vast majority of a broad selection of image-domain downstream tasks, and that these gains are consistently retained across a wide range of experimental settings. In addition, exploratory alignment analyses further underpin that BiSSL enhances downstream alignment of pretrained representations.
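Schematically, the bilevel structure described above takes the form (our notation; the precise coupling between the two levels is specified in the paper)
$$\min_{\theta}\; \mathcal{L}_{\text{downstream}}\big(\theta,\, \phi^{*}(\theta)\big) \quad \text{s.t.} \quad \phi^{*}(\theta) \in \arg\min_{\phi}\; \mathcal{L}_{\text{pretext}}\big(\phi;\, \theta\big),$$
where the lower level follows the self-supervised pretext objective and the upper level steers the resulting backbone $\phi^{*}(\theta)$ toward the downstream task.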
URL: https://openreview.net/forum?id=GQAGlqOpyA
---
Title: The Internal Growth Function: A More General PAC Framework for Scenario Decision Making
Authors: Guillaume O Berger, Raphael Jungers
Abstract: This paper introduces a new PAC framework for scenario decision-making problems.
Scenario decision making consists in making a decision that satisfies a probabilistic constraint (also called a chance constraint) from finitely many sampled realizations (called scenarios) of the constraint.
PAC bounds are sufficient conditions on the number of samples to guarantee with high confidence that the sample-based decision satisfies the true constraint with a prescribed probability.
Existing PAC bounds rely on intrinsic properties of the problem, such as convexity (Calafiore and Campi, 2005), finite VC dimension (Alamo et al., 2009) or existence of a compression scheme (Margellos et al., 2014).
While powerful in some applications, these PAC bounds can be vacuous (or infinite) when the properties are not satisfied.
In this paper, we propose a new PAC framework, leading to PAC bounds that are not vacuous for a strictly larger class of scenario decision-making problems.
This bound is based on the novel notion of ``internal growth'', which adapts the notion of ``growth function'' from classical machine learning (Vapnik and Chervonenkis, 1968) to scenario decision making.
We also relate this notion to other novel properties of the system, such as the $k$-VC dimension.
Furthermore, we show a partial converse result: namely, that any stable monotone scenario decision algorithm is PAC if \emph{and only if} it satisfies our criterion.
Finally, we demonstrate the usefulness of our framework, and compare with existing approaches, on practical problems.
URL: https://openreview.net/forum?id=HqPKJSAkrp
---
Title: Bootstrapping Task Spaces for Self-Improvement
Authors: Minqi Jiang, Andrei Lupu, Yoram Bachrach
Abstract: Progress in many task domains emerges from repeated revisions to previous solution attempts. Training agents that can reliably self-improve over such sequences at inference-time is a natural target for reinforcement learning (RL), yet the naive approach assumes a fixed maximum iteration depth, which can be both costly and arbitrary. We present Exploratory Iteration (ExIt), a family of autocurriculum RL methods that directly exploits the recurrent structure of self-improvement tasks to train LLMs to perform multi-step self-improvement at inference-time while only training on the most informative single-step iterations. ExIt grows a task space by selectively sampling the most informative intermediate, partial histories encountered during an episode for continued iteration, treating these starting points as new self-iteration task instances to train a self-improvement policy. ExIt can further pair with explicit exploration mechanisms to sustain greater task diversity. Across several domains, encompassing competition math, multi-turn tool-use, and machine learning engineering, we demonstrate that ExIt strategies, starting from either a single or many task instances, can produce policies exhibiting strong inference-time self-improvement on held-out task instances, and the ability to iterate towards higher performance over a step budget extending beyond the average iteration depth encountered during training.
URL: https://openreview.net/forum?id=k2VsgUxC6X
---
Title: Segmentation From Attention: Training-Free Layer Selection and One-Shot Tuning for Segmentation in VLMs
Authors: Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal, James J. Little
Abstract: Large-scale vision-language models (VLMs), trained on extensive datasets of image-text pairs, exhibit strong multimodal understanding capabilities by implicitly learning associations between textual descriptions and image regions. This emergent ability enables zero-shot object detection and segmentation, using techniques that rely on text-image attention maps, without necessarily training on abundant labeled segmentation datasets. However, the performance of such methods depends heavily on prompt engineering and on manually selected layers or attention heads. In this work, we propose a training-free entropy-based measure, InfoScore, to identify the best image-text attention layers for segmentation, providing a more flexible and scalable solution for training-free open-vocabulary segmentation and reducing the additional burden of hyperparameter search. We empirically show that our training-free selection strategy is superior to naive selection strategies. Additionally, we demonstrate that instead of solely relying on text prompts, fine-tuning the image-text attention layer with a single visual example of each class significantly improves segmentation without the need for additional parameters or decoders. Moreover, we show that our methods and findings are general and can be applied across various vision-language models (VLMs).
URL: https://openreview.net/forum?id=a5lAwubXro
---
Title: Improving Detection of Rare Nodes in Hierarchical Multi-Label Learning
Authors: Isaac Xu, Martin Gillis, Ayushi Sharma, Benjamin Misiuk, Craig J. Brown, Thomas Trappenberg
Abstract: In hierarchical multi-label classification, a persistent challenge is enabling model predictions to reach deeper levels of the hierarchy for more detailed or fine-grained classifications. This difficulty partly arises from the natural rarity of certain classes (or hierarchical nodes) and the hierarchical constraint that ensures child nodes are almost always less frequent than their parents. To address this, we propose a weighted loss objective for neural networks that combines node-wise imbalance weighting with focal weighting components, the latter leveraging modern quantification of ensemble uncertainties. By emphasizing rare nodes rather than rare observations (data points), and focusing on uncertain nodes for each model output distribution during training, we observe improvements in recall by up to a factor of five on benchmark datasets, along with statistically significant gains in $F_{1}$ score. We also show our approach aids convolutional networks on challenging tasks, as in situations with suboptimal encoders or limited data.
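A minimal sketch of a node-weighted focal objective in this spirit; note that the paper's focal component is driven by ensemble-uncertainty estimates, whereas the sketch substitutes the standard $(1-p_t)^\gamma$ focal factor, and the inverse-frequency weighting is our own assumption.

import torch

def node_weighted_focal_bce(logits, targets, node_pos_freq, gamma=2.0, eps=1e-6):
    # logits/targets: (B, num_nodes); node_pos_freq: (num_nodes,) positive rate per node.
    p = torch.sigmoid(logits)
    w_node = 1.0 / (node_pos_freq + eps)                  # rarer nodes receive larger weight
    pt = torch.where(targets > 0.5, p, 1.0 - p)           # probability assigned to the true label
    bce = -(targets * (p + eps).log() + (1 - targets) * (1 - p + eps).log())
    return ((1.0 - pt) ** gamma * w_node * bce).mean()    # focal factor emphasizes uncertain nodes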
URL: https://openreview.net/forum?id=hf4zEWWIvE
---
Title: An Efficient Subset Selection Strategy Using Text-Guided Data Attribution to Mitigate Simplicity Bias
Authors: Kumar Shubham, Pranav Sastry, Prathosh AP
Abstract: The effectiveness of deep learning models heavily relies on the quality and diversity of their training data. However, datasets collected from different sources often introduce simplicity biases, where models rely on easily learnable but non-predictive (spurious) features for their predictions. While existing debiasing techniques focus on model robustness, they leave the data untouched. As data becomes increasingly valuable, however, identifying and mitigating bias directly at the data level has become increasingly important. Recently, data attribution has emerged as a promising tool for uncovering issues in training data, yet its vulnerability to simplicity bias has received limited attention. In this work, we propose a novel data deletion framework that combines Neural Tangent Kernel (NTK)-based data attribution with textual descriptions of bias to identify and remove training samples that do not significantly affect model performance. We first demonstrate that NTK-based data attribution methods can themselves be influenced by spurious features. Subsequently, to mitigate this, we use available metadata or, when unavailable, a vision-language model to annotate a small validation set and extract a textual description of the bias. Based on this description and the attribution scores, we identify the subset of training data that is semantically aligned with the spurious feature and affects the generalization of the model. Removing these samples from the training dataset and training the model on the remaining subset improves the average and worst-group accuracy of the model, outperforming existing attribution-based baselines.
URL: https://openreview.net/forum?id=zZ5YundT95
---
New submissions
===============
Title: Scaling Large Language Models with Fully Sparse Activations
Abstract: Activation sparsity can reduce the inference cost of large language models (LLMs) by lowering both compute and memory traffic. Yet most existing approaches sparsify only FFN intermediate states, leaving substantial portions of inference effectively dense. We study how to scale fully sparsely activated LLMs, in which every activation participating in linear transformations is sparse. We focus on two questions: how to train such models effectively, and how activation sparsity affects model quality as scale increases. We develop a pre-training recipe that enables effective training of fully sparsely activated LLMs from scratch, including squared ReLU as the activation function, top-K sparsification, and a straight-through estimator for the remaining linear layers. Extensive experiments spanning model sizes, training-token budgets, and target sparsity levels reveal that the performance gap to dense baselines narrows with model scale, grows nonlinearly with sparsity, and remains largely insensitive to the training-token budget. Finally, we investigate post-training activation sparsification of pre-trained dense models via both training-free techniques and supervised fine-tuning, and observe a trend similar to the pre-training experiments: larger models are more robust to sparsification and exhibit increasingly sparse activation patterns. Overall, our results provide practical training recipes and empirical guidance for building and scaling LLMs with fully sparse activations.
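A minimal PyTorch sketch of the top-K sparsification with a straight-through estimator named in the abstract; where exactly the sparsification is applied and how K is scheduled are our own assumptions, not the paper's recipe.

import torch

class TopKSTE(torch.autograd.Function):
    # Keep the k largest-magnitude activations along the last dimension;
    # pass gradients straight through the hard selection in the backward pass.
    @staticmethod
    def forward(ctx, x, k):
        thresh = x.abs().topk(k, dim=-1).values[..., -1:]
        return x * (x.abs() >= thresh)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None

def sparse_activation(x, k):
    h = torch.relu(x) ** 2            # squared ReLU, as in the abstract's recipe
    return TopKSTE.apply(h, k)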
URL: https://openreview.net/forum?id=MntjMCroiE
---
Title: Test-Time Adaptation of Vision-Language Models with Low-Rank Pseudo-Consistency
Abstract: While test-time adaptation (TTA) methods enable vision-language models (VLMs) to adapt under distribution shifts, they typically rely on simple feature transformations following frozen encoders while learning from potentially noisy pseudo-labels. This approach may limit adaptation under significant domain shifts. In this paper, we propose PseudoAdapter, a novel TTA framework for VLMs that introduces low-rank adapters into early layers of the encoder to enable domain-specific feature adaptation while maintaining generalization. To ensure effective learning from noisy and low-confidence predictions, PseudoAdapter combines confidence-calibrated pseudo-labelling with unsupervised consistency learning across augmented views. We further extend our approach with PseudoAdapter+, which integrates selective teacher supervision to improve adaptation with minimal overhead. Extensive evaluations on four out-of-distribution and ten cross-domain benchmarks demonstrate that our method outperforms prior state-of-the-art TTA approaches by an average of 6.84\% and 3.25\%, respectively. Ablation studies confirm the effectiveness of each proposed component.
URL: https://openreview.net/forum?id=GDw4pvX9aG
---
Title: POPS: Recovering Unlearned Multi-Modality Knowledge in MLLMs with Prompt-Optimized Parameter Shaking
Abstract: Multimodal Large Language Models (MLLMs) have demonstrated impressive performance on cross-modal tasks by jointly training on large-scale textual and visual data, where privacy-sensitive examples could be unintentionally encoded, raising concerns about privacy or copyright violation. To this end, Multi-modality Machine Unlearning (MMU) was proposed as a mitigation that can effectively force MLLMs to forget private information. However, the robustness of such unlearning methods has not been fully examined for the setting where the model is published and accessible to malicious users. In this paper, we propose a novel adversarial strategy, namely Prompt-Optimized Parameter Shaking (POPS), aiming to recover the supposedly unlearned multi-modality knowledge from MLLMs. Our method prompts the victim MLLMs to generate potential private examples via prompt-suffix optimization, and then exploits these synthesized outputs to fine-tune the models so they disclose the true private information. Experiments on different MMU benchmarks reveal substantial weaknesses in existing MMU algorithms. POPS can even achieve near-complete recovery of supposedly erased sensitive information from unlearned MLLMs, exposing fundamental vulnerabilities that challenge the foundational robustness of representative MMU-based privacy protections.
URL: https://openreview.net/forum?id=wMiEcH84l9
---
Title: Learning Where It Matters: Responsible and Interpretable Text-to-Image Generation with Background Consistency
Abstract: Text-to-image diffusion models have achieved remarkable progress, yet they still struggle to produce unbiased and responsible outputs. A promising direction is to manipulate the bottleneck space of the U-Net (the $h$-space), which provides \textit{interpretability} and \textit{controllability}. However, existing methods rely on learning attributes from the entire image, entangling them with spurious features and offering no corrective mechanisms at inference. This uniform reliance leads to poor subject alignment, fairness issues, reduced photorealism, and incoherent backgrounds in scene-specific prompts. To address these challenges, we propose two complementary innovations for training and inference. First, we introduce a spatially focused concept learning framework that disentangles target attributes into concept vectors by suppressing target attribute features within the multi-head cross-attention (MCA) modules and attenuating the encoder output (i.e., $h$-vector) to ensure the concept vector exclusively captures target attribute features. In addition, we introduce a spatially weighted reconstruction loss to emphasize regions relevant to the target attribute. Second, we design an inference-time strategy that improves background consistency by enhancing low-frequency components in the $h$-space. Experiments demonstrate that our approach improves fairness, subject fidelity, and background coherence while preserving visual quality and prompt alignment, outperforming state-of-the-art $h$-space methods. The code is included in the supplementary material.
URL: https://openreview.net/forum?id=sCOJGbJwAJ
---
Title: Gradual Binary Search and Dimension Expansion : A general method for activation quantization in LLMs
Abstract: Large language models (LLMs) have become pivotal in artificial intelligence, demonstrating strong capabilities in reasoning, understanding, and generating data. However, their deployment on edge devices is hindered by their substantial size, often reaching several billion parameters. Quantization is a widely used method to reduce memory usage and inference time; however, LLMs present unique challenges due to the prevalence of outliers in their activations. In this work, we leverage the theoretical advantages of Hadamard matrices over random rotation matrices to push the boundaries of quantization in LLMs. We demonstrate that Hadamard matrices are more effective in reducing outliers, which are a significant obstacle to achieving low-bit quantization. Our method, based on a gradual binary search, enables 3-bit quantization for weights, activations, and key-value (KV) caches, resulting in a 40% increase in accuracy on common benchmarks compared to SoTA methods. We extend the use of rotation matrices to support non-power-of-2 embedding dimensions, as in the Qwen architecture, by employing Paley's algorithm. Our experimental results on multiple model families, such as Mistral, LLaMA, and Qwen, demonstrate the effectiveness of our approach, outperforming existing methods and enabling practical 3-bit quantization.
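A minimal sketch of the outlier-smoothing rotation the abstract builds on: a Sylvester Hadamard matrix for power-of-two dimensions followed by plain symmetric quantization. The gradual binary search and the Paley construction for other dimensions are not shown, and all details here are our own illustrative assumptions.

import numpy as np

def sylvester_hadamard(n):
    # Hadamard matrix for n a power of two (Sylvester construction).
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def rotate_quantize_dequantize(x, bits=3):
    n = x.shape[-1]
    H = sylvester_hadamard(n) / np.sqrt(n)          # orthogonal rotation spreads outliers
    xr = x @ H
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(xr).max() / qmax
    xq = np.clip(np.round(xr / scale), -qmax - 1, qmax)
    return (xq * scale) @ H.T                       # dequantize and rotate back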
URL: https://openreview.net/forum?id=fno6W7qwhT
---
Title: TOAST: Transformer Optimization using Adaptive and Simple Transformations
Abstract: Foundation models achieve state-of-the-art (SOTA) performance across different tasks, but their size and computational demands raise concerns about accessibility and sustainability. Existing efficiency methods often require additional retraining or fine-tuning, limiting their practicality. Recent findings suggest that deep neural networks exhibit internal representation similarities. While such similarities across different models have been exploited to enable techniques such as model stitching and merging, intra-network redundancy remains underexplored as a source for efficiency gains. In this paper, we introduce TOAST, a framework that exploits these redundancies to approximate entire transformer blocks with lightweight closed-form mappings, such as a linear transformation or even the identity, without any additional training. Across SOTA pretrained vision models (e.g., ViT, DINOv2, DeiT) and datasets ranging from MNIST to ImageNet-1k, TOAST reduces parameters and computation while preserving, and in some cases improving, downstream performance. These results show that large portions of transformer depth can be replaced by trivial functions, opening a new perspective on efficient foundation models.
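A minimal sketch of replacing one block with a closed-form linear map, fitted by ridge least squares on calibration activations; TOAST's actual criterion for choosing which blocks to replace, and when the identity suffices, is described in the paper.

import torch

def fit_linear_replacement(block_in, block_out, ridge=1e-4):
    # block_in, block_out: (N, d) activations entering and leaving a transformer block,
    # collected on a small calibration set. Returns a lightweight stand-in for the block.
    d = block_in.shape[1]
    W = torch.linalg.solve(block_in.T @ block_in + ridge * torch.eye(d),
                           block_in.T @ block_out)
    return lambda x: x @ W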
URL: https://openreview.net/forum?id=fSwMCsBtTG
---
Title: Minimisation of Quasar-Convex Functions Using Random Zeroth-Order Oracles
Abstract: This paper explores the performance of a random Gaussian smoothing zeroth-order (ZO) scheme for minimising quasar-convex (QC) and strongly quasar-convex (SQC) functions in both unconstrained and constrained settings. For the unconstrained problem, we establish the ZO algorithm's convergence to a global minimum along with its complexity when applied to both QC and SQC functions. For the constrained problem, we introduce the new notion of proximal-quasar-convexity and prove analogous results to the unconstrained case. Specifically, we derive complexity bounds and prove convergence of the algorithm to a neighbourhood of a global minimum whose size can be controlled under a variance reduction scheme. Beyond the theoretical guarantees, we demonstrate the practical implications of our results on several machine learning problems where quasar-convexity naturally arises, including linear dynamical system identification and generalised linear models.
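For reference, the random Gaussian-smoothing zeroth-order gradient estimator at the heart of such schemes looks as follows; step sizes, the number of directions, and the projection used in the constrained case are our own placeholders.

import numpy as np

def zo_gradient(f, x, mu=1e-3, num_dirs=10, rng=None):
    # Two-point Gaussian-smoothing estimator: average of (f(x + mu*u) - f(x)) / mu * u, u ~ N(0, I).
    rng = np.random.default_rng() if rng is None else rng
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - fx) / mu * u
    return g / num_dirs

# A single (projected) step: x = project(x - step * zo_gradient(f, x))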
URL: https://openreview.net/forum?id=rRp9zZBKkZ
---
Title: Scientific Theory of a Black-Box: A Life Cycle-Scale XAI Framework Based on Constructive Empiricism
Abstract: Explainable AI (XAI) offers a growing number of algorithms that aim to answer specific questions about black-box models. What is missing is a principled way to consolidate explanatory information about a fixed black-box model into a persistent, auditable artefact that accompanies the black box throughout its life cycle. We address this gap by introducing the notion of a scientific theory of a black box (SToBB). Grounded in Constructive Empiricism, a SToBB fulfils three obligations: (i) empirical adequacy with respect to all available observations of black-box behaviour, (ii) adaptability via explicit update commitments that restore adequacy when new observations arrive, and (iii) auditability through transparent documentation of assumptions, construction choices, and update behaviour. We operationalise these obligations as a general framework that specifies an extensible observation base, a traceable hypothesis class, algorithmic components for construction and revision, and documentation sufficient for third-party assessment. Explanations for concrete stakeholder needs are then obtained by querying the maintained record through interfaces, rather than by producing isolated method outputs. As a proof of concept, we instantiate a complete SToBB for a neural-network classifier on a tabular task and introduce the Constructive Box Theoriser (CoBoT) algorithm, an online procedure that constructs and maintains an empirically adequate rule-based surrogate as observations accumulate. Together, these contributions position SToBBs as a life cycle-scale, inspectable point of reference that supports consistent, reusable analyses and systematic external scrutiny.
URL: https://openreview.net/forum?id=kAjPN8pSTK
---
Title: Learning-Augmented Robust Algorithmic Recourse
Abstract: Algorithmic recourse provides individuals who receive undesirable outcomes from machine learning systems with minimum-cost improvements to achieve a desirable outcome. However, machine learning models often get updated, so the recourse may not lead to the desired outcome. The robust recourse framework chooses recourses that are less sensitive to adversarial model changes, but this comes at a higher cost. To address this, we initiate the study of learning-augmented algorithmic recourse and evaluate the extent to which a designer equipped with a prediction of the future model can reduce the cost of recourse when the prediction is accurate (consistency) while also limiting the cost even when the prediction is inaccurate (robustness). We propose a novel algorithm, study the robustness-consistency trade-off, and analyze how prediction accuracy affects performance.
URL: https://openreview.net/forum?id=IFssttzxnP
---
Title: Existing Adversarial Large Language Model Unlearning Evaluations Are Inconclusive
Abstract: Unlearning seeks to remove sensitive knowledge from large language models, with success often judged through adversarial evaluations. In this work, we critically examine these evaluation practices and reveal key limitations that undermine their reliability. First, we show that adversarial evaluations introduce new information into the model, potentially masking true unlearning performance by re-teaching the model during evaluation. Second, we show that evaluation outcomes vary significantly across tasks, undermining the generalizability of current evaluation methods. Collectively, these issues suggest that existing evaluations risk mischaracterizing unlearning success (or failure). To address this, based on our empirical findings, we propose two principles—*minimal information injection* and *downstream task awareness*—for future evaluations.
URL: https://openreview.net/forum?id=Zxx1I4aJlm
---
Title: ProJo4D: Progressive Joint Optimization for Sparse-View Inverse Physics Estimation
Abstract: Neural rendering has advanced significantly in 3D reconstruction and novel view synthesis, and integrating physics into these frameworks opens new applications such as physically accurate digital twins for robotics and XR.
However, the inverse problem of estimating physical parameters from visual observations remains challenging.
Existing physics-aware neural rendering methods typically require dense multi-view videos, making them impractical for scalable, real-world deployment.
Under sparse-view settings, the sequential optimization strategies employed by current approaches suffer from severe error accumulation: inaccuracies in initial 3D reconstruction propagate to subsequent stages, degrading physical state and material parameter estimates.
On the other hand, simultaneous optimization of all parameters fails due to the highly non-convex and often non-differentiable nature of the problem.
We propose ProJo4D, a progressive joint optimization framework that gradually expands the set of jointly optimized parameters. This design enables physics-informed gradients to refine geometry while avoiding the instability of direct joint optimization over all parameters.
Evaluations on synthetic and real-world datasets demonstrate that ProJo4D substantially outperforms prior work in 4D future state prediction and physical parameter estimation, achieving up to 10$\times$ improvement in geometric accuracy while maintaining computational efficiency.
URL: https://openreview.net/forum?id=pqvVrqlXCZ
---
Title: Retrieval as a Decision: Training-Free Adaptive Gating for Efficient RAG
Abstract: Retrieval-Augmented Generation (RAG) improves factuality but retrieving for every query often hurts quality while inflating tokens and latency. We propose Training-free Adaptive Retrieval Gating (\textbf{TARG}), a single-shot policy that decides when to retrieve using only a short, no-context draft from the base model. From the draft’s prefix logits, TARG computes lightweight uncertainty scores—mean token entropy, a margin signal derived from the top-1/top-2 logit gap via a monotone link, or small-$N$ variance across a handful of stochastic prefixes—and triggers retrieval only when the score exceeds a threshold. The gate is model-agnostic, adds only tens to hundreds of draft tokens, and requires no additional training or auxiliary heads. On NQ-Open, TriviaQA, and PopQA, TARG consistently pushes the accuracy–efficiency frontier: compared with Always-RAG\footnote{\textsc{Always-RAG}: retrieve for every query; \textsc{Never-RAG}: never retrieve.}, TARG matches or improves EM/F1 while reducing retrieval by 70–90\% and cutting end-to-end latency, and it remains close to Never-RAG in overhead. A central empirical finding is that under modern instruction-tuned LLMs the margin signal is a robust default (entropy compresses as backbones sharpen), with small-$N$ variance offering a conservative, budget-first alternative. We provide ablations over gate type and prefix length and use a $\Delta$-latency view to make budget trade-offs explicit.
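A minimal PyTorch sketch of the gate; the threshold, prefix length, and the specific monotone link for the margin signal are our own placeholders rather than the paper's settings.

import torch

def targ_should_retrieve(prefix_logits, score="margin", tau=0.5):
    # prefix_logits: (T, V) logits of a short no-context draft prefix.
    if score == "entropy":
        probs = prefix_logits.softmax(dim=-1)
        s = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean()   # mean token entropy
    else:
        top2 = prefix_logits.topk(2, dim=-1).values                  # top-1 / top-2 logits
        gap = (top2[:, 0] - top2[:, 1]).mean()
        s = torch.sigmoid(-gap)                                      # monotone link: small gap -> high score
    return bool(s > tau)                                             # retrieve only when uncertain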
URL: https://openreview.net/forum?id=L8gYtUZfVU
---
Title: The Weakest Link: A Nodal Tension Model for Local Network Resilience
Abstract: The resilience of networked systems, defined by their ability to withstand targeted disruptions between a source and a target, is a critical concern in fields from ecology to infrastructure management. While spectral methods offer global insights, characterising the specific vulnerability of targeted pathways requires a more direct approach. In this paper, we frame this problem of local resilience through the powerful lens of linear duality, adopting the classic dual of the maximum $s-t$ flow problem and interpreting it through a novel physical analogy of "Nodal Tension". Our main theoretical results establish that (1) the model's optimal value is exactly equal to the capacity of the minimum $s-t$ cut, and (2) an optimal vertex solution exists where all node potentials are integer-valued ($\{0,1\}$), thus revealing the precise combinatorial structure of the cut. We validate these theorems computationally against standard algorithms. We then apply our model to a real-world conservation problem: assessing the connectivity of a grizzly bear corridor in the Canadian Rocky Mountains. The analysis reveals a novel ecological insight: the corridor's weakest link is not a remote bottleneck, but the local perimeter of the source protected area itself. This "null signal" for a classic choke point challenges conventional conservation paradigms and demonstrates our model's utility in generating non-obvious, actionable scientific discoveries. Our work provides a complete polyhedral characterisation of local network resilience, offering a computationally efficient and scientifically interpretable tool. Code to reproduce all results is available at https://anonymous.4open.science/r/tmlr-ldnr
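The integrality result can be checked against an off-the-shelf min-cut solver: the {0,1} node potentials of an optimal vertex solution coincide with the two sides of a minimum s-t cut. The small graph and capacities below are arbitrary illustrative choices, not taken from the paper.

import networkx as nx

G = nx.DiGraph()
G.add_edge("s", "a", capacity=3.0)
G.add_edge("s", "b", capacity=2.0)
G.add_edge("a", "b", capacity=5.0)
G.add_edge("a", "t", capacity=1.0)
G.add_edge("b", "t", capacity=4.0)

cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
potentials = {v: 1 if v in source_side else 0 for v in G}   # integer node potentials
print(cut_value, potentials)                                 # cut value equals the dual optimum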
URL: https://openreview.net/forum?id=mNAyZmizQi
---
Title: LLMs Can Leverage Graph Structural Information in Text-Attributed Graphs
Abstract: A recurring claim in recent LLM-as-predictor work on text-attributed graphs (TAGs) is that in-context learning (ICL) benefits mainly from the textual attributes of neighboring nodes (often via homophily), while general-purpose LLMs cannot reliably exploit graph structure—especially edge direction and local topology. This paper re-evaluates that claim by asking a focused question: Can general-purpose LLMs genuinely leverage graph structural information in TAGs via ICL, once we remove confounding factors and provide an architecture explicitly designed for structural reasoning? We first introduce controlled neighborhood rewiring tests that keep node texts and label distributions fixed while perturbing structure. Across seven LLMs and four low-homophily WebKB graphs, both first-order flipping and two-hop extreme rewiring consistently degrade accuracy (average relative drops of 2.06% to 23.15%), demonstrating genuine structural sensitivity. After flipping, structural sensitivity strongly increases with model capability, and the performance advantage of stronger models arises primarily from correct structure rather than better text-only processing. We further show that apparent ``structure misuse'' in weaker models can be corrected by adding explicit step-by-step instructions. The earlier claims stem from confounding factors, namely that the traditional ICL framework lacks a dedicated mechanism for reasoning over graph structure and for handling lengthy multi-hop neighborhood contexts, rather than from any inherent limitation of LLMs themselves. Motivated by these findings, we propose the Text Attributes Passing Thoughts Network (TAPTN), an edge-aware, MPNN-like ICL framework that iteratively summarizes multi-hop neighborhoods using a structure-aware template and self-generated instructions. TAPTN substantially outperforms zero-shot CoT and GraphICL-style baselines on five TAG datasets by at least +13.98%, especially on malignant heterophilic graphs (with gains of +15-25%), and when used to produce structurally enriched texts for downstream fine-tuning, achieves performance competitive with state-of-the-art GNN pipelines. Collectively, the results establish that LLMs can exploit structural information in TAGs as effectively as SOTA GNNs through ICL, once an appropriate architecture mitigates the confounding factors.
URL: https://openreview.net/forum?id=WhaVqEkkMY
---
Title: TextTeacher: What Can Language Teach About Images?
Abstract: The platonic representation hypothesis suggests that sufficiently large models converge to a shared representation geometry, even across modalities.
Motivated by this, we ask:
Can the semantic knowledge of a language model efficiently improve a vision model?
As an answer, we introduce TextTeacher, a simple auxiliary objective that injects text embeddings as additional information into image classification training.
TextTeacher uses readily available image captions, a pre-trained and frozen text encoder, and a lightweight projection to produce semantic anchors that efficiently guide representations during training while leaving the inference-time model unchanged.
On ImageNet with standard ViT backbones, TextTeacher improves accuracy by up to $+2.7$ percentage points (p.p.) and yields consistent transfer gains (on average $+1.0$ p.p.) under the same recipe and compute.
It outperforms vision knowledge distillation, yielding higher accuracy at a constant compute budget, or similar accuracy while being $33\%$ faster.
Our analysis indicates that TextTeacher acts as a feature-space preconditioner, shaping deeper layers in the early stages of training and aiding generalization by supplying complementary semantic cues.
TextTeacher adds negligible overhead, requires no costly multimodal pretraining and preserves the simplicity and latency of pure vision models.
We release our code at \texttt{<URL upon acceptance>}.
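A minimal sketch of the auxiliary objective as we read the abstract; the cosine alignment and the projection head are our own assumptions, and the anchors are assumed to come from a frozen, pre-trained text encoder applied to the image captions.

import torch
import torch.nn.functional as F

def text_teacher_loss(image_feats, text_anchors, proj):
    # image_feats: (B, D_img) backbone features; text_anchors: (B, D_txt) frozen caption embeddings;
    # proj: lightweight projection from D_img to D_txt, discarded at inference time.
    z = F.normalize(proj(image_feats), dim=-1)
    a = F.normalize(text_anchors, dim=-1)
    return 1.0 - (z * a).sum(-1).mean()

# total_loss = ce_loss + alpha * text_teacher_loss(feats, caption_embeddings, proj)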
URL: https://openreview.net/forum?id=Xwb0aEUwKh
---
Title: Learning Materials Interatomic Potentials via Hybrid Invariant-Equivariant Architectures
Abstract: Machine learning interatomic potentials (MLIPs) can predict energy, force, and stress of materials and enable a wide range of downstream discovery tasks. A key design choice in MLIPs involves the trade-off between invariant and equivariant architectures. Invariant models offer computational efficiency but may not perform as well, especially when predicting high-order outputs. In contrast, equivariant models can capture high-order symmetries, but are computationally expensive. In this work, we propose HIENet, a \underline{h}ybrid \underline{i}nvariant-\underline{e}quivariant materials interatomic potential model that integrates both invariant and equivariant message passing layers. Furthermore, we show that HIENet provably satisfies key physical constraints. HIENet achieves superior performance with considerable computational speedups over prior models. Experimental results on both common benchmarks and downstream materials discovery tasks demonstrate the efficiency and effectiveness of HIENet. Finally, additional ablations further demonstrate that our hybrid invariant-equivariant approach scales well across model sizes and works with different equivariant model architectures, providing powerful insights into future MLIP designs.
URL: https://openreview.net/forum?id=fq3nrVqNmL
---
Title: Randomized PCA Forest for Unsupervised Outlier Detection
Abstract: We propose a novel unsupervised outlier detection method based on Randomized Principal Component Analysis (PCA). Motivated by the performance of the Randomized PCA (RPCA) Forest in approximate K-Nearest Neighbor (KNN) search, our method derives an outlier score from the forest's intrinsic properties. Experimental results show that the proposed approach outperforms classical and state-of-the-art methods on several outlier detection datasets while performing competitively on the rest. Extensive analysis reflects the method's robustness and computational efficiency, highlighting it as a strong choice for unsupervised outlier detection.
URL: https://openreview.net/forum?id=hHJWe6Qcfe
---
Title: Analysis of Natural Actor-Critic with Randomized Low-Discrepancy Sampling
Abstract: Natural gradient methods are appealing in policy optimization due to their invariance to smooth reparameterization and their ability to account for the local geometry of the policy manifold. These properties often lead to improved conditioning of the optimization problem compared to Euclidean policy gradients. However, their reliance on Monte Carlo estimation introduces high variance and sensitivity to hyperparameters. In this paper, we address these limitations by integrating Randomized Quasi-Monte Carlo (RQMC) sampling into the natural actor-critic (NAC) framework. We revisit the NAC linear system and show that, under imperfect value approximation, the NAC update decomposes exactly into the true natural gradient plus a Fisher-metric projection of the Bellman residual onto the score-feature span. We further develop RQMC-based NAC estimators that replace IID sampling with randomized low-discrepancy trajectories. We provide a variance analysis showing that these RQMC-based estimators strictly reduce estimator variance under mild regularity conditions, thereby reducing the propagation of Bellman-residual error into the natural-gradient update. Empirical results on certain reinforcement learning benchmarks demonstrate that our RQMC-enhanced algorithms consistently match or improve upon the performance and stability of their vanilla counterparts.
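A minimal sketch of the sampling substitution: scrambled Sobol points mapped through the Gaussian inverse CDF. How these points are consumed by the actor-critic updates is our own placeholder, not the paper's algorithm.

import numpy as np
from scipy.stats import norm, qmc

def rqmc_gaussian_noise(num_samples, dim, seed=0):
    # Randomized low-discrepancy points in [0,1)^dim, mapped to N(0, I) noise as a
    # drop-in replacement for IID Gaussian sampling in policy/critic estimation.
    sampler = qmc.Sobol(d=dim, scramble=True, seed=seed)
    u = np.clip(sampler.random(num_samples), 1e-6, 1.0 - 1e-6)
    return norm.ppf(u)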
URL: https://openreview.net/forum?id=kOSx9v6dfb
---
Title: Explicit Second-Order Min-Max Optimization: Practical Algorithms and Complexity Analysis
Abstract: We propose and analyze several inexact regularized Newton-type methods for finding a global saddle point of \emph{convex-concave} unconstrained min-max optimization problems. Compared to first-order methods, our understanding of second-order methods for min-max optimization is relatively limited, as obtaining global rates of convergence with second-order information can be much more involved. In this paper, we examine how second-order information is used to speed up extra-gradient methods, even under inexactness. In particular, we show that the proposed methods generate iterates that remain within a bounded set and that the averaged iterates converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function. We also provide a simple routine for solving the subproblem at each iteration, requiring a single Schur decomposition and $O(\log\log(1/\epsilon))$ calls to a linear system solver in a quasi-upper-triangular system. Thus, our method improves the existing line-search-based second-order min-max optimization methods by shaving off an $O(\log\log(1/\epsilon))$ factor in the required number of Schur decompositions. Finally, we conduct experiments on synthetic and real data to demonstrate the efficiency of the proposed methods.
URL: https://openreview.net/forum?id=Hyk1GhEXGa
---
Title: Adversarially Robust Latent Bandits in Multiplayer Asymmetric Settings
Abstract: We examine a novel multiplayer extension of the latent multi-armed bandit problem as formulated in \cite{maillard2014latent}, with broad applications such as recommendation systems and cognitive radio. Following \cite{chang2022online}, we examine three information-asymmetric scenarios: Problem A, in which players receive identical rewards but cannot observe each other's actions; Problem B, in which players receive private i.i.d. rewards but can observe others' actions; and Problem C, in which players receive private i.i.d. rewards and cannot observe others' actions. For Problems A and B, we provide nearly optimal gap-independent regret bounds. When reduced to the single-agent setting, our results improve on \cite{maillard2014latent} by allowing nature's actions to be adversarial. For Problem C, we use the knowledge of the reward means to improve on the results in \cite{chang2022online}.
URL: https://openreview.net/forum?id=v5tLAfd2Ke
---