Survey Certification: Personalization of Large Language Models: A Survey
Zhehao Zhang, Ryan A. Rossi, Branislav Kveton, Yijia Shao, Diyi Yang, Hamed Zamani, Franck Dernoncourt, Joe Barrow, Tong Yu, Sungchul Kim, Ruiyi Zhang, Jiuxiang Gu, Tyler Derr, Hongjie Chen, Junda Wu, Xiang Chen, Zichao Wang, Subrata Mitra, Nedim Lipka, Nesreen K. Ahmed, Yu Wang
https://openreview.net/forum?id=tf6A9EYMo6
---
Reproducibility Certification: An Expanded Benchmark that Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning in Tabular Datasets
Po-Yi Lu, Yi-Jie Cheng, Chun-Liang Li, Hsuan-Tien Lin
https://openreview.net/forum?id=855yo1Ubt2
---
Expert Certification: (Implicit) Ensembles of Ensembles: Epistemic Uncertainty Collapse in Large Models
Andreas Kirsch
https://openreview.net/forum?id=ON7dtdEHVQ
---
Survey Certification: Efficient Diffusion Models: A Survey
Hui Shen, Jingxuan Zhang, Boning Xiong, Rui Hu, Shoufa Chen, Zhongwei Wan, Xin Wang, Yu Zhang, Zixuan Gong, Guangyin Bao, Chaofan Tao, Yongfeng Huang, Ye Yuan, Mi Zhang
https://openreview.net/forum?id=wHECkBOwyt
---
Accepted papers
===============
Title: Personalization of Large Language Models: A Survey
Authors: Zhehao Zhang, Ryan A. Rossi, Branislav Kveton, Yijia Shao, Diyi Yang, Hamed Zamani, Franck Dernoncourt, Joe Barrow, Tong Yu, Sungchul Kim, Ruiyi Zhang, Jiuxiang Gu, Tyler Derr, Hongjie Chen, Junda Wu, Xiang Chen, Zichao Wang, Subrata Mitra, Nedim Lipka, Nesreen K. Ahmed, Yu Wang
Abstract: Personalization of Large Language Models (LLMs) has recently become increasingly important, with a wide range of applications. Despite the importance and recent progress, most existing works on personalized LLMs have focused entirely on either (a) personalized text generation or (b) leveraging LLMs for personalization-related downstream applications, such as recommendation systems. In this work, we bridge these two separate main directions for the first time by introducing a taxonomy for personalized LLM usage and summarizing the key differences and challenges. We provide a formalization of the foundations of personalized LLMs that consolidates and expands notions of personalization of LLMs, defining and discussing novel facets of personalization, usage, and desiderata of personalized LLMs. We then unify the literature across these diverse fields and usage scenarios by proposing systematic taxonomies for the granularity of personalization, personalization techniques, datasets, evaluation methods, and applications of personalized LLMs. Finally, we highlight challenges and important open problems that remain to be addressed. By unifying and surveying recent research using the proposed taxonomies, we aim to provide a clear guide to the existing literature and different facets of personalization in LLMs, empowering both researchers and practitioners.
URL: https://openreview.net/forum?id=tf6A9EYMo6
---
Title: An Expanded Benchmark that Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning in Tabular Datasets
Authors: Po-Yi Lu, Yi-Jie Cheng, Chun-Liang Li, Hsuan-Tien Lin
Abstract: Active Learning (AL) addresses the crucial challenge of enabling machines to efficiently gather labeled examples through strategic queries. Among the many AL strategies, Uncertainty Sampling (US) stands out as one of the most widely adopted. US queries the example(s) that the current model finds uncertain, proving to be both straightforward and effective. Despite claims in the literature suggesting superior alternatives to US, community-wide acceptance remains elusive. In fact, existing benchmarks for tabular datasets present conflicting conclusions on the continued competitiveness of US. In this study, we review the literature on AL strategies in the last decade and build the most comprehensive open-source AL benchmark to date to understand the relative merits of different AL strategies. The benchmark surpasses existing ones by encompassing a broader coverage of strategies, models, and data. By investigating the conflicting conclusions in existing tabular AL benchmarks under broad AL experimental settings, we uncover fresh insights into an often-overlooked issue in the use of machine learning models: **model compatibility** in the context of US. Specifically, we observe that adopting different models for querying unlabeled examples and for the learning task degrades US's effectiveness. Notably, our findings affirm that US maintains a competitive edge over other strategies when paired with compatible models. These findings have practical implications and provide a concrete recipe for AL practitioners, empowering them to make informed decisions when working on tabular classification with limited labeled data. The code for this project is available at https://github.com/ariapoy/active-learning-benchmark.
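A minimal sketch of pool-based uncertainty sampling under the model-compatibility recipe above: the same model class is used both to score uncertainty and to learn the task. The dataset, seed size, and number of rounds are illustrative choices, not the benchmark's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
labeled = list(range(10))                                  # small seed set
pool = [i for i in range(len(X)) if i not in labeled]

# Compatible setup: one model class for both querying and the learning task.
model = LogisticRegression(max_iter=1000)
for _ in range(20):                                        # 20 query rounds
    model.fit(X[labeled], y[labeled])
    proba = np.sort(model.predict_proba(X[pool]), axis=1)
    margins = proba[:, -1] - proba[:, -2]                  # small margin = uncertain
    query = pool[int(np.argmin(margins))]
    labeled.append(query)
    pool.remove(query)

model.fit(X[labeled], y[labeled])
print("accuracy after AL:", model.score(X, y))
```

Swapping in a different model class for the querying step while keeping the task model fixed is the incompatible setting in which the benchmark observes US's effectiveness degrade.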
URL: https://openreview.net/forum?id=855yo1Ubt2
---
Title: Extending Graph Condensation to Multi-Label Datasets: A Benchmark Study
Authors: Liangliang Zhang, Haoran Bao, Yao Ma
Abstract: As graph data grows increasingly complicated, training graph neural networks (GNNs) on large-scale datasets presents significant challenges, including computational resource constraints, data redundancy, and transmission inefficiencies. While existing graph condensation techniques have shown promise in addressing these issues, they are predominantly designed for single-label datasets, where each node is associated with a single class label. However, many real-world applications, such as social network analysis and bioinformatics, involve multi-label graph datasets, where a node can have multiple associated labels. To address this gap, we extend traditional graph condensation approaches to accommodate multi-label datasets by introducing modifications to synthetic dataset initialization and condensing optimization. Through experiments on eight real-world multi-label graph datasets, we demonstrate the effectiveness of our method. In our experiments, the GCond framework, combined with K-Center initialization and binary cross-entropy loss (BCELoss), generally achieves the best performance. This benchmark for multi-label graph condensation not only enhances the scalability and efficiency of GNNs for multi-label graph data but also offers substantial benefits for diverse real-world applications.
URL: https://openreview.net/forum?id=7aJxaPg30d
---
Title: Diffusion Model Predictive Control
Authors: Guangyao Zhou, Sivaramakrishnan Swaminathan, Rajkumar Vasudeva Raju, J Swaroop Guntupalli, Wolfgang Lehrach, Joseph Ortiz, Antoine Dedieu, Miguel Lazaro-Gredilla, Kevin Patrick Murphy
Abstract: We propose Diffusion Model Predictive Control (D-MPC), a novel MPC approach that learns a multi-step action proposal and a multi-step dynamics model, both using diffusion models, and combines them for use in online MPC. On the popular D4RL benchmark, we show performance that is significantly better than existing model-based offline planning methods using MPC (e.g. MBOP) and competitive with state-of-the-art (SOTA) model-based and model-free reinforcement learning methods. We additionally illustrate D-MPC’s ability to optimize novel reward functions at run time and adapt to novel dynamics, and highlight its advantages compared to existing diffusion-based planning baselines.
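The control loop can be sketched compactly; the two diffusion models are stubbed out with random placeholders here, so only the loop structure (sample multi-step plans, roll them out with the dynamics model, execute the first action of the best plan) is meant to reflect the abstract, and every dimension and constant is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
H, N, A_DIM, S_DIM = 8, 64, 2, 4       # horizon, candidate plans, action/state dims

def sample_action_proposal(state):     # stand-in for the diffusion action proposal
    return rng.normal(size=(N, H, A_DIM))

def rollout_dynamics(state, actions):  # stand-in for the diffusion dynamics model
    return rng.normal(size=(N, H, S_DIM))

def reward(states):                    # toy objective: drive the state to the origin
    return -np.sum(states ** 2, axis=(1, 2))

state = np.zeros(S_DIM)
for t in range(5):
    plans = sample_action_proposal(state)        # N candidate H-step plans
    futures = rollout_dynamics(state, plans)     # predicted state trajectories
    best = int(np.argmax(reward(futures)))
    action = plans[best, 0]                      # MPC: execute only the first action
    state = state + 0.1 * np.pad(action, (0, S_DIM - A_DIM))  # toy environment step
```

Because the reward is evaluated outside the learned models, a new reward function can be swapped in at run time, which is the property highlighted above.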
URL: https://openreview.net/forum?id=pvtgffHtJm
---
Title: Do Think Tags Really Help LLMs Plan? A Critical Evaluation of ReAct-Style Prompting
Authors: Siddhant Bhambri, Mudit Verma, Subbarao Kambhampati
Abstract: The reasoning abilities of Large Language Models (LLMs) remain a topic of considerable interest and debate. Among the original papers arguing for emergent reasoning abilities of LLMs, ReAct became particularly popular by claiming to tease out LLM reasoning abilities with special prompting involving “interleaving reasoning trace with action execution”. In this paper, we critically examine the claims of ReAct-style prompting for planning and sequential decision-making problems. By introducing systematic variations to the input prompt, we perform a sensitivity analysis of ReAct's original claims. Our experiments in AlfWorld and WebShop, domains that were used in the original ReAct work, show that the performance is minimally influenced by the interleaved reasoning trace or by the content of these generated reasoning traces. Instead, the performance of LLMs is primarily driven by the unreasonably high degree of similarity between input example tasks and queries, with shockingly little ability to generalize. In addition to raising questions about claims of reasoning abilities, this lack of generalization also implicitly forces the prompt designer to provide instance-specific examples, significantly increasing the cognitive burden on the human. Our empirical results show that the perceived reasoning abilities of LLMs stem from exemplar-query similarity and approximate retrieval rather than any inherent reasoning ability, leading to a severe lack of generalization beyond the few-shot examples given in the prompts. Our code and prompt settings are available on GitHub.
URL: https://openreview.net/forum?id=aFAMPSmNHR
---
Title: Multi-Attribute Constraint Satisfaction via Language Model Rewriting
Authors: Ashutosh Baheti, Debanjana Chakraborty, Faeze Brahman, Ronan Le Bras, Ximing Lu, Nouha Dziri, Yejin Choi, Mark Riedl, Maarten Sap
Abstract: Obeying precise constraints on top of multiple external attributes is a common computational problem underlying seemingly different domains, from controlled text generation to protein engineering. Existing language model (LM) controllability methods for multi-attribute constraint satisfaction often rely on specialized architectures or gradient-based classifiers, limiting their flexibility to work with arbitrary black-box evaluators and pretrained models. Current general-purpose large language models, while capable, cannot achieve fine-grained multi-attribute control over external attributes. Thus, we create Multi-Attribute Constraint Satisfaction (MACS), a generalized method capable of finetuning language models on any sequential domain to satisfy user-specified constraints on multiple external real-valued attributes. Our method trains LMs as editors by sampling diverse multi-attribute edit pairs from an initial set of paraphrased outputs. During inference, the LM iteratively improves upon its previous solution to satisfy constraints for all attributes by leveraging our designed constraint satisfaction reward. We additionally experiment with reward-weighted behavior cloning to further improve the constraint satisfaction rate of LMs. To evaluate our approach, we present a new Fine-grained Constraint Satisfaction (FineCS) benchmark, featuring two challenging tasks: (1) Text Style Transfer, where the goal is to simultaneously modify the sentiment and complexity of reviews, and (2) Protein Design, focusing on modulating the fluorescence and stability of Green Fluorescent Proteins (GFP). Our empirical results show that MACS achieves the highest threshold satisfaction in both FineCS tasks, outperforming strong domain-specific baselines. Our work opens new avenues for generalized, real-valued multi-attribute control, with implications for diverse applications spanning natural language processing and bioinformatics.
URL: https://openreview.net/forum?id=3q1bUIHTJK
---
Title: Learning distributed representations with efficient SoftMax normalization
Authors: Lorenzo Dall'Amico, Enrico Maria Belliardo
Abstract: Learning distributed representations, or embeddings, that encode the relational similarity patterns among objects is a relevant task in machine learning. A popular method to learn the embedding matrices $X, Y$ is optimizing a loss function of the term ${\rm SoftMax}(XY^T)$. The complexity required to calculate this term, however, scales quadratically with the problem size, making it a computationally heavy solution. In this article, we propose a linear-time heuristic approximation to compute the normalization constants of ${\rm SoftMax}(XY^T)$ for embedding vectors with bounded norms. We show on some pre-trained embedding datasets that the proposed estimation method achieves accuracy higher than or comparable to competing methods. From this result, we design an efficient and task-agnostic algorithm that learns the embeddings by optimizing the cross-entropy between the softmax and a set of probability distributions given as inputs. The proposed algorithm is interpretable and easily adapted to arbitrary embedding problems. We consider a few use cases and observe similar or higher performance and lower computational time than similar "2Vec" algorithms.
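To make the bottleneck concrete, here is a hedged sketch comparing the exact $O(n^2)$ computation of the normalization constants of ${\rm SoftMax}(XY^T)$ with a simple linear-time subsampling estimate. The uniform Monte Carlo estimator below is a generic stand-in, not the paper's bounded-norm heuristic.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 2000, 32, 50                     # m = sampled columns per row
X = rng.normal(size=(n, d)) / np.sqrt(d)   # embeddings with bounded norms
Y = rng.normal(size=(n, d)) / np.sqrt(d)

Z_exact = np.exp(X @ Y.T).sum(axis=1)      # O(n^2): exact normalization constants

cols = rng.choice(n, size=m, replace=False)
Z_est = np.exp(X @ Y[cols].T).mean(axis=1) * n   # O(n*m): unbiased estimate

rel_err = np.abs(Z_est - Z_exact) / Z_exact
print(f"median relative error: {np.median(rel_err):.3f}")
```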
URL: https://openreview.net/forum?id=9M4NKMZOPu
---
Title: AttentionSmithy: A Modular Framework for Rapid Transformer Development
Authors: Caleb Cranney, Jesse G Meyer
Abstract: Transformer architectures have revolutionized a broad spectrum of AI applications by leveraging attention mechanisms for parallelized and long-range sequence processing. Despite their remarkable success, building and customizing transformers remains prohibitively complex for many domain experts who lack deep knowledge of low-level implementations. We introduce AttentionSmithy, a modular software package that lowers the barrier to transformer innovation by decomposing key components---attention modules, feed-forward networks, normalization layers, and positional encodings---into reusable building blocks. By disentangling architectural elements into well-defined interfaces, users can rapidly prototype, adapt, and evaluate transformer variants without extensive coding overhead. Our framework currently supports four distinct positional encoding strategies (sinusoidal, learned, rotary, and ALiBi), offers modular integration of multiple attention methods (including standard attention, Longformer, and Linformer), and integrates seamlessly with neural architecture search (NAS) for automated design exploration. The system is designed to support future extensions with minimal overhead. We validate AttentionSmithy by replicating the original "Attention Is All You Need" transformer under resource constraints, demonstrating robust performance on a machine translation task. Leveraging the package’s integrated NAS capability, we identified an optimized model configuration that outperformed our baseline, demonstrating the framework’s effectiveness for automated architecture search and model improvement. We further illustrate AttentionSmithy's adaptability through gene-specific modeling, where a variant of a BERT-style architecture achieves over 95\% accuracy on downstream cell type classification tasks using ranked transcriptomic data. These case studies underscore AttentionSmithy's core advantage: enabling specialized experimentation across diverse application domains---from natural language processing to genomic analysis---by obviating the need for labor-intensive, low-level framework manipulation. We anticipate that AttentionSmithy will serve as a foundation for creative transformer-based solutions, expediting research and development in numerous scientific and industrial fields.
URL: https://openreview.net/forum?id=0jhoriH9yA
---
Title: Evaluating explainability techniques on discrete-time graph neural networks
Authors: Manuel Dileo, Matteo Zignani, Sabrina Tiziana Gaito
Abstract: Discrete-time temporal Graph Neural Networks (GNNs) are powerful tools for modeling evolving graph-structured data and are widely used in decision-making processes across domains such as social network analysis, financial systems, and collaboration networks. Explaining the predictions of these models is an important research area due to the critical role their decisions play in building trust in social or financial systems. However, the explainability of Temporal Graph Neural Networks remains a challenging and relatively unexplored field. Hence, in this work, we propose a novel framework to evaluate explainability techniques tailored for discrete-time temporal GNNs. Our framework introduces new training and evaluation settings that capture the evolving nature of temporal data, defines metrics to assess the temporal aspects of explanations, and establishes baselines and models specific to discrete-time temporal networks. Through extensive experiments, we outline the best explainability techniques for discrete-time GNNs in terms of fidelity, efficiency, and human-readability trade-offs. By addressing the unique challenges of temporal graph data, our framework sets the stage for future advancements in explaining discrete-time GNNs.
URL: https://openreview.net/forum?id=JzmXo0rfry
---
Title: Alternators For Sequence Modeling
Authors: Mohammad Reza Rezaei, Adji Bousso Dieng
Abstract: This paper introduces alternators, a novel family of non-Markovian dynamical models for sequences. An alternator features two neural networks: the observation trajectory network (OTN) and the feature trajectory network (FTN). The OTN and the FTN work in conjunction, alternating between outputting samples in the observation space and some feature space, respectively. The parameters of the OTN and the FTN are not time-dependent and are learned via a minimum cross-entropy criterion over the trajectories. Alternators are versatile. They can be used as dynamical latent-variable generative models or as sequence-to-sequence predictors. Alternators can uncover the latent dynamics underlying complex sequential data, accurately forecast and impute missing data, and sample new trajectories. We showcase the capabilities of alternators in three applications. We first use alternators to model the Lorenz equations, which are often used to describe chaotic behavior. We then apply alternators in neuroscience, mapping brain activity to physical activity. Finally, we apply alternators in climate science, focusing on sea-surface temperature forecasting. In all our experiments, we found alternators to be stable to train, fast to sample from, and able to yield high-quality generated samples and latent variables, often outperforming strong baselines such as Mamba, neural ODEs, and diffusion models in the domains we studied.
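A structural sketch of the alternation described above, with the OTN and FTN replaced by untrained random linear maps purely to show the data flow; all dimensions, noise scales, and the tanh squashing are arbitrary assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D_X, D_Z, T = 3, 8, 50
W_otn = 0.3 * rng.normal(size=(D_Z, D_X))        # stand-in for the OTN
W_ftn = 0.3 * rng.normal(size=(D_X + D_Z, D_Z))  # stand-in for the FTN

z = rng.normal(size=D_Z)                         # initial feature-space state
xs = []
for t in range(T):
    x = z @ W_otn + 0.05 * rng.normal(size=D_X)  # OTN: feature -> observation sample
    z = np.tanh(np.concatenate([x, z]) @ W_ftn)  # FTN: (observation, feature) -> next feature
    xs.append(x)

trajectory = np.stack(xs)                        # (T, D_X) sampled observation trajectory
```

Note that the same two maps are reused at every step, matching the description that the OTN and FTN parameters are not time-dependent.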
URL: https://openreview.net/forum?id=Q70C1HQ0VO
---
Title: Evaluating Long Range Dependency Handling in Code Generation LLMs
Authors: Yannick Assogba, Donghao Ren
Abstract: As language models support larger and larger context sizes, evaluating their ability to make effective use of that context becomes increasingly important. We analyze the ability of several code generation models to handle long range dependencies using a suite of multi-step key retrieval tasks in context windows up to 8k tokens in length. The tasks progressively increase in difficulty and allow more nuanced evaluation of model capabilities than tests like the popular needle-in-the-haystack test. We find that performance degrades significantly for many models (up to 2x) when a function references another function that is defined later in the prompt. We also observe that models that use sliding window attention mechanisms have difficulty handling references further than the size of a single window. We perform simple prompt modifications using call graph information to improve multi-step retrieval performance up to 3x. Our analysis highlights ways that long-context performance needs deeper consideration beyond retrieval of single facts within a document.
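A toy generator for the kind of multi-step key retrieval task described above (a simplified stand-in, not the paper's exact templates): answering requires chaining function definitions, and `f_1` references a function defined later in the prompt, the forward-reference case where the paper reports the largest degradation.

```python
def make_prompt(depth: int = 3, key: str = "842") -> str:
    """Build a prompt whose answer requires `depth` retrieval hops."""
    lines = []
    for i in range(1, depth):
        # f_i calls f_{i+1}, which is defined *after* it in the prompt
        lines.append(f"def f_{i}():\n    return f_{i + 1}()\n")
    lines.append(f"def f_{depth}():\n    return '{key}'\n")
    lines.append("# Question: what string does f_1() return?")
    return "\n".join(lines)

print(make_prompt(depth=3))
```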
URL: https://openreview.net/forum?id=yzACI2vFaX
---
Title: CLImage: Human-Annotated Datasets for Complementary-Label Learning
Authors: Hsiu-Hsuan Wang, Mai Tan Ha, Nai-Xuan Ye, Wei-I Lin, Hsuan-Tien Lin
Abstract: Complementary-label learning (CLL) is a weakly-supervised learning paradigm that aims to train a multi-class classifier using only complementary labels, which indicate classes to which an instance does not belong. Despite numerous algorithmic proposals for CLL, their practical applicability remains unverified for two reasons. Firstly, these algorithms often rely on assumptions about the generation of complementary labels, and it is not clear how far these assumptions are from reality. Secondly, their evaluation has been limited to synthetically labeled datasets. To gain insights into the real-world performance of CLL algorithms, we developed a protocol to collect complementary labels from human annotators. Our efforts resulted in the creation of four datasets: CLCIFAR10, CLCIFAR20, CLMicroImageNet10, and CLMicroImageNet20, derived from the well-known classification datasets CIFAR10, CIFAR100, and TinyImageNet200. These datasets, collectively named CLImage, represent the very first real-world CLL datasets and are publicly available at: https://github.com/ntucllab/CLImage_Dataset. Through extensive benchmark experiments, we discovered a notable decrease in performance when transitioning from synthetically labeled datasets to real-world datasets. We investigated the key factors contributing to this decrease with a thorough dataset-level ablation study. Our analyses highlight annotation noise as the most influential factor in the real-world datasets. In addition, we discover that the biased nature of human-annotated complementary labels and the difficulty of validating models with only complementary labels are two outstanding barriers to practical CLL. These findings suggest that the community should focus more research effort on developing CLL algorithms and validation schemes that are robust to noisy and biased complementary-label distributions.
URL: https://openreview.net/forum?id=FHkWY4aGsN
---
Title: ViewFusion: Learning Composable Diffusion Models for Novel View Synthesis
Authors: Bernard Spiegl, Andrea Perin, Stephane Deny, Alexander Ilin
Abstract: Deep learning is providing a wealth of new approaches to the problem of novel view synthesis, from Neural Radiance Field (NeRF) based approaches to end-to-end style architectures. Each approach offers specific strengths but also comes with limitations in its applicability. This work introduces ViewFusion, an end-to-end generative approach to novel view synthesis with unparalleled flexibility. ViewFusion consists of simultaneously applying a diffusion denoising step to any number of input views of a scene, then combining the noise gradients obtained for each view with an (inferred) pixel-weighting mask, ensuring that for each region of the target view only the most informative input views are taken into account. Our approach resolves several limitations of previous approaches by (1) being trainable and generalizing across multiple scenes and object classes, (2) adaptively taking in a variable number of pose-free views at both train and test time, and (3) generating plausible views even in severely underdetermined conditions (thanks to its generative nature)---all while generating views of quality on par with or better than comparable methods. Limitations include not generating a 3D embedding of the scene, resulting in relatively slow inference, and the method having been tested only on the relatively small Neural 3D Mesh Renderer dataset. Code is available.
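The core fusion step lends itself to a short sketch: per-view noise predictions are blended with a per-pixel weighting mask, softmax-normalized across views, so each target region is dominated by its most informative inputs. The mask below is random for illustration; in ViewFusion it is inferred.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H, W, C = 4, 32, 32, 3                        # input views, image size, channels

eps_per_view = rng.normal(size=(V, H, W, C))     # denoiser output for each input view
logits = rng.normal(size=(V, H, W, 1))           # per-pixel informativeness scores
weights = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)  # softmax over views

eps_fused = (weights * eps_per_view).sum(axis=0)  # (H, W, C) combined noise gradient
```

Because the combination is a weighted sum over however many views are present, the same model accepts a variable number of pose-free views at test time.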
URL: https://openreview.net/forum?id=amUisgrmte
---
Title: Learning Actionable Counterfactual Explanations in Large State Spaces
Authors: Keziah Naggita, Matthew Walter, Avrim Blum
Abstract: Recourse generators provide actionable insights, often through feature-based counterfactual explanations (CFEs), to help negatively classified individuals understand how to adjust their input features to achieve a positive classification. These feature-based CFEs, which we refer to as \emph{low-level} CFEs, are overly specific (e.g., coding experience: \(4 \to 5+\) years) and often recommended in a feature space that doesn't straightforwardly align with real-world actions. To bridge this gap, we introduce three novel recourse types grounded in real-world actions: high-level continuous (\emph{hl-continuous}), high-level discrete (\emph{hl-discrete}), and high-level ID (\emph{hl-id}) CFEs.
We formulate single-agent CFE generation methods for hl-discrete and hl-continuous CFEs. For the hl-discrete CFE, we cast the task as a weighted set cover problem that selects the least cost set of hl-discrete actions that satisfy the eligibility of features, and model the hl-continuous CFE as a solution to an integer linear program that identifies the least cost set of hl-continuous actions capable of favorably altering the prediction of a linear classifier. Since these methods require costly optimization per agent, we propose data-driven CFE generation approaches that, given instances of agents and their optimal CFEs, learn a CFE generator that quickly provides optimal CFEs for new agents. This approach, also viewed as one of learning an optimal policy in a family of large but deterministic MDPs, considers several problem formulations, including formulations in which the actions and their effects are unknown, and therefore addresses informational and computational challenges.
We conduct extensive empirical evaluations using publicly available healthcare datasets (BRFSS, Foods, and NHANES) and fully-synthetic data. For negatively classified agents identified by linear and threshold-based binary classifiers, we compare the proposed forms of recourse to low-level CFEs, which suggest how the agent can transition from state \(\mathbf{x}\) to a new state \(\mathbf{x}'\) where the model prediction is desirable. We also extensively evaluate the effectiveness of our neural network-based, data-driven CFE generation approaches. Empirical results show that the proposed data-driven CFE generators are accurate and resource-efficient, and the proposed forms of recourse offer various advantages over the low-level CFEs.
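As a concrete instance of the hl-discrete reduction above, the sketch below solves the cover greedily. The paper selects a least-cost cover (and uses an integer linear program for the hl-continuous case); greedy is the classic approximation and is shown only to make the reduction tangible, with a fully hypothetical action set.

```python
def greedy_set_cover(requirements: set, actions: dict, costs: dict) -> list:
    """actions maps a high-level action to the feature requirements it satisfies."""
    chosen, uncovered = [], set(requirements)
    while uncovered:
        # pick the action with the best cost per newly covered requirement
        best = min((a for a in actions if actions[a] & uncovered),
                   key=lambda a: costs[a] / len(actions[a] & uncovered))
        chosen.append(best)
        uncovered -= actions[best]
    return chosen

# Hypothetical agent: three real-world actions with feature-level effects.
reqs = {"income+", "credit_history+", "debt-"}
acts = {"take_course": {"income+"},
        "refinance": {"debt-"},
        "new_job": {"income+", "credit_history+"}}
cost = {"take_course": 2.0, "refinance": 1.0, "new_job": 3.0}
print(greedy_set_cover(reqs, acts, cost))   # ['refinance', 'new_job']
```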
URL: https://openreview.net/forum?id=tXnVRpRlR8
---
Title: Explaining Node Embeddings
Authors: Zohair Shafi, Ayan Chatterjee, Tina Eliassi-Rad
Abstract: Node embedding algorithms produce low-dimensional latent representations of nodes in a graph. These embeddings are often used for downstream tasks, such as node classification and link prediction. In this paper, we investigate the following two questions: (Q1) Can we explain each embedding dimension with human-understandable graph features (e.g., degree, clustering coefficient, and PageRank)? (Q2) How can we modify existing node embedding algorithms to produce embeddings that can be easily explained by human-understandable graph features? We find that the answer to Q1 is yes and introduce a new framework called XM (short for eXplain eMbedding) to answer Q2. A key aspect of XM involves minimizing the nuclear norm of the generated explanations. We show that by minimizing the nuclear norm, we minimize the lower bound on the entropy of the generated explanations. We test XM on a variety of real-world graphs and show that XM not only preserves the performance of existing node embedding methods, but also enhances their explainability.
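The nuclear-norm regularizer at the heart of XM is easy to sketch with PyTorch; the stand-in task loss and the weight 0.01 below are placeholders, and only the penalty itself reflects the paper's description.

```python
import torch

def nuclear_norm(E: torch.Tensor) -> torch.Tensor:
    """Sum of singular values of the explanation matrix (dims x graph features)."""
    return torch.linalg.svdvals(E).sum()

E = torch.randn(64, 6, requires_grad=True)  # 64 embedding dims, 6 graph features
task_loss = E.pow(2).mean()                 # placeholder for the embedding objective
loss = task_loss + 0.01 * nuclear_norm(E)   # penalize high-rank explanations
loss.backward()                             # gradients flow through svdvals
```

Per the paper, driving this penalty down also drives down a lower bound on the entropy of the explanations.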
URL: https://openreview.net/forum?id=QQZ8uPxFb3
---
Title: Diversity-Driven View Subset Selection for Indoor Novel View Synthesis
Authors: Zehao Wang, Han Zhou, Matthew B. Blaschko, Tinne Tuytelaars, Minye Wu
Abstract: Novel view synthesis of indoor scenes can be achieved by capturing a monocular video sequence of the environment. However, redundant information caused by artificial movements in the input video data reduces the efficiency of scene modeling. To address this, we formulate the problem as a combinatorial optimization task for view subset selection. In this work, we propose a novel subset selection framework that integrates a comprehensive diversity-based measurement with well-designed utility functions. We provide a theoretical analysis of these utility functions and validate their effectiveness through extensive experiments. Furthermore, we introduce IndoorTraj, a novel dataset designed for indoor novel view synthesis, featuring complex and extended trajectories that simulate intricate human behaviors. Experiments on IndoorTraj show that our framework consistently outperforms baseline strategies while using only 5–20% of the data, highlighting its remarkable efficiency and effectiveness.
URL: https://openreview.net/forum?id=F42CRfcp3D
---
Title: Flexible Infinite-Width Graph Convolutional Neural Networks
Authors: Ben Anson, Edward Milsom, Laurence Aitchison
Abstract: A common theoretical approach to understanding neural networks is to take an infinite-width limit, at which point the outputs become Gaussian process (GP) distributed. This is known as a neural network Gaussian process (NNGP). However, the NNGP kernel is fixed and tunable only through a small number of hyperparameters, thus eliminating the possibility of representation learning. This contrasts with finite-width NNs, which are often believed to perform well because they are able to flexibly learn representations for the task at hand. Thus in simplifying NNs to make them theoretically tractable, NNGPs may eliminate precisely what makes them work well (representation learning). This motivated us to understand whether representation learning is necessary in a range of node classification tasks on graphs. We develop a precise tool for this task, the graph convolutional deep kernel machine. This is very similar to an NNGP, in that it is an infinite width limit and uses kernels, but comes with a “knob” to control the amount of flexibility and hence representation learning. We found that representation learning gives noticeable performance improvements for heterophilous node classification tasks, but less so for homophilous node classification tasks.
URL: https://openreview.net/forum?id=Q2M4yijKSo
---
Title: GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks
Authors: Taraneh Younesian, Daniel Daza, Emile van Krieken, Thiviyan Thanapalasingam, Peter Bloem
Abstract: Graph neural networks (GNNs) learn to represent nodes by aggregating information from their neighbors. As GNNs increase in depth, their receptive field grows exponentially, leading to high memory costs. Several works in the literature have proposed to address this shortcoming by sampling subgraphs or by using historical embeddings. These methods have mostly focused on benchmarks of single-label node classification on homophilous graphs, where neighboring nodes often share the same label. However, most of these methods rely on static heuristics that may not generalize across different graphs or tasks. We argue that the sampling method should be adaptive, adjusting to the complex structural properties of each graph. To this end, we introduce GRAPES, an adaptive sampling method that learns to identify the set of nodes crucial for training a GNN. GRAPES trains a second GNN to predict node sampling probabilities by optimizing the downstream task objective. We evaluate GRAPES on various node classification benchmarks involving homophilous as well as heterophilous graphs. We demonstrate GRAPES’ effectiveness in accuracy and scalability, particularly in multi-label heterophilous graphs. Additionally, GRAPES uses orders of magnitude less GPU memory than a strong baseline based on historical embeddings. Unlike other sampling methods, GRAPES maintains high accuracy even with smaller sample sizes and, therefore, can scale to massive graphs. Our implementation is publicly available online.
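A structural sketch of learned neighbor sampling in this style: a small scorer network assigns sampling probabilities to candidate nodes and a subset is drawn from them. GRAPES trains such a scorer (a second GNN) through the downstream objective; the untrained linear scorer below is a stand-in.

```python
import torch

def sample_neighbors(node_feats: torch.Tensor, candidates: torch.Tensor,
                     scorer: torch.nn.Module, k: int) -> torch.Tensor:
    probs = torch.sigmoid(scorer(node_feats[candidates])).squeeze(-1)
    # draw k candidates without replacement, proportional to learned scores
    idx = torch.multinomial(probs, num_samples=min(k, len(candidates)))
    return candidates[idx]

scorer = torch.nn.Linear(16, 1)     # stand-in for the sampling GNN
feats = torch.randn(100, 16)        # node features
cands = torch.arange(100)           # candidate neighbors of the current batch
print(sample_neighbors(feats, cands, scorer, k=8))
```

Only the sampled subgraph is materialized for the training step, which is where the memory savings over full neighborhood expansion come from.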
URL: https://openreview.net/forum?id=QI0l842vSq
---
Title: Local Differential Privacy-Preserving Spectral Clustering for General Graphs
Authors: Sayan Mukherjee, Vorapong Suppakitpaisarn
Abstract: Spectral clustering is a widely used algorithm to find clusters in networks. Several researchers have studied the stability of spectral clustering under local differential privacy, with the additional assumption that the underlying networks are generated from the stochastic block model (SBM). However, we argue that this assumption is too restrictive, since social networks do not originate from the SBM. Thus, we delve into an analysis for general graphs in this work. Our primary focus is the edge flipping method -- a common technique for protecting local differential privacy. We show that, when the edges of an $n$-vertex graph satisfying some reasonable well-clustering assumptions are flipped with a probability of $O(\log n/n)$, the clustering outcomes are largely consistent. Empirical tests further corroborate these theoretical findings. Conversely, although clustering outcomes are stable for non-sparse and well-clustered graphs generated from the SBM, we show that, in general, spectral clustering may yield highly erratic results on certain graphs when the flipping probability is $\omega(\log n/n)$. This indicates that the best privacy budget obtainable for general graphs is $\Theta(\log n)$.
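Edge flipping is randomized response applied to the adjacency matrix and is simple to sketch; the toy graph and the constant in $p = 2\log n/n$ below are arbitrary illustrative choices.

```python
import numpy as np

def flip_edges(A: np.ndarray, p: float, rng) -> np.ndarray:
    """Flip each potential (undirected) edge independently with probability p."""
    n = A.shape[0]
    mask = np.triu(rng.random((n, n)) < p, k=1)   # decide once per unordered pair
    mask = mask | mask.T
    return np.where(mask, 1 - A, A)

rng = np.random.default_rng(0)
n = 200
A = np.triu((rng.random((n, n)) < 0.05).astype(int), k=1)
A = A + A.T                                       # symmetric, no self-loops
A_private = flip_edges(A, p=2 * np.log(n) / n, rng=rng)
print("pairs flipped:", int((A != A_private).sum() // 2))
```

The result above says that spectral clustering on `A_private` stays largely consistent with clustering `A` at this flipping rate, but can become erratic once the rate grows to $\omega(\log n/n)$.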
URL: https://openreview.net/forum?id=zo5b60AuAH
---
Title: (Implicit) Ensembles of Ensembles: Epistemic Uncertainty Collapse in Large Models
Authors: Andreas Kirsch
Abstract: Epistemic uncertainty is crucial for safety-critical applications and data acquisition tasks. Yet, we find an important phenomenon in deep learning models: an epistemic uncertainty collapse as model complexity increases, challenging the assumption that larger models invariably offer better uncertainty quantification. We introduce implicit ensembling as a possible explanation for this phenomenon. To investigate this hypothesis, we provide theoretical analysis and experiments that demonstrate uncertainty collapse in explicit ensembles of ensembles and show experimental evidence of similar collapse in wider models across various architectures, from simple MLPs to state-of-the-art vision models including ResNets and Vision Transformers. We further develop implicit ensemble extraction techniques to decompose larger models into diverse sub-models, showing we can thus recover epistemic uncertainty. We explore the implications of these findings for uncertainty estimation.
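The collapse is easy to reproduce with the standard mutual-information decomposition of ensemble uncertainty (epistemic = entropy of the mean prediction minus mean member entropy). Averaging sub-ensembles before the decomposition, which is what an ensemble of ensembles implicitly does, shrinks the epistemic term by Jensen's inequality. A minimal sketch with random predictive members:

```python
import numpy as np

def entropy(p, axis=-1):
    return -(p * np.log(p + 1e-12)).sum(axis=axis)

def epistemic(members):                    # members: (M, num_classes) probabilities
    return entropy(members.mean(axis=0)) - entropy(members).mean()

rng = np.random.default_rng(0)
members = rng.dirichlet(np.ones(5), size=16)         # 16 diverse predictive members

print("ensemble MI:             ", epistemic(members))
sub_means = members.reshape(4, 4, 5).mean(axis=1)    # average 4 sub-ensembles first
print("ensemble-of-ensembles MI:", epistemic(sub_means))   # strictly no larger
```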
URL: https://openreview.net/forum?id=ON7dtdEHVQ
---
Title: Visually Descriptive Language Model for Vector Graphics Reasoning
Authors: Zhenhailong Wang, Joy Hsu, Xingyao Wang, Kuan-Hao Huang, Manling Li, Jiajun Wu, Heng Ji
Abstract: Despite significant advancements, current large multimodal models (LMMs) struggle to bridge the gap between low-level visual perception—focusing on shapes, sizes, and layouts—and high-level language reasoning involving semantics, events, and logic. This limitation becomes evident in tasks requiring precise visual perception, such as comparing geometric properties or solving visual algorithmic reasoning problems. To study this failure mode, we focus on an important visual domain: vector graphics, images composed purely of 2D objects and shapes, which are prevalent in web, PC, and mobile environments. Importantly, we consider rasterized vector graphics without assuming access to their underlying vector code. We identify two key research questions: how can we enable precise visual perception, and how can we facilitate high-level reasoning based on such low-level perceptions? To accurately capture low-level visual details, we explore using SVG for the precise encoding of visual scenes. However, SVGs are not readily interpretable by LLMs or LMMs in a zero-shot manner. To address this challenge, we propose the Visually Descriptive Language Model (VDLM) to build a bridge between low-level visual perception and high-level language reasoning. VDLM learns an intermediate symbolic representation called Primal Visual Description (PVD), which translates raw SVGs into a higher-level abstraction comprising primitive attributes. This abstraction allows for direct interpretation by foundation models for zero-shot generalization to different reasoning tasks. Without any human-annotated data, VDLM leads to significant improvements in state-of-the-art LMMs, such as GPT-4o, across various low-level multimodal perception and reasoning tasks on rasterized vector graphics. Additionally, we provide extensive analyses of VDLM’s performance, showing that our framework offers improved interpretability due to its disentangled perception and reasoning processes. As the first attempt to construct a descriptive intermediate representation for low-level visual reasoning, we also conduct an in-depth error analysis, highlighting remaining limitations and suggesting directions for future research.
URL: https://openreview.net/forum?id=WzS33L1iPC
---
Title: Unsupervised Anomaly Detection through Mass Repulsing Optimal Transport
Authors: Eduardo Fernandes Montesuma, EL HABAZI Adel, Fred Maurice NGOLE MBOULA
Abstract: Detecting anomalies in datasets is a longstanding problem in machine learning. In this context, an anomaly is defined as a sample that significantly deviates from the remaining data. Meanwhile, Optimal Transport (OT) is a field of mathematics concerned with moving mass between two probability distributions with the least effort. In classical OT, the optimal strategy for transporting a distribution to itself is the identity, i.e., each sample keeps its mass. In this paper, we tackle anomaly detection by forcing samples to displace their mass while keeping the least-effort objective. We call this new transportation problem Mass Repulsing Optimal Transport (MROT). Naturally, samples lying in low-density regions of space are forced to displace their mass very far, incurring a high transportation cost. In contrast, samples in high-density regions can send their mass just outside an \emph{exclusion zone}. We use these concepts to design a new anomaly score. Through a series of experiments on existing benchmarks and fault detection problems, we show that our algorithm improves over existing methods. Our code is publicly available at https://github.com/eddardd/MROT
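A deliberately simplified proxy for the idea (not the paper's exact MROT formulation): score each point by the cheapest admissible destination for its mass, i.e., the distance to its nearest neighbor outside an exclusion zone of radius $r$. Points in low-density regions must reach far and therefore score high.

```python
import numpy as np

def mass_repulsion_score(X: np.ndarray, r: float) -> np.ndarray:
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    D[D <= r] = np.inf                # mass may not stay inside the exclusion zone
    return D.min(axis=1)              # cost of the cheapest admissible destination

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(100, 2)),   # dense cluster
               [[4.0, 4.0]]])                       # one outlier
scores = mass_repulsion_score(X, r=0.3)
print("outlier has the top score:", int(scores.argmax()) == 100)
```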
URL: https://openreview.net/forum?id=PPGJ3EvENv
---
Title: Algorithm Configuration for Structured Pfaffian Settings
Authors: Maria Florina Balcan, Anh Tuan Nguyen, Dravyansh Sharma
Abstract: Data-driven algorithm design uses historical problem instances to automatically adjust and optimize algorithms to their application domain, typically by selecting algorithms from parameterized families. While the approach has been highly successful in practice, providing theoretical guarantees for several algorithmic families remains challenging. This is due to the intricate dependence of the algorithmic performance on the parameters, often exhibiting a piecewise discontinuous structure. In this work, we present new frameworks for providing learning guarantees for parameterized data-driven algorithm design problems in both statistical and online learning settings.
For the statistical learning setting, we introduce the Pfaffian GJ framework, an extension of the classical Goldberg-Jerrum (GJ) framework (Bartlett et al., 2022; Goldberg & Jerrum, 1993), that is capable of providing learning guarantees for function classes for which the computation involves Pfaffian functions. Unlike the GJ framework, which is limited to function classes with computation characterized by rational functions (quotients of two polynomials), our proposed framework can deal with function classes involving Pfaffian functions, which are much more general and widely applicable. We then show that for many parameterized algorithms of interest, their utility function possesses a refined piecewise structure, which automatically translates to learning guarantees using our proposed framework.
For the online learning setting, we provide a new tool for verifying the dispersion property of a sequence of loss functions, a sufficient condition that allows no-regret learning for sequences of piecewise structured loss functions where the piecewise structure involves Pfaffian transition boundaries. We use our framework to provide novel learning guarantees for many challenging data-driven design problems of interest, including data-driven linkage-based clustering, graph-based semi-supervised learning, and regularized logistic regression.
URL: https://openreview.net/forum?id=Xmk1or5eH8
---
Title: Generalizable Representation Learning for fMRI-based Neurological Disorder Identification
Authors: Wenhui Cui, Haleh Akrami, Anand Joshi, Richard Leahy
Abstract: Despite the impressive advances achieved using deep learning for functional brain activity analysis, the heterogeneity of functional patterns and the scarcity of imaging data still pose challenges in tasks such as identifying neurological disorders. For functional Magnetic Resonance Imaging (fMRI), while data may be abundantly available from healthy controls, clinical data is often scarce, especially for rare diseases, limiting the ability of models to identify clinically-relevant features. We overcome this limitation by introducing a novel representation learning strategy integrating meta-learning with self-supervised learning to improve the generalization from normal to clinical features. This approach enables generalization to challenging clinical tasks featuring scarce training data. We achieve this by leveraging self-supervised learning on the control dataset to focus on inherent features that are not limited to a particular supervised task and incorporating meta-learning to improve the generalization across domains. To explore the generalizability of the learned representations to unseen clinical applications, we apply the model to four distinct clinical datasets featuring scarce and heterogeneous data for neurological disorder classification. Results demonstrate the superiority of our representation learning strategy on diverse clinically-relevant tasks.
URL: https://openreview.net/forum?id=zF9IrMTjCC
---
Title: Information Theoretic Guarantees For Policy Alignment In Large Language Models
Authors: Youssef Mroueh, Apoorva Nitsure
Abstract: Policy alignment of large language models refers to constrained policy optimization, where the policy is optimized to maximize a reward while staying close to a reference policy based on an $f$-divergence like $\mathsf{KL}$ divergence. The best of $n$ alignment policy selects the sample with the highest reward from $n$ independent samples. Recent work shows that the reward improvement of the aligned policy scales as $\sqrt{\mathsf{KL}}$, with an explicit bound on the $\mathsf{KL}$ for best of $n$ policies. We show that this $\sqrt{\mathsf{KL}}$ bound holds if the reference policy’s reward has sub-gaussian tails. For best of $n$ policies, the $\mathsf{KL}$ bound applies to any $f$-divergence through a reduction to exponential order statistics using the Rényi representation. Tighter control can be achieved with Rényi divergence if additional tail information is known. Finally, we demonstrate how these bounds transfer to golden rewards, resulting in decreased golden reward improvement due to proxy reward overestimation and approximation errors.
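For best-of-$n$, the selection rule and an explicit KL budget fit in a few lines. The formula $\log n - (n-1)/n$ is the commonly cited explicit best-of-$n$ bound (assumed here to be the bound the abstract refers to); the sampler and reward are toy stand-ins.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def best_of_n(sample_fn, reward_fn, n: int):
    """Draw n independent samples from the reference policy, keep the best."""
    return max((sample_fn() for _ in range(n)), key=reward_fn)

y = best_of_n(sample_fn=lambda: rng.normal(size=4),   # stand-in for pi_ref samples
              reward_fn=lambda s: float(s.sum()),     # stand-in reward model
              n=16)

n = 16
kl_budget = math.log(n) - (n - 1) / n
print(f"best-of-{n} KL bound: {kl_budget:.3f} nats")  # reward gain scales ~ sqrt(KL)
```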
URL: https://openreview.net/forum?id=Uz9J77Riul
---
Title: Augmented Invertible Koopman Autoencoder for long-term time series forecasting
Authors: Anthony Frion, Lucas Drumetz, Mauro Dalla Mura, Guillaume Tochon, Abdeldjalil AISSA EL BEY
Abstract: Following the introduction of Dynamic Mode Decomposition and its numerous extensions, many neural autoencoder-based implementations of the Koopman operator have recently been proposed. This class of methods appears to be of interest for modeling dynamical systems, either through direct long-term prediction of the evolution of the state or as a powerful embedding for downstream methods. In particular, a recent line of work has developed invertible Koopman autoencoders (IKAEs), which provide an exact reconstruction of the input state thanks to their analytically invertible encoder, based on coupling layer normalizing flow models. We identify that the conservation of the dimension imposed by the normalizing flows is a limitation for the IKAE models, and thus we propose to augment the latent state with a second, non-invertible encoder network. This results in our new model: the Augmented Invertible Koopman AutoEncoder (AIKAE). We demonstrate the relevance of the AIKAE through a series of long-term time series forecasting experiments, on satellite image time series as well as on a benchmark involving predictions based on a large lookback window of observations.
URL: https://openreview.net/forum?id=o6ukhJLzMQ
---
Title: Test-time Contrastive Concepts for Open-world Semantic Segmentation with Vision-Language Models
Authors: Monika Wysoczańska, Antonin Vobecky, Amaia Cardiel, Tomasz Trzcinski, Renaud Marlet, Andrei Bursuc, Oriane Siméoni
Abstract: Recent CLIP-like Vision-Language Models (VLMs), pre-trained on large amounts of image-text pairs to align both modalities with a simple contrastive objective, have paved the way to open-vocabulary semantic segmentation. Given an arbitrary set of textual queries, image pixels are assigned the closest query in feature space. However, this works well only when a user exhaustively lists all possible visual concepts in an image, so that the concepts contrast against each other for the assignment. This corresponds to the current evaluation setup in the literature, which relies on having access to a list of in-domain relevant concepts, typically classes of a benchmark dataset. Here, we consider the more challenging (and realistic) scenario of segmenting a single concept, given a textual prompt and nothing else. To achieve good results, besides contrasting with the generic “background” text, we propose two different approaches to automatically generate, at test time, query-specific textual contrastive concepts. We do so by leveraging the distribution of texts in the VLM’s training set or crafted LLM prompts. We also propose a metric designed to evaluate this scenario and show the relevance of our approach on commonly used datasets.
URL: https://openreview.net/forum?id=wyOv4kGkbU
---
Title: Online Bandit Nonlinear Control with Dynamic Batch Length and Adaptive Learning Rate
Authors: Jihun Kim, Javad Lavaei
Abstract: This paper is concerned with online bandit nonlinear control, which aims to learn the best stabilizing controller from a pool of stabilizing and destabilizing controllers of unknown types for a given nonlinear dynamical system. We develop an algorithm, named Dynamic Batch length and Adaptive learning Rate (DBAR), and study its stability and regret. Unlike the existing Exp3 algorithm, which requires an exponentially stabilizing controller, DBAR only needs a significantly weaker notion of controller stability, in which case substantial time may be required to certify the system stability. The dynamic batch length in DBAR effectively addresses this issue and enables the system to attain asymptotic stability, where the algorithm behaves as if there were no destabilizing controllers. Moreover, the adaptive learning rate in DBAR uses only the state norm information to achieve a tight regret bound, even when none of the stabilizing controllers in the pool are exponentially stabilizing.
URL: https://openreview.net/forum?id=qmHlTkLdbL
---
Title: Tighter sparse variational Gaussian processes
Authors: Thang D Bui, Matthew Ashman, Richard E. Turner
Abstract: Sparse variational Gaussian process (GP) approximations based on inducing points have become the de facto standard for scaling GPs to large datasets, owing to their theoretical elegance, computational efficiency, and ease of implementation. This paper introduces a provably tighter variational approximation by relaxing the standard assumption that the conditional approximate posterior given the inducing points must match that in the prior. The key innovation is to modify the conditional posterior to have smaller variances than those of the prior at the training points. We derive the collapsed bound for the regression case, describe how to use the proposed approximation in large data settings, and discuss its application to handle orthogonally structured inducing points and GP latent variable models. Extensive experiments on regression benchmarks, classification, and latent variable models demonstrate that the proposed approximation consistently matches or outperforms standard sparse variational GPs while maintaining the same computational cost.
URL: https://openreview.net/forum?id=L33DSu3zvq
---
Title: Efficient Diffusion Models: A Survey
Authors: Hui Shen, Jingxuan Zhang, Boning Xiong, Rui Hu, Shoufa Chen, Zhongwei Wan, Xin Wang, Yu Zhang, Zixuan Gong, Guangyin Bao, Chaofan Tao, Yongfeng Huang, Ye Yuan, Mi Zhang
Abstract: Diffusion models have emerged as powerful generative models capable of producing high-quality content such as images, videos, and audio, demonstrating their potential to revolutionize digital content creation. However, these capabilities come at the cost of significant computational resources and lengthy generation times, underscoring the critical need to develop efficient techniques for practical deployment. In this survey, we provide a systematic and comprehensive review of research on efficient diffusion models. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected topics in efficient diffusion models from the algorithm, system, and framework perspectives, respectively. We have also created a GitHub repository where we organize the papers featured in this survey at github.com/AIoT-MLSys-Lab/Efficient-Diffusion-Model-Survey. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of efficient diffusion model research and inspire them to contribute to this important and exciting field.
URL: https://openreview.net/forum?id=wHECkBOwyt
---
Title: RESTOR: Knowledge Recovery in Machine Unlearning
Authors: Keivan Rezaei, Khyathi Chandu, Soheil Feizi, Yejin Choi, Faeze Brahman, Abhilasha Ravichander
Abstract: Large language models trained on web-scale corpora can memorize undesirable data containing misinformation, copyrighted material, or private or sensitive information. Recently, several machine unlearning algorithms have been proposed to eliminate the effect of such datapoints from trained models, that is, to approximate *a model that had never been trained on these datapoints in the first place*. However, evaluating the effectiveness of unlearning algorithms remains an open challenge. Previous work has relied on heuristics, such as verifying that the model can no longer reproduce the specific information targeted for removal while maintaining accuracy on unrelated test data. These approaches inadequately capture the complete effect of reversing the influence of datapoints on a trained model. In this work, we propose the RESTOR framework for machine unlearning evaluation, which assesses unlearning algorithms on targeted data erasure by evaluating whether models forget the knowledge introduced in these datapoints while recovering the knowledge state they would have had if they had never encountered those datapoints. RESTOR helps uncover several novel insights about popular unlearning algorithms and the mechanisms through which they operate: for instance, that some algorithms merely emphasize forgetting without recovering knowledge, and that localizing unlearning targets can enhance unlearning performance.
URL: https://openreview.net/forum?id=BbwlJpNXgW
---
Title: MarDini: Masked Auto-regressive Diffusion for Video Generation at Scale
Authors: Haozhe Liu, Shikun Liu, Zijian Zhou, Mengmeng Xu, Yanping Xie, Xiao Han, Juan Camilo Perez, Ding Liu, Kumara Kahatapitiya, Menglin Jia, Jui-Chieh Wu, Sen He, Tao Xiang, Jürgen Schmidhuber, Juan-Manuel Perez-Rua
Abstract: We introduce MarDini, a new family of video diffusion models that integrate the advantages of masked auto-regression (MAR) into a unified diffusion model (DM) framework. Here, MAR handles temporal planning, while DM focuses on spatial generation in an asymmetric network design: i) a MAR-based planning model containing most of the parameters generates planning signals for each masked frame using low-resolution input; ii) a lightweight generation model uses these signals to produce high-resolution frames via diffusion de-noising. MarDini’s MAR enables video generation conditioned on any number of masked frames at any frame positions: a single model can handle video interpolation (e.g., masking middle frames), image-to-video generation (e.g., masking from the second frame onward), and video expansion (e.g., masking half the frames). The efficient design allocates most of the computational resources to the low-resolution planning model, making computationally expensive but important spatio-temporal attention feasible at scale. MarDini sets a new state-of-the-art for video interpolation; meanwhile, within a few inference steps, it efficiently generates videos on par with those of much more expensive advanced image-to-video models.
URL: https://openreview.net/forum?id=fuOHI59rUW
---
Title: Responsive Noise-Relaying Diffusion Policy: Responsive and Efficient Visuomotor Control
Authors: Zhuoqun Chen, Xiu Yuan, Tongzhou Mu, Hao Su
Abstract: Imitation learning is an efficient method for teaching robots a variety of tasks. Diffusion Policy, which uses a conditional denoising diffusion process to generate actions, has demonstrated superior performance, particularly in learning from multi-modal demonstrations. However, it relies on executing multiple actions predicted from the same inference step to retain performance and prevent mode bouncing, which limits its responsiveness, as actions are not conditioned on the most recent observations. To address this, we introduce Responsive Noise-Relaying Diffusion Policy (RNR-DP), which maintains a noise-relaying buffer with progressively increasing noise levels and employs a sequential denoising mechanism that generates immediate, noise-free actions at the head of the sequence, while appending noisy actions at the tail. This ensures that actions are responsive and conditioned on the latest observations, while maintaining motion consistency through the noise-relaying buffer. This design enables the handling of tasks requiring responsive control, and accelerates action generation by reusing denoising steps. Experiments on response-sensitive tasks demonstrate that, compared to Diffusion Policy, ours achieves an 18% improvement in success rate. Further evaluation on regular tasks demonstrates that RNR-DP also exceeds the best acceleration method (DDIM) by 6.9% in success rate, highlighting its computational efficiency advantage in scenarios where responsiveness is less critical.
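The noise-relaying buffer itself is compact enough to sketch. The denoiser below is a placeholder; a real implementation would run one diffusion denoising step per noise level, conditioned on the latest observation, and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
H, A_DIM, OBS_DIM = 8, 2, 4
buffer = [rng.normal(size=A_DIM) for _ in range(H)]    # entry i sits at noise level i

def denoise_one_level(action, level, obs):
    # stand-in for one observation-conditioned diffusion denoising step
    return 0.9 * action + 0.01 * obs[:A_DIM]

for step in range(5):
    obs = rng.normal(size=OBS_DIM)                     # latest observation
    buffer = [denoise_one_level(a, i, obs) for i, a in enumerate(buffer)]
    action = buffer.pop(0)           # head is now noise-free: execute immediately
    buffer.append(rng.normal(size=A_DIM))              # fresh noise at the tail
    # env.step(action) would go here
```

Each control step thus spends one denoising pass per buffer entry while still emitting an action conditioned on the newest observation, which is the responsiveness/efficiency trade the abstract describes.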
URL: https://openreview.net/forum?id=LLWJkR6gaI
---
Title: Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs
Authors: Pranav Maneriker, Aditya T. Vadlamani, Anutam Srinivasan, Yuntian He, Ali Payani, srinivasan parthasarathy
Abstract: Conformal prediction has become increasingly popular for quantifying the uncertainty associated with machine learning models. Recent work in graph uncertainty quantification has built upon this approach for conformal graph prediction. The nascent nature of these explorations has led to conflicting choices for implementations, baselines, and method evaluation. In this work, we analyze the design choices made in the literature and discuss the tradeoffs associated with existing methods. Building on existing implementations, we introduce techniques to scale these methods to large-scale graph datasets without sacrificing performance. Our theoretical and empirical results justify our recommendations for future scholarship in graph conformal prediction.
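For readers new to the area, the base split conformal procedure that the graph variants adapt fits in a few lines; `alpha`, the toy calibration data, and the random scores below are illustrative.

```python
import numpy as np

def calibrate(cal_probs: np.ndarray, cal_labels: np.ndarray, alpha: float) -> float:
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]  # nonconformity
    k = int(np.ceil((len(scores) + 1) * (1 - alpha)))                 # finite-sample rank
    return np.sort(scores)[min(k, len(scores)) - 1]

def predict_set(test_probs: np.ndarray, qhat: float) -> np.ndarray:
    return test_probs >= 1.0 - qhat          # classes kept in the prediction set

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(4), size=500)
cal_labels = np.array([rng.choice(4, p=p) for p in cal_probs])
qhat = calibrate(cal_probs, cal_labels, alpha=0.1)   # target 90% coverage
print(predict_set(rng.dirichlet(np.ones(4)), qhat))
```

The transductive node-classification variants differ mainly in how calibration nodes are chosen and how exchangeability is argued on a graph, which is where the design choices analyzed above come in.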
URL: https://openreview.net/forum?id=Ed1DBB3sBQ
---
Title: Offset Unlearning for Large Language Models
Authors: James Y. Huang, Wenxuan Zhou, Fei Wang, Fred Morstatter, Sheng Zhang, Hoifung Poon, Muhao Chen
Abstract: Despite the strong capabilities of Large Language Models (LLMs) to acquire knowledge from their training corpora, the memorization of sensitive information in the corpora such as copyrighted, biased, and private content has led to ethical and legal concerns. In response to these challenges, unlearning has emerged as a potential remedy for LLMs affected by problematic training data. However, previous unlearning techniques are either not applicable to black-box LLMs due to required access to model internal weights, or violate data protection principles by retaining sensitive data for inference-time correction. We propose $\delta$-unlearning, an offset unlearning framework for black-box LLMs. Instead of tuning the black-box LLM itself, $\delta$-unlearning learns the logit offset needed for unlearning by contrasting the logits from a pair of smaller models. Experiments demonstrate that $\delta$-unlearning can effectively unlearn target data while maintaining similar or even stronger performance on general out-of-forget-scope tasks. $\delta$-unlearning also effectively incorporates different unlearning algorithms, making our approach a versatile solution to adapting various existing unlearning algorithms to black-box LLMs.
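The inference-time composition is a one-liner: the black-box logits are shifted by the contrast between the unlearned small model and its untouched counterpart. How the offset pair is trained is the paper's contribution; the tensors below are random placeholders for real model outputs.

```python
import torch

def offset_unlearn_logits(logits_blackbox: torch.Tensor,
                          logits_small_unlearned: torch.Tensor,
                          logits_small_base: torch.Tensor) -> torch.Tensor:
    # The pair of small models carries the unlearning signal;
    # the large black-box model itself is never modified.
    return logits_blackbox + (logits_small_unlearned - logits_small_base)

vocab = 32000
adjusted = offset_unlearn_logits(torch.randn(vocab),
                                 torch.randn(vocab),
                                 torch.randn(vocab))
next_token = int(adjusted.argmax())
```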
URL: https://openreview.net/forum?id=A4RLpHPXCu
---
Title: Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation
Authors: Praveen Srinivasa Varadhan, amogh gulati, Ashwin Sankar, Srija Anand, Anirudh Gupta, Anirudh Mukherjee, Shiva Kumar Marepally, Ankur Bhatia, Saloni Jaju, Suvrat Bhooshan, Mitesh M Khapra
Abstract: Despite rapid advancements in TTS models, a consistent and robust human evaluation framework is still lacking. For example, MOS tests fail to differentiate between similar models, and CMOS's pairwise comparisons are time-intensive. The MUSHRA test is a promising alternative for evaluating multiple TTS systems simultaneously, but in this work we show that its reliance on matching human reference speech unduly penalises the scores of modern TTS systems that can exceed human speech quality. More specifically, we conduct a comprehensive assessment of the MUSHRA test, focusing on its sensitivity to factors such as rater variability, listener fatigue, and reference bias. Based on our extensive evaluation involving 492 human listeners across Hindi and Tamil, we identify two primary shortcomings: (i) reference-matching bias, where raters are unduly influenced by the human reference, and (ii) judgement ambiguity, arising from a lack of clear fine-grained guidelines. To address these issues, we propose two refined variants of the MUSHRA test. The first variant enables fairer ratings for synthesized samples that surpass human reference quality. The second variant reduces ambiguity, as indicated by the relatively lower variance across raters. By combining these approaches, we achieve both more reliable and more fine-grained assessments. We also release MANGO, a massive dataset of 246,000 human ratings, the first-of-its-kind collection for Indian languages, aiding in analyzing human preferences and developing automatic metrics for evaluating TTS systems.
URL: https://openreview.net/forum?id=oYmRiWCQ1W
---
Title: Part-aware Prompted Segment Anything Model for Adaptive Segmentation
Authors: Chenhui Zhao, Liyue Shen
Abstract: Precision medicine, such as patient-adaptive treatments assisted by medical image analysis, poses new challenges for image segmentation algorithms due to the large variability across different patients and the limited availability of annotated data for each patient. In this work, we propose a data-efficient segmentation method to address these challenges, namely $\textit{\textbf{P}art-aware}$ $\textit{\textbf{P}rompted}$ $\textit{\textbf{S}egment}$ $\textit{\textbf{A}nything}$ $\textit{\textbf{M}odel}$ ($\mathbf{{P}^{2}SAM}$). Without any model fine-tuning, $\text{P}^2\text{SAM}$ enables seamless adaptation to any new patient, relying only on one-shot patient-specific data. We introduce a novel part-aware prompt mechanism that selects multiple point prompts based on part-level features of the one-shot data, and which can be integrated into different promptable segmentation models, such as SAM and SAM 2. To further promote the robustness of the part-aware prompt mechanism, we propose a distribution-guided retrieval approach to determine the optimal number of part-level features for a specific case. $\text{P}^2\text{SAM}$ improves performance by $\texttt{+} 8.0\%$ and $\texttt{+} 2.0\%$ mean Dice score on two different patient-adaptive segmentation applications, respectively. In addition, $\text{P}^2\text{SAM}$ also exhibits impressive generalizability on other adaptive segmentation tasks in the natural image domain, $\textit{e.g.}$, $\texttt{+} 6.4\%$ mIoU on the personalized object segmentation task. Code will be released upon acceptance.
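Editor's sketch: as a rough illustration of part-aware point prompting (our reading of the idea, not the authors' code): match each part-level reference feature against the target feature map and take the best location per part as a point prompt. The feature extractor and per-part pooling are left unspecified here.

    import torch
    import torch.nn.functional as F

    def part_aware_point_prompts(ref_part_feats: torch.Tensor,   # (P, C) part features from the one-shot reference
                                 target_feats: torch.Tensor):    # (C, H, W) feature map of the new image
        C, H, W = target_feats.shape
        t = F.normalize(target_feats.reshape(C, -1), dim=0)      # (C, H*W)
        p = F.normalize(ref_part_feats, dim=1)                   # (P, C)
        sim = p @ t                                              # cosine similarity, (P, H*W)
        idx = sim.argmax(dim=1)                                  # most similar location per part
        return [(int(i) // W, int(i) % W) for i in idx]          # one (row, col) point prompt per part

The resulting points can then be fed to a promptable segmenter such as SAM.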
URL: https://openreview.net/forum?id=cCQKwd5MFP
---
New submissions
===============
Title: On the Hardness of Computing Counterfactual and Semi-factual Explanations in XAI
Abstract: Providing clear explanations of the choices made by machine learning models is essential for these models to be deployed in critical applications. Counterfactual and semi-factual explanations have emerged as two mechanisms for providing users with insights into the outputs of their models. We provide an overview of the computational complexity results in the literature for generating these explanations, finding that in many cases, generating explanations is computationally hard. We further contribute our own inapproximability results showing that not only are explanations often hard to generate, but under certain assumptions they are also hard to approximate. We discuss the implications of these complexity results for the XAI community and for policymakers seeking to regulate explanations in AI.
URL: https://openreview.net/forum?id=aELzBw0q1O
---
Title: A Hybrid Active Learning Regression Approach for Accelerating Annotation with Data Generation Constraints
Abstract: In numerous scientific scenarios, experimental samples are designed as multiple data groups based on their underlying structures, \emph{e.g.,} with 1000 samples in each group, where these samples share certain similarities but include systematic physicochemical variations. A smaller number of samples (\emph{e.g.,} 10) are then selected and placed in the parallel synthesizer, under a lengthy process, to collect their properties for subsequent machine learning analysis. Active learning, a technique that selects the most informative samples for the model, can reduce the cost of such a lengthy procedure by achieving better model performance with fewer labelled samples. However, generic batch-mode active learning algorithms are designed for sampling from a single sample pool and thus lack a mechanism to accelerate concurrent experiment execution across multiple data groups in such scientific scenarios. This paper proposes an active learning approach for scientific data with inherent group information, integrating multiple-output quantile regression for uncertainty estimation and data-distribution diversity into a hybrid query method. The proposed method improves the efficiency of concurrent experiments, and the experimental results demonstrate its effectiveness on a suite of material science tasks.
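Editor's sketch: a minimal rendering of such a hybrid query score (our own formulation; the additive combination and the weight lam are assumptions). Uncertainty is the predicted quantile-interval width; diversity is the distance to already-selected samples.

    import numpy as np

    def hybrid_query(pool_X, q_low, q_high, selected, k, lam=0.5):
        # q_low/q_high: per-sample lower/upper quantile predictions from a
        # multiple-output quantile regressor; wide intervals signal uncertainty
        uncertainty = q_high - q_low
        sel = [np.asarray(s) for s in selected]
        chosen = []
        for _ in range(k):
            if sel:
                d = np.stack([np.linalg.norm(pool_X - s, axis=1) for s in sel]).min(0)
            else:
                d = np.zeros(len(pool_X))
            score = uncertainty + lam * d            # hybrid: uncertain AND far from prior picks
            if chosen:
                score[chosen] = -np.inf              # do not re-pick within the batch
            i = int(score.argmax())
            chosen.append(i)
            sel.append(pool_X[i])
        return chosen

Greedily updating the diversity term inside the batch loop is what adapts the generic batch-mode recipe to querying several samples per group at once.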
URL: https://openreview.net/forum?id=yqnsopU7Dz
---
Title: You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet
Abstract: Linear attention mechanisms have gained prominence in causal language models due to their linear computational complexity and enhanced speed. However, the inherent decay mechanism in linear attention presents challenges when applied to multi-dimensional sequence modeling tasks, such as image processing and multi-modal learning. In these scenarios, the utilization of sequential scanning to establish a global receptive field necessitates multiple scans for multi-dimensional data, thereby leading to inefficiencies. This paper identifies the inefficiency caused by a \enquote{multiplicative decay} linear recurrence and proposes an efficient alternative, an \enquote{additive decay} linear recurrence, to avoid the issue, as it can handle multi-dimensional data within a single scan. We further develop an efficient multi-dimensional sequential modeling framework called LightNet based on the new recurrence. Moreover, we present two new multi-dimensional linear relative positional encoding methods, MD-TPE and MD-LRPE, to enhance the model's ability to discern positional information in multi-dimensional scenarios. Our empirical evaluations across various tasks, including image classification, image generation, bidirectional language modeling, and autoregressive language modeling, demonstrate the efficacy of LightNet, showcasing its potential as a versatile and efficient solution for multi-dimensional sequential modeling.
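Editor's sketch: to see why an order-commutative recurrence needs only one scan, consider this toy contrast (our own sketch; it omits LightNet's actual decay weighting and normalization details). A multiplicative recurrence h_t = lam * h_{t-1} + k_t v_t depends on scan order, whereas a state that is a plain cumulative sum does not.

    import torch

    def additive_style_linear_attention(q, k, v):
        # q, k: (B, T, d); v: (B, T, e). The running state is a plain sum of
        # k_t v_t^T outer products, i.e., commutative across positions, so
        # multi-dimensional data can be covered in a single pass.
        num = torch.cumsum(k.unsqueeze(-1) * v.unsqueeze(-2), dim=1)   # (B, T, d, e)
        den = torch.cumsum(k, dim=1)                                   # (B, T, d)
        out = torch.einsum('btd,btde->bte', q, num)
        norm = torch.einsum('btd,btd->bt', q, den).unsqueeze(-1) + 1e-6
        return out / norm

For a 2-D input one would flatten positions once and reuse the same accumulated state, instead of re-scanning row-wise and column-wise.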
URL: https://openreview.net/forum?id=XG9ngiTupe
---
Title: Uncertainty Quantification in Linear Regression With Mismatched Data
Abstract: The fundamental assumption in regression analysis that each response-predictor pair corresponds to the same observational unit is not always valid, especially with mismatched data. This paper presents a novel approach for uncertainty quantification in linear regression when data mismatch occurs. Using the generalized fiducial inference framework, we develop a method to generate fiducial samples for constructing confidence intervals and measuring uncertainty in key regression parameters. We establish the theoretical properties of our approach and demonstrate its practical effectiveness through empirical tests on both simulated and real datasets. To our knowledge, this is the first study to explore uncertainty quantification for mismatched data in linear regression.
URL: https://openreview.net/forum?id=YnlRnZTkoP
---
Title: Neural Networks as Universal Finite-State Machines: A Constructive ReLU Simulation Framework for NFAs
Abstract: We present a formal and constructive framework establishing the equivalence between nondeterministic finite automata (NFAs) and standard feedforward ReLU neural networks. By encoding automaton states as binary vectors and transitions as sparse linear layers, we show that ReLU activations simulate nondeterministic branching, subset construction, and $\epsilon$-closures in a mathematically precise manner. Our core theoretical results prove that a three-layer ReLU network of width $\mathcal{O}(n)$ can exactly recognize any regular language accepted by an $n$-state NFA—without recurrence, memory, or approximation. Furthermore, we show that gradient descent over structure-preserving networks preserves symbolic semantics and acceptance behavior. Extensive experiments across multiple validation tasks—including parallel path tracking, symbolic subset construction, $\epsilon$-closure convergence, acceptance classification, structural training invariants, and functional equivalence—achieve perfect or near-perfect empirical alignment with ground-truth automata. This work provides the first provably complete symbolic simulation of NFAs within standard deep learning architectures, uniting automata theory with neural computation through ReLU dynamics.
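Editor's sketch: the construction is easy to demonstrate end-to-end. Below is our own toy instance (a 3-state NFA over {a, b} accepting strings containing the substring "ab"), not the paper's code: states are binary vectors, transitions are 0/1 matrices, and the only nonlinearity needed to emulate subset construction is min(x, 1), which ReLU expresses as x - relu(x - 1).

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    # transition matrices: T[s][i, j] = 1 iff state i goes to state j on symbol s
    T = {
        'a': np.array([[1, 1, 0],   # 0 -> {0, 1}: loop, or guess the match starts here
                       [0, 0, 0],   # 1 has no 'a' transition
                       [0, 0, 1]]), # 2 (accepting) is absorbing
        'b': np.array([[1, 0, 0],
                       [0, 0, 1],   # 1 -> 2: "ab" completed
                       [0, 0, 1]]),
    }

    def nfa_step(s, sym):
        x = T[sym].T @ s             # counts active predecessors of each state
        return x - relu(x - 1.0)     # clamp to {0, 1}: the subset-construction step

    def accepts(word):
        s = np.array([1.0, 0.0, 0.0])   # one-hot start state
        for ch in word:
            s = nfa_step(s, ch)
        return s[2] > 0                 # is the accepting state active?

    assert accepts("aab") and accepts("bab") and not accepts("ba")

Each nfa_step is one sparse linear layer plus the ReLU-based clamp; the paper's width-$\mathcal{O}(n)$, three-layer result is a stronger statement than this per-symbol view, which only illustrates the simulation mechanism.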
URL: https://openreview.net/forum?id=NpzURo5rxE
---
Title: Leveraging Fully-Observable Solutions for Improved Partially-Observable Offline Reinforcement Learning
Abstract: Offline reinforcement learning (RL) is a popular learning framework for control problems where online interactions with the environment are expensive, risky, or otherwise impractical.
Existing offline RL methods commonly assume full observability of the state, and therefore there is a lack of offline RL methods that are specialized for the more general case of partially-observable control.
To address this gap, we propose Cross-Observability Conservative Q-Learning (CO-CQL), an offline RL algorithm for partially-observable control that leverages fully-observable expert policies in an asymmetric learning setting.
To motivate the use of fully-observable experts for partially-observable control, we formalize Cross-Observability Optimality Ratio (COOR), a theoretical measure of cross-observability that quantifies the benefit of learning asymmetrically from a fully-observable expert, and Cross-Observability Approximation Ratio (COAR), an estimation of COOR computable from trained policies.
Our empirical evaluation on a wide variety of partially-observable challenges demonstrates that CO-CQL is able to exploit the guidance of fully-observable experts to outperform other state-of-the-art offline algorithms.
URL: https://openreview.net/forum?id=e9p4TDPy6A
---
Title: Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"
Abstract: As large language models (LLMs) are increasingly deployed in multi-agent systems for cooperative tasks, understanding their decision-making behavior in resource-sharing scenarios becomes critically important for AI safety and governance. This study provides a comprehensive reproducibility analysis and theoretical extension of Piatti et al.'s $\texttt{GovSim}$ framework, which models cooperative behavior in multi-agent systems through the lens of common resource dilemmas. Beyond faithful reproduction of core claims regarding model-size dependencies and universalization principles, we contribute four theoretically-motivated extensions that advance our understanding of LLM cooperation dynamics: (1) $\textbf{Cross-architectural generalization}$, where we demonstrate that cooperative capabilities transfer across model families, with $\texttt{DeepSeek-V3}$ achieving performance parity with $\texttt{GPT-4-turbo}$ despite different architectural foundations; (2) $\textbf{Cross-linguistic behavioral consistency}$, revealing that cooperative behavior remains stable across languages, contradicting hypotheses about cultural linguistic biases affecting cooperation; (3) $\textbf{Loss aversion in resource framing}$, showing that negative resource framing (elimination of harmful resources) fundamentally alters agent behavior patterns, with models like $\texttt{GPT-4o-mini}$ succeeding in loss-framed scenarios while failing in gain-framed ones, a finding with significant implications for prompt engineering in cooperative AI systems; and (4) $\textbf{Heterogeneous influence dynamics}$, demonstrating that high-performing models can systematically elevate the cooperative behavior of weaker models through communication, enabling resource-efficient deployment strategies. These findings establish fundamental principles for deploying LLMs in cooperative multi-agent systems: cooperation emerges from model capability rather than cultural training biases, resource framing significantly impacts behavioral stability, and strategic model mixing can amplify system-wide performance. Our work provides essential guidance for practitioners designing AI systems where multiple agents must cooperate to achieve shared objectives, from autonomous economic systems to collaborative robotics.
URL: https://openreview.net/forum?id=5VNLVclWRH
---
Title: Permissive Information-Flow Analysis for Large Language Models
Abstract: Large Language Models (LLMs) are rapidly becoming commodity components of larger software systems. This poses natural security and privacy problems: poisoned data retrieved from one component can change the model's behavior and compromise the entire system, including coercing the model to spread confidential data to untrusted components. One promising approach is to tackle this problem at the system level via dynamic information flow (aka taint) tracking. Unfortunately, this approach of propagating the most restrictive input label to the output is too conservative for applications where LLMs operate on inputs retrieved from diverse sources. In this paper, we propose a novel, more permissive approach to propagate information flow labels through LLM queries. The key idea behind our approach is to propagate only the labels of the samples that were influential in generating the model output and to eliminate the labels of unnecessary inputs. We implement and investigate the effectiveness of two variations of this approach, based on (i) prompt-based retrieval augmentation, and (ii) a $k$-nearest-neighbors language model. We compare these with a baseline that uses introspection to predict the output label. Our experimental results in an LLM agent setting show that the permissive label propagator improves over the baseline in more than 85% of the cases, which underscores the practicality of our approach.
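Editor's sketch: a minimal rendering of permissive label propagation (our own; the influence estimator and threshold are assumptions): only the labels of inputs judged influential for the output are joined into the output label.

    def permissive_output_label(retrieved, influence_score, output, threshold=0.5):
        # retrieved: list of (text, label_set) pairs given to the LLM
        # influence_score(text, output) -> float in [0, 1], e.g., estimated
        # via retrieval attribution; labels of uninfluential inputs are dropped
        influential = [labels for text, labels in retrieved
                       if influence_score(text, output) >= threshold]
        return set().union(*influential) if influential else set()

    # toy usage: only the first document shaped the output, so only its label flows
    docs = [("doc A", {"confidential"}), ("doc B", {"public"})]
    score = lambda text, out: 1.0 if text == "doc A" else 0.0
    assert permissive_output_label(docs, score, "summary of A") == {"confidential"}

Compared with the conservative rule (always take the join of all input labels), this keeps outputs usable when most retrieved inputs did not actually influence them.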
URL: https://openreview.net/forum?id=ufYRO8y3mr
---
Title: Data-Efficient Challenges in Visual Inductive Priors: A Retrospective
Abstract: Deep Learning requires large amounts of data to train models that work well. In data-deficient settings, performance can be degraded. We investigate which Deep Learning methods benefit training models in a data-deficient setting, by organizing the "VIPriors: Visual Inductive Priors for Data-Efficient Deep Learning" workshop series, featuring four editions of data-impaired challenges. These challenges address the problem of training deep learning models for computer vision tasks with limited data. Participants are limited to training models from scratch using a low number of training samples and are not allowed to use any form of transfer learning. We aim to stimulate the development of novel approaches that incorporate prior knowledge to improve the data efficiency of deep learning models. Successful challenge entries make use of large model ensembles that mix Transformers and CNNs, as well as heavy data augmentation. Novel prior knowledge-based methods contribute to success in some entries.
URL: https://openreview.net/forum?id=ReFs9KHDgp
---
Title: Adversarial Surrogate Risk Bounds for Binary Classification
Abstract: A central concern in classification is the vulnerability of machine learning models to adversarial attacks. Adversarial training is one of the most popular techniques for training robust classifiers, which involves minimizing an adversarial surrogate risk. Recent work characterized when a minimizing sequence of an adversarial surrogate risk is also a minimizing sequence of the adversarial classification risk for binary classification---a property known as \emph{adversarial consistency}. However, these results do not address the rate at which the adversarial classification risk converges to its optimal value for such a sequence of functions that minimize the adversarial surrogate. This paper provides surrogate risk bounds that quantify that convergence rate. Additionally, we derive distribution-dependent surrogate risk bounds in the standard (non-adversarial) learning setting, that may be of independent interest.
URL: https://openreview.net/forum?id=Bay1cHLk7h
---
Title: SPONGE: Competing Sparse Language Representations for Effective Knowledge Transfer
Abstract: In domains with privacy constraints, most knowledge resides in siloed datasets, hindering the development of a model with all relevant knowledge for a task.
Clinical NLP is a prime example of these constraints in practice.
Research in this area typically falls back to the canonical setting of sequential transfer learning, where a model pre-trained on large corpora is finetuned on a smaller annotated dataset.
An avenue for knowledge transfer among diverse clinics is multi-step sequential transfer learning since models are more likely to be shared than private clinical data.
This setting poses challenges of cross-linguality, domain diversity, and varying label distributions which undermine generalisation.
We propose SPONGE, an efficient prototypical architecture that leverages competing sparse language representations.
These encompass distributed knowledge and create the necessary level of redundancy for effective transfer learning across multiple datasets.
We identify that prototypical classifiers are critically sensitive to label-recency bias which we mitigate with a novel strategy at inference time. SPONGE in combination with this strategy significantly boosts generalisation performance to unseen data.
With the help of medical professionals, we show that the explainability of our models is clinically relevant.
We make all source code available.
URL: https://openreview.net/forum?id=OevFdPgk3h
---
Title: Unveiling Multiple Descents in Unsupervised Autoencoders
Abstract: The phenomenon of double descent has challenged the traditional bias-variance trade-off in supervised learning but remains unexplored in unsupervised learning, with some studies arguing for its absence. In this study, we first demonstrate analytically that double descent does not occur in linear unsupervised autoencoders (AEs). In contrast, we show for the first time that both double and triple descent can be observed with nonlinear AEs across various data models and architectural designs. We examine the effects of partial sample and feature noise and highlight the importance of bottleneck size in influencing the double descent curve. Through extensive experiments on both synthetic and real datasets, we uncover model-wise, epoch-wise, and sample-wise double descent across several data types and architectures. Our findings indicate that over-parameterized models not only improve reconstruction but also enhance performance in downstream tasks such as anomaly detection and domain adaptation, highlighting their practical value in complex real-world scenarios.
URL: https://openreview.net/forum?id=FqfHDs6unx
---
Title: LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects
Abstract: With the rapid rise of large language models (LLMs), phone automation has undergone transformative changes. This paper systematically reviews LLM-driven phone GUI agents, highlighting their evolution from script-based automation to intelligent, adaptive systems. We first contextualize three key challenges, namely (i) limited generality, (ii) high maintenance overhead, and (iii) weak intent comprehension, and show how LLMs address these issues through advanced language understanding, multimodal perception, and robust decision-making. We then propose a taxonomy covering fundamental agent frameworks (single-agent, multi-agent, plan-then-act), modeling approaches (prompt engineering, training-based), and essential datasets and benchmarks. Furthermore, we detail task-specific architectures, supervised fine-tuning, and reinforcement learning strategies that bridge user intent and GUI operations. Finally, we discuss open challenges such as dataset diversity, on-device deployment efficiency, user-centric adaptation, and security concerns, offering forward-looking insights into this rapidly evolving field. By providing a structured overview and identifying pressing research gaps, this paper serves as a definitive reference for researchers and practitioners seeking to harness LLMs in designing scalable, user-friendly phone GUI agents.
URL: https://openreview.net/forum?id=yWQqoi1G1K
---
Title: Identifying key amino acid types that distinguish paralogous proteins using Shapley value based feature subset selection
Abstract: Paralogous proteins have a common ancestor but have diverged in functionality. Using known machine learning algorithms, we present a data-driven method to identify the key amino acid types that play a role in distinguishing a given pair of proteins that are paralogs. We use an existing Shapley value based feature subset selection algorithm, SVEA, to identify the key amino acid types adequate to distinguish pairs of paralogous proteins. We refer to these as the amino acid feature subset ($AFS$). For a paralog pair, say proteins $P$ and $Q$, its $AFS$ is partitioned based on protein-wise importance into $AFS(P)$ and $AFS(Q)$ using a linear classifier, SVM. To validate the significance of the $AFS$ amino acids, we use multiple domain-knowledge-based methods: (a) multiple sequence alignment, and/or (b) 3D structure analysis, and/or (c) supporting evidence from the biology literature. This method is computationally cheap, requires little data, and can be used as an initial data-driven step for further hypothesis-driven experimental study of proteins. We demonstrate the results for 15 pairs of paralogous proteins.
URL: https://openreview.net/forum?id=CmgAHWKXNT
---
Title: Stochastic Primal-Dual Double Block-Coordinate for Two-way Partial AUC Maximization
Abstract: Two-way partial AUC (TPAUC) is a critical performance metric for binary classification with imbalanced data, as it focuses on specific ranges of the true positive rate (TPR) and false positive rate (FPR). However, stochastic algorithms for TPAUC optimization remain under-explored, with existing methods either limited to approximated TPAUC loss functions or burdened by sub-optimal complexities. To overcome these limitations, we introduce two innovative stochastic primal-dual double block-coordinate algorithms for TPAUC maximization. These algorithms utilize stochastic block-coordinate updates for both the primal and dual variables, catering to both convex and non-convex settings. We provide theoretical convergence rate analyses, demonstrating significant improvements over prior approaches. Our experimental results, based on multiple benchmark datasets, validate the superior performance of our algorithms, showcasing faster convergence and better generalization. This work advances the state of the art in TPAUC optimization and offers practical tools for real-world machine learning applications.
URL: https://openreview.net/forum?id=M3kibBFP4q
---
Title: Sparsity-Driven Plasticity in Multi-Task Reinforcement Learning
Abstract: Plasticity loss, a diminishing capacity to adapt as training progresses, is a critical challenge in deep reinforcement learning. We examine this issue in multi-task reinforcement learning (MTRL), where higher representational flexibility is crucial for managing diverse and potentially conflicting task demands. This paper specifically explores gradual magnitude pruning as a mechanism to enhance plasticity and consequently improve performance in MTRL agents. We systematically evaluate this approach across distinct MTRL architectures, including shared backbones with task-specific heads, Mixture of Experts (MoE), and Mixture of Orthogonal Experts (MOORE) on standardized MiniGrid benchmarks, comparing against dense baselines and alternative plasticity-inducing techniques. Our results demonstrate that pruning effectively mitigates key indicators of plasticity degradation, such as neuron dormancy and representational collapse. Consequently, these plasticity improvements from pruning directly correlate with enhanced multi-task performance, with sparse agents often outperforming both dense counterparts and alternative methods designed to induce plasticity. We further show that the benefits and specific dynamics induced by pruning are architecture-dependent, offering insights into the interplay between plasticity, network sparsity, and specific MTRL designs.
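Editor's sketch: for reference, a common form of gradual magnitude pruning (the cubic schedule of Zhu & Gupta, 2017) can be written as below; whether the paper uses this exact schedule is our assumption, but it illustrates the mechanism being evaluated.

    import torch

    def gmp_sparsity(step, t0, t_end, final_sparsity, initial_sparsity=0.0):
        # cubic ramp from initial_sparsity at step t0 to final_sparsity at t_end
        if step < t0:
            return initial_sparsity
        if step >= t_end:
            return final_sparsity
        frac = (step - t0) / (t_end - t0)
        return final_sparsity + (initial_sparsity - final_sparsity) * (1 - frac) ** 3

    def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        # zero out the smallest-magnitude fraction of entries
        k = int(sparsity * weight.numel())
        if k == 0:
            return torch.ones_like(weight)
        thresh = weight.abs().flatten().kthvalue(k).values
        return (weight.abs() > thresh).float()

Applied periodically during training, the growing mask frees up capacity (fewer dormant neurons, less representational collapse), which is the plasticity effect the paper measures across MTRL architectures.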
URL: https://openreview.net/forum?id=9L4Z23EfE9
---
Title: The Overcooked Generalisation Challenge: Evaluating Cooperation with Novel Partners in Unknown Environments Using Unsupervised Environment Design
Abstract: We introduce the Overcooked Generalisation Challenge (OGC) – a new benchmark for evaluating reinforcement learning (RL) agents on their ability to cooperate with unknown partners in unfamiliar environments. Existing work typically evaluates cooperative RL agents only in their training environment or with their training partners, thus seriously limiting our ability to understand agents’ generalisation capacity – an essential requirement for future collaboration with humans. The OGC extends Overcooked-AI to support dual curriculum design (DCD). It is fully GPU-accelerated, open-source, and integrated into the minimax DCD benchmark suite. Compared to prior DCD benchmarks, where designers manipulate only minimal elements of the environment, OGC introduces a significantly richer design space: full kitchen layouts with multiple objects that require the designer to account for interaction dynamics between agents. We evaluate state-of-the-art DCD algorithms alongside scalable neural architectures and find that current methods fail to produce agents that generalise effectively to novel layouts and unfamiliar partners. Our results indicate that both agents and curriculum designers struggle with the joint challenge of partner and environment generalisation. These findings establish OGC as a demanding testbed for cooperative generalisation and highlight key directions for future research.
URL: https://openreview.net/forum?id=K2KtcMlW6j
---
Title: On the Convergence of SVGD in KL divergence via Approximate gradient flow
Abstract: This study investigates the convergence of Stein variational gradient descent (SVGD), which is used to approximate a target distribution based on a gradient flow on the space of probability distributions. Existing studies mainly focus on convergence in the kernel Stein discrepancy, which does not imply weak convergence in many practical settings. To address this issue, we propose a novel analytical approach called the $(\epsilon,\delta)$-approximate gradient flow, extending conventional concepts of approximation error for the Wasserstein gradient. With this approach, we show the sub-linear convergence of SVGD in Kullback--Leibler divergence in the discrete-time, infinite-particle setting. Finally, we validate our theoretical findings through several numerical experiments.
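Editor's sketch: for readers unfamiliar with SVGD, the particle update being analyzed is the standard one (Liu & Wang, 2016) and fits in a few lines; the RBF kernel and step size below are illustrative choices.

    import numpy as np

    def svgd_step(x, grad_logp, eps=0.1, h=1.0):
        # x: (n, d) particles; grad_logp(x): (n, d) scores of the target density
        g = grad_logp(x)
        diff = x[:, None, :] - x[None, :, :]            # diff[j, i] = x_j - x_i
        K = np.exp(-(diff ** 2).sum(-1) / (2 * h))      # RBF kernel matrix, (n, n)
        # phi(x_i) = (1/n) sum_j [ K[j,i] grad log p(x_j) + grad_{x_j} K[j,i] ]
        phi = (K.T @ g - (K[..., None] * diff / h).sum(0)) / x.shape[0]
        return x + eps * phi

    # toy usage: transport particles toward a standard normal target
    rng = np.random.default_rng(0)
    x = rng.uniform(-3, 3, size=(100, 2))
    for _ in range(200):
        x = svgd_step(x, lambda x: -x)

The paper's contribution concerns how fast iterates of this map approach the target in KL divergence, not the update rule itself.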
URL: https://openreview.net/forum?id=AG1zXt5aoA
---
Title: SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning
Abstract: Modern deep reinforcement learning (DRL) methods have made significant advances in handling continuous action spaces. However, real-world control systems--especially those requiring precise and reliable performance--often demand interpretability in the sense of a-priori assessments of agent behavior to identify safe or failure-prone interactions with environments. To address this limitation, we propose SALSA-RL (Stability Analysis in the Latent Space of Actions), a novel RL framework that models control actions as dynamic, time-dependent variables evolving within a latent space. By employing a pre-trained encoder-decoder and a state-dependent linear system, our approach enables interpretability through local stability analysis, where instantaneous growth in action-norms can be predicted before their execution. We demonstrate that SALSA-RL can be deployed in a non-invasive manner for assessing the local stability of actions from pretrained RL agents without compromising on performance across diverse benchmark environments. By enabling a more interpretable analysis of action generation, SALSA-RL provides a powerful tool for advancing the design, analysis, and theoretical understanding of RL systems.
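Editor's sketch: since the latent dynamics are modeled as a state-dependent linear system, the pre-execution stability check reduces to an eigenvalue test; this is our simplified reading of the paper's analysis, not its exact criterion.

    import numpy as np

    def action_norm_can_grow(A_s: np.ndarray, tol: float = 1.0) -> bool:
        # latent action dynamics: z_{t+1} = A(s_t) z_t. If some eigenvalue of
        # A(s) has magnitude above tol, the local action norm can grow, which
        # flags a potentially unstable, failure-prone interaction before execution.
        return bool(np.abs(np.linalg.eigvals(A_s)).max() > tol)

    assert not action_norm_can_grow(np.array([[0.9, 0.0], [0.1, 0.5]]))

Because the test only needs A(s) from the pre-trained encoder-decoder and linear model, it can be run non-invasively on top of an existing agent, as the abstract emphasizes.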
URL: https://openreview.net/forum?id=cdW8wUD0E8
---
Title: What Matters in Hierarchical Search for Solving Combinatorial Problems?
Abstract: Combinatorial problems, particularly the notorious NP-hard tasks, remain a significant challenge for AI research. A common approach to addressing them combines search with heuristics learned from demonstrations. Recently, hierarchical planning has emerged as a powerful framework in this context, enabling agents to decompose complex problems into manageable subgoals. However, the foundations of this approach, particularly the behavior and limitations of learned heuristics, remain underexplored. Our goal is to advance research in this area and establish a solid conceptual and empirical foundation.
Specifically, we identify the following key characteristics, whose presence favors the choice of hierarchical search methods: hard-to-learn value functions, complex action spaces, presence of dead ends in the environment, or training data collected from diverse sources. Through in-depth empirical analysis, we establish that hierarchical search methods consistently outperform standard search methods across these dimensions, and we formulate insights for future research. On the practical side, we also propose a set of evaluation guidelines to enable meaningful comparisons between methods and reassess the state-of-the-art algorithms.
URL: https://openreview.net/forum?id=zBF44jpV41
---
Title: $\textit{VIA}$: Unified Spatiotemporal $\underline{Vi}$deo $\underline{A}$daptation for Global and Local Video Editing
Abstract: Video editing serves as a fundamental pillar of digital media, spanning applications in entertainment, education, and professional communication.
However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to inaccurate and inconsistent edits in the spatiotemporal dimension, especially for long videos.
In this paper, we introduce $\textit{VIA}$, a unified spatiotemporal $\underline{VI}$deo $\underline{A}$daptation framework for global and local video editing, pushing the limits of consistently editing minute-long videos.
First, to ensure local consistency within individual frames, we design \emph{test-time editing adaptation}, which adapts a pre-trained image editing model to improve consistency between potential editing directions and the text instruction, and adapts masked latent variables for precise local control.
Furthermore, to maintain global consistency over the video sequence, we introduce \emph{spatiotemporal adaptation} that recursively \textbf{gathers} consistent attention variables in key frames and strategically applies them across the whole sequence to realize the editing effects.
Extensive experiments demonstrate that, compared to baseline methods, our $\textit{VIA}$ approach produces edits that are more faithful to the source videos, more coherent in the spatiotemporal context, and more precise in local control. More importantly, we show that $\textit{VIA}$ can achieve consistent long video editing in minutes, unlocking the potential for advanced video editing tasks over long video sequences.
URL: https://openreview.net/forum?id=qny1BqVEZ3
---
Title: Enhancing Diversity in Text-to-Image Generation without Compromising Fidelity
Abstract: Effective text-to-image generation must synthesize images that are both realistic in appearance (sample fidelity) and have sufficient variations (sample diversity). Diffusion models have achieved promising results in generating high-fidelity images based on textual prompts, and recently, several diversity-focused works have been proposed to improve their demographic diversity by enforcing the generation of samples from various demographic groups. However, another essential aspect of diversity, sample diversity---which enhances prompt reusability to generate creative samples that reflect real-world variability---has been largely overlooked. Specifically, how to generate images that have sufficient demographic and sample diversity while preserving sample fidelity remains an open problem because increasing diversity comes at the cost of reduced fidelity in existing works. To address this problem, we first propose a bimodal low-rank adaptation of pretrained diffusion models, which decouples the text-to-image conditioning, and then propose a lightweight bimodal guidance method that introduces additional diversity to the generation process using reference images retrieved through a fairness strategy by separately controlling the strength of text and image conditioning. We conduct extensive experiments to demonstrate the effectiveness of our method in enhancing demographic diversity (Intersectional Diversity~\citep{FairRAG}) by 2.47× and sample diversity (Recall~\citep{precision_recall}) by 1.45× while preserving sample fidelity (Precision~\citep{precision_recall}) compared to the baseline diffusion model.
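Editor's sketch: one way to read "separately controlling the strength of text and image conditioning" is as compositional guidance with two scales; a sketch under that assumption (the weights and the linear combination are ours, not the paper's exact rule).

    import torch

    def bimodal_guidance(eps_uncond, eps_text, eps_image, w_text=5.0, w_image=2.0):
        # eps_*: noise predictions of the diffusion model under no conditioning,
        # text-only conditioning, and reference-image-only conditioning.
        # Raising w_image injects variability from retrieved reference images;
        # raising w_text pulls generations back toward prompt fidelity.
        return (eps_uncond
                + w_text * (eps_text - eps_uncond)
                + w_image * (eps_image - eps_uncond))

Decoupling the two conditioning pathways is what lets diversity be added through the image branch without degrading the fidelity carried by the text branch.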
URL: https://openreview.net/forum?id=180S4tOpmx
---
Title: Compressed Over-parameterized Federated Learning for Multiple Access Channels
Abstract: Federated Learning (FL) is a distributed machine learning (ML) paradigm that addresses user data privacy concerns. Here, a global ML model is learned by aggregating local models that were learned over local data at each edge user (also known as a client). Realizing the benefits of FL is challenging, particularly in communication-constrained environments, such as the Internet of Things (IoT) framework and wireless communication characterized by low-bandwidth links over wireless physical channels. A well-known FL protocol over such resource-constrained channels is FL over multiple access channels, also known as FL-MAC, where edge users use the transmission medium simultaneously, hence avoiding the need for orthogonal resources. However, the communication bottleneck at the server in FL can still be severe, since modern-day neural networks (NNs) are over-parameterized. Over-parameterized neural networks (ONNs) are trained in the lazy training regime, where the model weights of the NN change very slowly across gradient descent epochs. This motivates the use of incremental model weights. Since such updates are highly sparse, this allows for algorithms that employ compressive sensing (CS), thus allowing compressed model update communication. Accordingly, we propose Compressed Over-parameterized Federated Learning over MAC (or COFL-MAC). We employ a common Gaussian sensing matrix as the dictionary to compress the per-user model updates. By means of NTK theory, we show that the COFL-MAC framework exhibits exponential convergence in addition to being communication efficient. Using the CIFAR-10 and FMNIST datasets, we empirically demonstrate that the proposed framework outperforms the gradient compression benchmark strategies (Top-k with correction, SignSGD, and MQAT) in terms of communication efficiency for a given test accuracy for different data heterogeneity levels among the clients.
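Editor's sketch: the compressive-sensing step can be illustrated in a few lines (toy sizes and ISTA recovery are our choices; the paper's decoder may differ). The shared Gaussian matrix compresses each client's sparse incremental update, the MAC naturally sums the transmissions, and the server recovers the aggregated sparse update.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, n_clients = 2000, 400, 5                # toy sizes (assumptions)
    A = rng.standard_normal((m, d)) / np.sqrt(m)  # common Gaussian sensing matrix

    # sparse incremental weight updates at each client (lazy-training regime)
    deltas = []
    for _ in range(n_clients):
        v = np.zeros(d)
        v[rng.choice(d, 40, replace=False)] = rng.standard_normal(40)
        deltas.append(v)

    y = sum(A @ v for v in deltas)                # over-the-air sum on the MAC

    def ista(y, A, lam=0.01, iters=200):
        # basic iterative soft-thresholding to recover the aggregated sparse update
        x = np.zeros(A.shape[1])
        L = np.linalg.norm(A, 2) ** 2             # Lipschitz constant of the gradient
        for _ in range(iters):
            z = x + A.T @ (y - A @ x) / L
            x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
        return x

    agg = ista(y, A)                              # approximates sum(deltas)

Only m numbers per round cross the channel instead of d, which is where the communication saving comes from.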
URL: https://openreview.net/forum?id=jnBrypiy7R
---
Title: $\texttt{GCFed}$: Exploiting Gradient Correlation for Client Selection and Rate Allocation in Federated Learning
Abstract: Federated Learning (FL) has gained increasing popularity for its ability to harness diverse datasets from multiple sources without the need for data centralization. Extensive research has focused on reducing the cost of communications between remote clients and the parameter server. However, existing works fail to comprehensively leverage the correlation among the gradients at the remote clients. In this work, we propose $\texttt{GCFed}$ -- a novel FL framework that exploits the clients' gradient correlation to reduce communication costs while maintaining satisfactory convergence. Specifically, we propose an information-theoretic problem formulation that considers the model update problem in a single FL iteration as a multi-terminal source coding problem in the context of rate-distortion theory. We solve the associated optimization problem using convex semidefinite relaxation techniques with an iterative algorithm and leverage the solution to develop a joint approach for correlation-aware client selection and rate allocation. Extensive experiments are conducted to validate the effectiveness of our proposed framework and approaches as compared to state-of-the-art methods. Our code is available at: https://anonymous.4open.science/r/GCFed-D03B
URL: https://openreview.net/forum?id=XZQOYk7Rq8
---
Title: Uncertainty Quantification in SVM prediction
Abstract: This paper explores Uncertainty Quantification (UQ) in SVM predictions, particularly for regression and forecasting tasks. Unlike neural network solutions, SVM solutions are typically more stable, sparse, optimal, and interpretable; however, only a few works address UQ in SVM prediction. We first provide a comprehensive summary of existing Prediction Interval (PI) estimation and probabilistic forecasting methods developed in the SVM framework and evaluate them against the key properties expected from an ideal PI model. We find that none of the existing SVM PI models achieves a sparse solution, which has remained a key advantage of the standard SVM model developed for classification and regression tasks. To introduce sparsity into the SVM model, we propose the Sparse Support Vector Quantile Regression (SSVQR) model, which constructs PIs and probabilistic forecasts by solving a pair of linear programs. Further, we develop a feature selection algorithm for PI estimation using SSVQR that effectively eliminates a significant number of features while improving PI quality for high-dimensional datasets. Finally, we extend the SVM models to the conformal regression setting to obtain more stable prediction sets with finite-test-set guarantees. Extensive experiments on artificial and real-world benchmark datasets compare the different characteristics of both existing and proposed SVM-based PI estimation methods and also highlight the advantages of feature selection in PI estimation. We also compare both the existing and the proposed SVM-based PI estimation models with modern deep learning models for probabilistic forecasting tasks on benchmark datasets; in these experiments, the SVM models show comparable or superior performance to the more complex deep learning models.
URL: https://openreview.net/forum?id=qR4Bo43Bzk
---
Title: DiffCLIP: Differential Attention Meets CLIP
Abstract: We propose DiffCLIP, a novel vision-language model that extends the differential attention mechanism to CLIP architectures. Differential attention was originally developed for large language models to amplify relevant context while canceling out noisy information. In this work, we integrate this mechanism into CLIP's dual encoder (image and text) framework. With minimal additional parameters, DiffCLIP achieves superior performance on image-text understanding tasks. Across zero-shot classification, retrieval, and robustness benchmarks, DiffCLIP consistently outperforms baseline CLIP models. Notably, these gains come with negligible computational overhead, demonstrating that differential attention can significantly enhance multi-modal representations without sacrificing efficiency.
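Editor's sketch: differential attention itself (from the Differential Transformer work it extends) is compact enough to write out; here lam is a scalar for brevity, whereas the original parameterizes it with learned vectors, and how DiffCLIP wires this into each CLIP tower is not shown.

    import torch
    import torch.nn.functional as F

    def differential_attention(q1, k1, q2, k2, v, lam=0.8):
        # two attention maps are computed from two halves of the query/key
        # projections; subtracting them cancels common-mode "noise" attention
        d = q1.shape[-1]
        a1 = F.softmax(q1 @ k1.transpose(-2, -1) / d ** 0.5, dim=-1)
        a2 = F.softmax(q2 @ k2.transpose(-2, -1) / d ** 0.5, dim=-1)
        return (a1 - lam * a2) @ v

Replacing standard attention with this map in both the image and text encoders is, per the abstract, essentially the whole architectural change.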
URL: https://openreview.net/forum?id=2I2fTehry2
---
Title: Exploring Connections Between Memorization And Membership Inference
Abstract: Membership Inference Attacks (MIAs) aim to identify specific data samples within the private training dataset of machine learning models. Many practical black-box MIAs require query access to the data distribution to train shadow models. Prior literature presents bounds for the adversary’s success by making connections to overfitting (and its connections to differential privacy), noting that overfit models with high generalization error are more susceptible to attacks. However, overfitting does not fully account for privacy risks in models that generalize well. We take a complementary approach: by observing that label memorization can be reduced to membership inference, we are able to present theoretical scenarios where the adversary will always successfully (i.e., with extremely high advantage) launch an MIA. We proceed to show that these attacks can be launched at a fraction of the cost of state-of-the-art attacks. We confirm our theoretical arguments with comprehensive experiments; by utilizing samples with high memorization scores, the adversary can (a) significantly improve its efficacy regardless of the MIA used, and (b) reduce the number of shadow models by nearly two orders of magnitude compared to state-of-the-art approaches.
URL: https://openreview.net/forum?id=zKMGIZA7EA
---
Title: A Proximal Operator for Inducing 2:4-Sparsity
Abstract: Recent hardware advancements in AI accelerators and GPUs allow efficient computation of sparse matrix multiplications, especially when 2 out of 4 consecutive weights are set to zero. However, this so-called 2:4 sparsity usually comes at the cost of decreased model accuracy. We derive a regularizer that exploits the local correlation of features to find better sparsity masks in trained models. We minimize the regularizer jointly with a local squared loss by deriving the proximal operator, for which we show that it has an efficient solution in the 2:4-sparse case. After optimizing the mask, we introduce masked-gradient updates to further minimize the local squared loss. We illustrate our method on toy problems and apply it to pruning entire large language models up to 70B parameters. On models up to 13B we improve over previous state-of-the-art algorithms, whilst on 70B models we match their performance.
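Editor's sketch: for orientation, the hard projection onto the 2:4 pattern looks as follows (our sketch; the paper's proximal operator solves a regularized problem rather than this plain magnitude projection).

    import torch

    def project_2of4(w: torch.Tensor) -> torch.Tensor:
        # in every group of four consecutive weights, keep the two largest
        # magnitudes and zero the rest (w.numel() must be divisible by 4)
        g = w.reshape(-1, 4)
        idx = g.abs().topk(2, dim=1).indices
        mask = torch.zeros_like(g).scatter_(1, idx, 1.0)
        return (g * mask).reshape(w.shape)

    w = torch.tensor([0.3, -1.2, 0.1, 0.9, 2.0, -0.1, 0.2, -0.4])
    print(project_2of4(w))   # keeps {-1.2, 0.9} and {2.0, -0.4}

The regularized proximal view lets the mask choice account for feature correlations rather than raw magnitudes alone, which is where the claimed accuracy gains come from.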
URL: https://openreview.net/forum?id=AsFbXRIe4q
---
Title: Statistical Test for Saliency Maps of Graph Neural Networks via Selective Inference
Abstract: Graph Neural Networks (GNNs) have gained prominence for their ability to process graph-structured data across various domains. However, interpreting GNN decisions remains a significant challenge, leading to the adoption of saliency maps for identifying influential nodes and edges. Despite their utility, the reliability of GNN saliency maps has been questioned, particularly in terms of their robustness to noise. In this study, we propose a statistical testing framework to rigorously evaluate the significance of saliency maps. Our main contribution lies in addressing the inflation of the Type I error rate caused by double-dipping of data, leveraging the framework of Selective Inference. Our method provides statistically valid $p$-values while controlling the Type I error rate, ensuring that identified salient subgraphs contain meaningful information rather than random artifacts. To demonstrate the effectiveness of our method, we conduct experiments on both synthetic and real-world datasets, showing its effectiveness in assessing the reliability of GNN interpretations.
URL: https://openreview.net/forum?id=5NkXTCVa7F
---
Title: Risk-controlling Prediction with Distributionally Robust Optimization
Abstract: Conformal prediction is a popular paradigm to quantify the uncertainty of a model's output on a new batch of data. Quite differently, distributionally robust optimization aims at training a model that is robust to uncertainties in the distribution of the training data. In this paper, we examine the links between the two approaches. In particular, we show that we can learn conformal prediction intervals by distributionally robust optimization on a well-chosen objective. This further makes it possible to train a model and build conformal prediction intervals all at once, using the same data.
URL: https://openreview.net/forum?id=d9dl6DyJpJ
---
Title: Hierarchical Language Model Design For Interpretable Graph Reasoning
Abstract: Large language models (LLMs) are being increasingly explored for graph tasks. Despite their remarkable success in text-based tasks, LLMs' capabilities in understanding explicit graph structures remain limited, particularly with large graphs. In this work, we introduce Hierarchical Language Model for Graph (HLM-G), which employs a two-block architecture to capture node-centric local information and interaction-centric global structure, effectively enhancing graph structure understanding abilities. The proposed scheme allows LLMs to address various graph queries with high efficacy, efficiency, and robustness, while reducing computational costs on large-scale graph tasks. Furthermore, we demonstrate the interpretability of our model using intrinsic attention weights and established explainers.
Comprehensive evaluations across diverse graph reasoning and real-world tasks at the node, link, and graph levels highlight the superiority of our method, marking a significant advancement in the application of LLMs to graph understanding.
URL: https://openreview.net/forum?id=F74rZKJXfm
---
Title: FLASC: Federated LoRA with Sparse Communication
Abstract: Low-rank adaptation (LoRA) is a promising method for finetuning models in communication-constrained settings such as cross-device federated learning (FL). Prior work has explored ways to improve the efficiency of LoRA in federated settings by imposing additional sparsity constraints. However, as we show, existing methods for sparse LoRA not only harm accuracy but can in fact increase overall communication costs. We instead propose FLASC, a simple approach with two key components: First, FLASC combines LoRA with sparse communication, which outperforms baselines such as using a lower LoRA rank or pruning LoRA weights. Second, FLASC-Search efficiently searches the space of sparsity-and-rank configurations by iteratively comparing pairs of configurations and increasing either the rank or density. Across four FL datasets, we demonstrate that FLASC outperforms existing sparse LoRA methods with up to 20% higher accuracy or 10x less communication. Our work highlights the importance of considering the constraints of existing efficient finetuning methods and provides a simple and competitive baseline for future work in federated finetuning.
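Editor's sketch: a rendering of the sparse-communication side (ours; density is an assumed knob, and FLASC's actual masking and aggregation details may differ): each round, the client sends only the largest-magnitude entries of its LoRA update.

    import torch

    def sparsify_topk(delta: torch.Tensor, density: float = 0.1) -> torch.Tensor:
        # keep a `density` fraction of the entries of a LoRA weight update,
        # chosen by magnitude, before communicating it to the server
        flat = delta.flatten()
        k = max(1, int(density * flat.numel()))
        idx = flat.abs().topk(k).indices
        out = torch.zeros_like(flat)
        out[idx] = flat[idx]
        return out.reshape(delta.shape)

Because sparsity is applied to the communicated update rather than to the adapter itself, the local LoRA factors can stay dense, which is one way such a scheme can beat simply lowering the rank.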
URL: https://openreview.net/forum?id=N3Aav8S7k1
---
Title: GeMS: Efficient Gaussian Splatting for Extreme Motion Blur
Abstract: We introduce GeMS, a framework for 3D Gaussian Splatting designed to handle severely motion-blurred images. State-of-the-art deblurring methods for extreme motion blur, such as ExBluRF, as well as Gaussian Splatting-based approaches like Deblur-GS, typically assume access to corresponding sharp images for camera pose estimation and point cloud generation, which is an unrealistic assumption. Additionally, methods relying on COLMAP initialization, such as BAD-Gaussians, fail due to the lack of reliable feature correspondences in cases of severe motion blur. To address these challenges, we propose GeMS, a 3D Gaussian Splatting framework that reconstructs scenes directly from extremely motion-blurred images. GeMS integrates: (1) VGGSfM, a deep-learning-based SfM pipeline that estimates camera poses and generates point clouds directly from severely motion-blurred images; (2) MCMC-based Gaussian Splatting, which enables robust scene initialization by treating Gaussians as samples from an underlying probability distribution, eliminating heuristic densification and pruning strategies; and (3) joint optimization of the camera motion trajectory and Gaussian parameters, which ensures stable and accurate reconstruction. While this pipeline produces reasonable reconstructions, extreme motion blur can still introduce inaccuracies, especially when all input views are severely blurred. To address this, we propose GeMS-E, which integrates a progressive refinement step when event data is available. Specifically, we perform (4) Event-based Double Integral (EDI) deblurring, which first restores deblurred images from motion-blurred inputs. These deblurred images are then fed into the GeMS framework, leading to improved pose estimation, point cloud generation, and hence overall reconstruction quality. Both GeMS and GeMS-E achieve state-of-the-art performance on synthetic as well as real-world datasets, demonstrating their effectiveness in handling extreme motion blur. To the best of our knowledge, we are the first to effectively address this problem in extreme blur scenarios within a 3D Gaussian Splatting framework, without requiring sharp images for SfM (pose and point cloud) initialization.
URL: https://openreview.net/forum?id=BDjnnr8qGE
---
Title: Improving Adversarial Training for Two-player Competitive Games via Episodic Reward Engineering
Abstract: Training adversarial agents to attack neural network policies has proven to be both effective and practical. However, we observe that existing methods can be further enhanced by distinguishing between states that lead to wins or losses, and by using reward engineering to encourage policy training to prioritize winning states. In this paper, we introduce a novel adversarial training method with reward engineering for two-player competitive games. Our method extracts historical evaluations of states from past experiences with an episodic memory, and then incorporates these evaluations into the rewards with our proposed reward revision method to improve adversarial policy optimization. We evaluate our approach using two-player competitive games in MuJoCo simulation environments, demonstrating that, among existing adversarial policy training techniques, our method achieves the most promising attack performance and is the most difficult for victims to defend against.
URL: https://openreview.net/forum?id=hbQ5mDi64p
---
Title: Exploiting Space Folding by Neural Networks
Abstract: Recent findings suggest that consecutive layers of neural networks with the ReLU activation function \emph{fold} the input space during the learning process. While many works hint at this phenomenon, an approach to quantify the folding was only recently proposed by means of a space folding measure based on the Hamming distance in the ReLU activation space. Moreover, it has been observed that space folding values increase with network depth when the generalization error is low, but decrease when the error increases, thus underpinning that learned symmetries in the data manifold (visible in terms of space folds) contribute to the network's generalization capacity. Inspired by these findings, we propose a novel regularization scheme that enforces folding early during the training process. Further, we generalize the space folding measure to a wider class of activation functions through the introduction of equivalence classes of input data. We then analyze its mathematical and computational properties and propose an efficient sampling strategy for its implementation. Lastly, we outline the connection between learning with increased folding and contrastive learning, hinting that the former is a generalization of the latter. We underpin our claims with an experimental evaluation.
URL: https://openreview.net/forum?id=4SSMeu0Zyo
---
Title: Efficient Class-Incremental Segmentation Learning via Expanding Visual Transformers
Abstract: Incrementally learning new semantic concepts while retaining existing information is fundamental for several real-world applications. While the behaviors of different backbone sizes and architectural choices have been studied in order to propose efficient limited-size architectures for many non-incremental computer vision applications, only large convolutional and Visual Transformer (ViT) backbones have been explored for class-incremental semantic segmentation, without providing a fair comparison with respect to model size. In this work, we propose a fair study across existing class-incremental semantic segmentation methods, focusing on the models' efficiency with respect to their memory footprint. Moreover, we propose TILES (Transformer-based Incremental Learning for Expanding Segmenter), a novel approach exploiting the efficiency of small-size ViT backbones to offer an alternative solution where severe memory constraints apply. It is based on expanding the architecture with the increments, allowing the model to learn new tasks while retaining old knowledge within a limited memory footprint. Besides, in order to tackle the background semantic shift, we apply adaptive losses specific to the incremental branches, while balancing old and new knowledge. Furthermore, we exploit the confidence of each incremental task to propose an efficient branch merging strategy. TILES provides state-of-the-art results on challenging benchmarks using up to $14$ times fewer parameters.
URL: https://openreview.net/forum?id=9lBWcsOIcz
---
Title: FedComLoc: Communication-Efficient Distributed Training of Sparse and Quantized Models
Abstract: Federated Learning (FL) has garnered increasing attention due to its unique characteristic of allowing heterogeneous clients to process their private data locally and interact with a central server, while preserving privacy. A critical bottleneck in FL is the communication cost. A pivotal strategy to mitigate this burden is Local Training, which involves running multiple local stochastic gradient descent iterations between communication phases. Our work is inspired by the innovative Scaffnew algorithm, which has considerably advanced the reduction of communication complexity in FL. We introduce FedComLoc (Federated Compressed and Local Training), integrating practical and effective compression into Scaffnew to further enhance communication efficiency. Extensive experiments, using the popular Top-K compressor and quantization, demonstrate its prowess in substantially reducing communication overheads in heterogeneous settings.
URL: https://openreview.net/forum?id=vYQPLytQsj
---
Title: CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives
Abstract: Grounding the instruction in the environment is a key step in solving language-guided goal-reaching reinforcement learning problems. In automated reinforcement learning, a key concern is to enhance the model's ability to generalize across various tasks and environments. In goal-reaching scenarios, the agent must comprehend the different parts of the instructions within the environmental context in order to complete the overall task successfully. In this work, we propose \textbf{CAREL} (\textit{\textbf{C}ross-modal \textbf{A}uxiliary \textbf{RE}inforcement \textbf{L}earning}) as a new framework to solve this problem using auxiliary loss functions inspired by video-text retrieval literature and a novel method called instruction tracking, which automatically keeps track of progress in an environment. The results of our experiments suggest superior sample efficiency and systematic generalization for this framework in multi-modal reinforcement learning problems.
URL: https://openreview.net/forum?id=zJUEYr5X1X
---
Title: Iterative Improvements Based on Ground Truth: Building LLM Agents in the Era of Experience Inspired by Games AI
Abstract: LLM agents have attracted much attention recently.
However, how to build successful LLM agents, especially with respect to autonomy and optimality, is still an open problem.
We present a perspective paper with a brief survey on building LLM agents through iterative improvements based on ground truth, in the era of experience, inspired by the successes of games AI.
We propose AgentZero, Agent$\mu$, and Agent$\infty$: agent frameworks with perfect, learned, and no world models, following AlphaZero, MuZero, and model-free methods like DQN, respectively.
We propose to leverage domain knowledge for data collection, architecture design, and algorithm design, and propose decision-time planning and meta reinforcement learning at both pre- and post-training stages.
We present case studies for building agents for games, maths, or coding, with approximate simulators, facts, and/or human-in-the-loop.
URL: https://openreview.net/forum?id=hcd3xkYlAu
---
Title: k-NN as a Simple and Effective Estimator of Transferability
Abstract: How well can one expect transfer learning to work in a new setting where the domain is shifted, the task is different, and the architecture changes? Many transfer learning metrics have been proposed to answer this question. But how accurate are their predictions in a realistic new setting? We conducted an extensive evaluation involving over 42,000 experiments comparing 23 transferability metrics across 16 different datasets to assess their ability to predict transfer performance. Our findings reveal that none of the existing metrics perform well across the board. However, we find that a simple k-nearest neighbor evaluation -- as is commonly used to evaluate feature quality for self-supervision -- not only surpasses existing metrics, but also offers better computational efficiency and ease of implementation.
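For intuition, here is a minimal sketch of the kind of k-NN feature-quality evaluation the abstract describes, using scikit-learn; the 80/20 split, k=5, and synthetic embeddings are our illustrative choices, not the paper's protocol.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def knn_transferability(features, labels, k=5):
    """Score a source model's features on a target task with k-NN accuracy.

    features: (N, D) embeddings of target-task images extracted from the
    frozen source model; labels: (N,) target-task labels. Higher k-NN
    accuracy is taken as a proxy for better transfer performance.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.2, random_state=0, stratify=labels)
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    return knn.score(X_te, y_te)

# Toy example with synthetic "embeddings" for a 3-class target task.
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(c, 1.0, size=(100, 32)) for c in range(3)])
labs = np.repeat(np.arange(3), 100)
print(f"k-NN transferability score: {knn_transferability(feats, labs):.3f}")
```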
URL: https://openreview.net/forum?id=hGlkjP1zHc
---
Title: Amdahl’s Law for LLMs: A Throughput-Centric Analysis of Extreme LLM Quantization
Abstract: The emergence of 1-bit large language models (LLMs) has sparked significant interest, promising substantial efficiency gains through extreme quantization. However, these benefits are inherently limited by the portion of the model that can be quantized. Specifically, 1-bit quantization typically targets only the projection layers, while the attention mechanisms remain in higher precision, potentially creating significant throughput bottlenecks. To address this, we present an adaptation of Amdahl's Law specifically tailored to LLMs, offering a quantitative framework for understanding the throughput limits of extreme quantization. Our analysis reveals how improvements in quantization can deliver substantial throughput gains, but only to the extent that they address critical throughput-constrained sections of the model. Through extensive experiments across diverse model architectures and hardware platforms, we highlight key trade-offs and performance ceilings, providing a roadmap for future research aimed at maximizing LLM throughput through more holistic quantization strategies.
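Amdahl's Law itself is simple to state: if a fraction $p$ of runtime is accelerated by a factor $s$, the overall speedup is $1/((1-p)+p/s)$. The sketch below applies it to the quantization setting; the 70% projection-layer share is purely illustrative, not a number from the paper.

```python
def amdahl_throughput_gain(quantized_fraction: float, speedup: float) -> float:
    """Classic Amdahl's-law bound on overall speedup when only a fraction
    of the runtime (here, the projection layers amenable to 1-bit
    quantization) is accelerated by `speedup`x."""
    return 1.0 / ((1.0 - quantized_fraction) + quantized_fraction / speedup)

# If projections take 70% of runtime, even an infinite quantization
# speedup is capped by the remaining 30% (attention, etc.).
for s in (2, 8, 1e9):
    gain = amdahl_throughput_gain(0.7, s)
    print(f"speedup {s:>10.0f}x on 70% of runtime -> {gain:.2f}x overall")
```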
URL: https://openreview.net/forum?id=JtrQJJQYpP
---
Title: Beyond Marginals: Learning Joint Spatio-Temporal Patterns for Multivariate Anomaly Detection
Abstract: In this paper, we aim to improve anomaly detection (AD) by incorporating the time-varying non-linear spatio-temporal correlations of multivariate time series data into the modeling process. In multivariate AD, the simultaneous deviation of multiple nodes from their expected behavior can indicate an anomaly, even if no individual node shows a clearly abnormal pattern. In many existing approaches, time series variables are assumed to be (conditionally) independent, which oversimplifies real-world interactions. Our approach addresses this by modeling joint dependencies using a copula-based framework, which decouples the modeling of marginal distributions, temporal dynamics, and inter-variable dependencies. We use a transformer encoder to capture temporal patterns, and to model spatial (inter-variable) dependencies, we integrate a copula. Both components are trained jointly in a latent space using a self-supervised contrastive learning objective to learn meaningful feature representations that separate normal and anomalous samples.
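The paper couples a transformer encoder with the copula and trains contrastively; the scipy sketch below only illustrates the core decoupling idea, a Gaussian copula separating marginals from dependence, and flags a point whose coordinates are individually typical but jointly inconsistent. All data and function names are toy assumptions.

```python
import numpy as np
from scipy.stats import norm, rankdata, multivariate_normal

def fit_copula_correlation(train):
    """Rank-transform each variable to uniforms, push through the Gaussian
    quantile function, and estimate the copula correlation matrix."""
    n = train.shape[0]
    z = norm.ppf(rankdata(train, axis=0) / (n + 1))
    return np.corrcoef(z, rowvar=False)

def dependence_anomaly_score(x, corr, train):
    """Higher = less plausible under the learned *joint* dependence,
    even if every marginal value looks typical on its own."""
    n, d = train.shape
    # Map the new point through the empirical marginal CDFs.
    u = np.array([(np.sum(train[:, j] <= x[j]) + 1.0) / (n + 2.0)
                  for j in range(d)])
    z = norm.ppf(u)
    log_copula = (multivariate_normal(np.zeros(d), corr).logpdf(z)
                  - norm.logpdf(z).sum())   # log copula density at x
    return -log_copula

rng = np.random.default_rng(0)
train = rng.multivariate_normal([0, 0], [[1.0, 0.9], [0.9, 1.0]], size=2000)
corr = fit_copula_correlation(train)
print(dependence_anomaly_score(np.array([1.0, 1.1]), corr, train))   # low: consistent
print(dependence_anomaly_score(np.array([1.0, -1.1]), corr, train))  # high: jointly odd
```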
URL: https://openreview.net/forum?id=iETTv1okjX
---
Title: Long-Term Fairness Without Utility Deterioration
Abstract: In fair machine learning, the trade-off between fairness and utility has been predominantly studied in static classification settings, neglecting long-term learning environments where the population distribution may vary due to the deployment of model policies. This work investigates whether zero utility deterioration can be achieved in the long run. We introduce a Markov decision process (MDP) to formulate the interplay between model decisions and population distribution shifts. A key technical contribution is identifying a necessary and sufficient condition under which a model policy achieving long-term fairness does not compromise utility. Inspired by this condition, we propose effective reward functions that can be combined with online reinforcement learning algorithms, allowing the classifier to accommodate dynamic control objectives such as inducing population adaptations that maximize fairness without sacrificing model performance. Experiments on both synthetic and real-world datasets suggest that the proposed reinforcement learning framework is effective in the long run, driving a classifier-population system toward a desirable equilibrium where the identified condition is met.
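The paper's reward functions are derived from its identified condition; the sketch below only shows the generic shape of such a reward, utility minus a weighted demographic-parity gap, as one might plug into an online RL loop. The specific fairness metric, weighting, and toy data are our assumptions.

```python
import numpy as np

def fairness_aware_reward(y_pred, y_true, group, lam=1.0):
    """Per-step reward: utility (accuracy) minus a demographic-parity gap.

    y_pred, y_true: 0/1 arrays of decisions and outcomes for this step's
    batch; group: 0/1 protected-attribute indicator. `lam` trades off
    utility against fairness.
    """
    utility = np.mean(y_pred == y_true)
    rate0 = y_pred[group == 0].mean() if np.any(group == 0) else 0.0
    rate1 = y_pred[group == 1].mean() if np.any(group == 1) else 0.0
    return utility - lam * abs(rate0 - rate1)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 100)
group = rng.integers(0, 2, 100)
y_pred = y_true.copy()          # accurate decisions ...
y_pred[group == 1] = 1          # ... but biased toward group 1
print(fairness_aware_reward(y_pred, y_true, group))  # penalized reward
```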
URL: https://openreview.net/forum?id=BvuSeLV1Nc
---
Title: GMAgent: A Graph-oriented Multi-agent Collaboration Framework for Text-attributed Graph Analysis
Abstract: Text-Attributed Graphs (TAGs) are crucial for modeling interconnected data in numerous real-world applications. Graph Neural Networks (GNNs) excel at efficiently capturing global structural information across TAGs, while Large Language Models (LLMs) offer strong capabilities in local semantic understanding. Despite the recent development of integrating GNNs and LLMs for TAG analysis, these approaches often fail to fully exploit their complementary strengths by relying primarily on a single architecture. Furthermore, LLM-based multi-agent collaboration systems have shown promising potential across diverse fields. However, their integration with GNNs for graph analytical tasks remains underexplored. To this end, we introduce GMAgent, a novel graph-oriented multi-agent collaboration framework that enables effective and flexible interaction among diverse GNN-based and LLM-based graph agents, facilitating comprehensive TAG analysis. First, we deploy multiple GNNs as graph agents to perform conflict evaluation, identifying conflict scenarios for further multi-agent collaboration. Then, we repurpose LLMs as graph agents via graph-driven instruction tuning and adopt a role-play expert recruiting strategy, thereby generating LLM graph experts' initial analyses for conflict scenarios. Finally, we conduct graph-oriented multi-agent collaboration to effectively and efficiently guide collaborative self-reflection among graph experts and the final answer selection. Extensive experimental results on five datasets demonstrate significant improvements, showcasing the potential of our GMAgent in improving the effectiveness, interoperability, and flexibility of comprehensive TAG analysis.
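As a toy illustration of the first stage, conflict evaluation among GNN agents, here is a numpy sketch that flags nodes where the agents disagree (a simple unanimity rule of our own choosing; the paper's actual criterion may differ) so they can be routed to the LLM experts.

```python
import numpy as np

def find_conflicts(agent_preds):
    """Flag nodes where GNN agents disagree, to route them to LLM experts.

    agent_preds: (num_agents, num_nodes) array of class predictions.
    Returns a boolean conflict mask and the majority vote per node.
    """
    num_agents, num_nodes = agent_preds.shape
    majority = np.zeros(num_nodes, dtype=np.int64)
    conflict = np.zeros(num_nodes, dtype=bool)
    for i in range(num_nodes):
        votes = np.bincount(agent_preds[:, i])
        majority[i] = votes.argmax()
        conflict[i] = votes.max() < num_agents  # any disagreement at all
    return conflict, majority

preds = np.array([[0, 1, 2, 1],
                  [0, 1, 0, 1],
                  [0, 2, 2, 1]])
conflict, majority = find_conflicts(preds)
print(conflict)   # [False  True  True False]
print(majority)   # [0 1 2 1]
```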
URL: https://openreview.net/forum?id=iBm79eP0ex
---
Title: Ginger: An Efficient Curvature Approximation with Linear Complexity for General Neural Networks
Abstract: Second-order optimization approaches like the generalized Gauss-Newton method are considered more powerful as they utilize the curvature information of the objective function with preconditioning matrices. Albeit offering tempting theoretical benefits, they are not easily applicable to modern deep learning. The major reason is the quadratic memory and cubic time complexity of computing the inverse of the matrix. These requirements are infeasible even with state-of-the-art hardware. In this work, we propose Ginger, an eigendecomposition for the inverse of the generalized Gauss-Newton matrix. Our method enjoys efficient linear memory and time complexity per iteration. Instead of approximating the preconditioning matrix, we directly maintain its inverse to make the approximation more accurate. We provide the convergence result of Ginger for non-convex objectives. Our experiments on different tasks with different model architectures verify the effectiveness of our method.
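Ginger maintains an eigendecomposition of the inverse Gauss-Newton matrix; the sketch below is not that algorithm, but it illustrates why low-rank structure makes linear-memory preconditioning possible: with a diagonal-plus-rank-$k$ curvature model, the Woodbury identity applies the inverse in $O(nk^2)$ time without ever forming an $n \times n$ matrix.

```python
import numpy as np

def precondition(v, d, U):
    """Apply (diag(d) + U U^T)^{-1} to v via the Woodbury identity:
       (D + U U^T)^{-1} = D^{-1} - D^{-1} U (I + U^T D^{-1} U)^{-1} U^T D^{-1}
    Only d (n,) and U (n, k) are stored -- memory linear in n."""
    Dinv_v = v / d
    Dinv_U = U / d[:, None]
    small = np.eye(U.shape[1]) + U.T @ Dinv_U   # (k, k), cheap to solve
    return Dinv_v - Dinv_U @ np.linalg.solve(small, U.T @ Dinv_v)

rng = np.random.default_rng(0)
n, k = 1000, 8
d = rng.uniform(0.5, 2.0, n)
U = rng.normal(size=(n, k)) / np.sqrt(n)
v = rng.normal(size=n)
# Sanity check against the dense inverse (never formed in practice).
dense = np.linalg.inv(np.diag(d) + U @ U.T) @ v
print(np.allclose(precondition(v, d, U), dense))  # True
```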
URL: https://openreview.net/forum?id=nSyi68OyEi
---
Title: Activation sharding for scalable training of large models
Abstract: Despite fast progress, efficiently training large language models (LLMs) in extremely long contexts remains challenging.
Existing methods fall back to training LLMs with short contexts (up to a few thousand tokens) and use inference time techniques when evaluating on very long contexts (above 1M tokens).
Training on very long contexts is limited by GPU memory availability and by the prohibitively long training times required on state-of-the-art hardware.
Meanwhile, many real-life applications require training/fine-tuning with long context on specific tasks. Such applications include, for example, augmenting the context with various sources of raw reference information for extraction, summarization, or fact reconciliation tasks.
We propose adjoint sharding, a novel technique that shards the gradient calculation during training to reduce memory requirements by orders of magnitude, making training on very long contexts computationally tractable. At the core of our adjoint sharding algorithm lies the adjoint method, which efficiently computes gradients that are provably equivalent to those computed by standard backpropagation.
We also propose truncated adjoint sharding to accelerate the algorithm while maintaining performance.
We provide distributed and parallel-computing versions of adjoint sharding to speed up training, and show that adjoint sharding is compatible with standard memory-reduction techniques.
Empirical results show the proposed adjoint sharding algorithm reduces memory usage by up to 3$\times$ on a large language model with 1.27B parameters in 1M-token context length training. This reduction in memory usage allows increasing the maximum training context length of a 1.27B-parameter model from 35K tokens to above 100K tokens on a training infrastructure composed of five AWS P4 instances.
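The paper applies the adjoint method at LLM scale with sharding; the single-device numpy sketch below only demonstrates the core claim that the adjoint (reverse) recursion reproduces backpropagation's gradients, here for a small tanh recurrence, checked against a finite difference. The architecture and loss are our toy assumptions.

```python
import numpy as np

def forward(W, xs, h0):
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(W @ hs[-1] + x))
    return hs  # h_0 .. h_T

def loss(W, xs, h0, c):
    return sum(c @ h for h in forward(W, xs, h0)[1:])

def adjoint_grad(W, xs, h0, c):
    """Gradient of sum_t c^T h_t via the adjoint (reverse) recursion,
    equivalent to backpropagation through time."""
    hs = forward(W, xs, h0)
    gW = np.zeros_like(W)
    lam = np.zeros_like(h0)                 # adjoint state
    for t in range(len(xs), 0, -1):
        lam = lam + c                       # direct dependence of loss on h_t
        delta = lam * (1.0 - hs[t] ** 2)    # back through tanh
        gW += np.outer(delta, hs[t - 1])
        lam = W.T @ delta                   # pass to h_{t-1}
    return gW

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3)) * 0.5
xs = [rng.normal(size=3) for _ in range(5)]
h0, c = np.zeros(3), rng.normal(size=3)
g = adjoint_grad(W, xs, h0, c)
# Finite-difference check of one entry confirms equivalence to backprop.
eps = 1e-6
Wp = W.copy()
Wp[0, 1] += eps
print(g[0, 1], (loss(Wp, xs, h0, c) - loss(W, xs, h0, c)) / eps)
```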
URL: https://openreview.net/forum?id=kQCuMcEneq
---
Title: A Bias Correction Mechanism for Distributed Asynchronous Optimization
Abstract: We develop an asynchronous gradient method for training Machine Learning models with asynchronous distributed workers, each with its own communication and computation pace, and its own local data distribution. In the modern distributed machine learning training process, local data distribution across workers is often heterogeneous (a.k.a. client bias), which is a significant limiting factor in the analysis of most existing distributed asynchronous optimization methods. In this work, we propose AsyncBC, a distributed asynchronous variant of the SARAH method, and show that this is an effective Bias Correction mechanism for distributed asynchronous optimization. We show that AsyncBC can effectively manage arbitrary data heterogeneity, as well as handle gradient updates that arrive in an uncoordinated manner and with delays. As a byproduct of our analysis, we also provide a deeper understanding of the impacts of different stochasticity models on the convergence of the SARAH method.
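AsyncBC builds on the SARAH estimator, whose recursion is easy to state: $v_t = \nabla f_{i_t}(w_t) - \nabla f_{i_t}(w_{t-1}) + v_{t-1}$, anchored by a full gradient at the start of each epoch. The numpy sketch below shows the sequential original on a least-squares problem; the asynchronous, bias-corrected variant in the paper is considerably more involved.

```python
import numpy as np

def sarah_epoch(w, X, y, lr=0.05, rng=None):
    """One epoch of the SARAH recursive gradient estimator on least squares.

    Key recursion: v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1},
    anchored by a full gradient at the start of the epoch.
    """
    rng = rng or np.random.default_rng(0)
    n = len(y)
    v = X.T @ (X @ w - y) / n               # full gradient anchor
    w_prev, w = w.copy(), w - lr * v
    for _ in range(n):
        i = rng.integers(n)
        gi = lambda u: X[i] * (X[i] @ u - y[i])   # single-sample gradient
        v = gi(w) - gi(w_prev) + v                # SARAH recursion
        w_prev, w = w, w - lr * v
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_star = rng.normal(size=5)
y = X @ w_star
w = np.zeros(5)
for _ in range(20):
    w = sarah_epoch(w, X, y, rng=rng)
print(np.linalg.norm(w - w_star))  # small: converges to the solution
```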
URL: https://openreview.net/forum?id=8doMbaah0s
---
Title: Decentralized Federated Learning with Function Space Regularization
Abstract: In this work we propose FedFun, a novel framework for decentralized federated learning that enforces consensus across clients in function space rather than parameter space. By framing agreement as a regularization penalty in a Hilbert space of hypotheses, our method allows optimization using proximal gradient updates that encourage similarity between neighboring models while supporting both parametric and non-parametric learners. This function space perspective yields theoretical advantages, including broad convergence guarantees even when individual client objectives are non-convex in parameter space, and improved robustness to client heterogeneity. We provide convergence analysis under mild assumptions, demonstrate compatibility with models like neural networks and decision trees, and empirically evaluate implementations of FedFun on various sample datasets.
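FedFun is formulated for general hypotheses in a Hilbert space with proximal updates; the numpy sketch below is a deliberately simple gradient-based caricature for linear models that captures the key idea of penalizing disagreement in function values on shared anchor points rather than in parameters. The anchor set, penalty form, and data are our own assumptions.

```python
import numpy as np

def local_step(w_i, w_neighbors, X_local, y_local, X_anchor, lr=0.05, lam=1.0):
    """One client update for linear models f_w(x) = x @ w: local squared
    loss plus a function-space penalty pulling the client's *predictions*
    on shared anchor points toward its neighbors' predictions."""
    n = len(y_local)
    grad_fit = X_local.T @ (X_local @ w_i - y_local) / n
    grad_fun = np.zeros_like(w_i)
    for w_j in w_neighbors:
        # Gradient of (1/2m) * ||X_anchor w_i - X_anchor w_j||^2 w.r.t. w_i.
        grad_fun += X_anchor.T @ (X_anchor @ (w_i - w_j)) / len(X_anchor)
    return w_i - lr * (grad_fit + lam * grad_fun / max(len(w_neighbors), 1))

rng = np.random.default_rng(0)
X_anchor = rng.normal(size=(50, 4))          # shared anchor points
w_true = rng.normal(size=4)
clients = []
for _ in range(3):                           # heterogeneous local data
    X = rng.normal(loc=rng.normal(scale=0.5), size=(100, 4))
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=100)))
ws = [np.zeros(4) for _ in clients]
for _ in range(200):
    ws = [local_step(ws[i], [ws[j] for j in range(3) if j != i],
                     *clients[i], X_anchor) for i in range(3)]
print([np.round(np.linalg.norm(w - w_true), 3) for w in ws])  # near zero
```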
URL: https://openreview.net/forum?id=JPlm1i4sxb
---
Title: Single Step Policy Alignment for Imitation Learning with Auxiliary Imperfect Demonstration
Abstract: We propose a novel one-step supervised imitation learning (IL) framework called Adversarial Density Regression (ADR). This framework seeks to utilize a single-step re-weighted behavioral cloning (BC) objective to rectify a policy acquired from data of unknown quality, aligning it with the expert distribution using demonstrations. Specifically, ADR is designed to address several limitations of previous IL algorithms: First, existing off-policy IL algorithms are based on the Bellman operator, which inevitably suffers from cumulative offsets from sub-optimal multi-step rewards. Additionally, these off-policy frameworks suffer from out-of-distribution (OOD) state-actions. Second, the conservative terms that help solve the OOD issue require nuanced and delicate balancing. To address these limitations, we fully integrate a one-step density-weighted BC objective for IL with auxiliary imperfect demonstrations. Theoretically, we demonstrate that this adaptation can effectively correct the distribution of policies trained on unknown-quality datasets to align with the expert policy's distribution. The difference between the empirical and the optimal value function is proportional to the upper bound of ADR's objective, indicating that minimizing ADR's objective is akin to approaching the optimal value. Empirically, we conduct extensive evaluations and find that ADR outperforms all of the selected IL algorithms on tasks from the Gym-Mujoco domain. Meanwhile, ADR achieves about \textbf{90\%} improvement over IQL when utilizing ground-truth rewards on tasks from the Adroit and Kitchen domains.
URL: https://openreview.net/forum?id=zOJvvcrypU
---
Title: Does Unsupervised Domain Adaptation Improve the Robustness of Amortized Bayesian Inference? A Systematic Evaluation
Abstract: Neural networks are fragile when confronted with data that significantly deviates from their training distribution. This is true in particular for simulation-based inference methods, such as neural amortized Bayesian inference (ABI), where models trained on simulated data are deployed on noisy real-world observations. Recent robust approaches employ unsupervised domain adaptation (UDA) to match the embedding spaces of simulated and observed data. However, the lack of comprehensive evaluations across different domain mismatches raises concerns about their reliability in high-stakes applications. We address this gap by systematically testing UDA approaches across a wide range of misspecification scenarios, both in silico and in practice. We demonstrate that aligning summary spaces between domains effectively mitigates the impact of unmodeled phenomena or noise. However, the same alignment mechanism can lead to failures under prior misspecification, a critical finding with practical consequences. Our results underscore the need for careful consideration of misspecification types when using UDA to increase the robustness of ABI.
URL: https://openreview.net/forum?id=ewgLuvnEw6
---
Title: On Representing Convex Quadratically Constrained Quadratic Programs via Graph Neural Networks
Abstract: Convex quadratically constrained quadratic programs (QCQPs) involve finding a solution within a convex feasible region defined by quadratic constraints while minimizing a convex quadratic objective function. These problems arise in various industrial applications, including power systems and signal processing. Traditional methods for solving convex QCQPs primarily rely on matrix factorization, which quickly becomes computationally prohibitive as the problem size increases. Recently, graph neural networks (GNNs) have gained attention for their potential in representing and solving various optimization problems such as linear programs and linearly constrained quadratic programs. In this work, we investigate the representation power of GNNs in the context of QCQP tasks. Specifically, we propose a new tripartite graph representation for general convex QCQPs and properly associate it with message-passing GNNs. We demonstrate that there exist GNNs capable of reliably representing key properties of convex QCQPs, including feasibility, optimal value, and optimal solution. Our result deepens the understanding of the connection between QCQPs and GNNs, paving the way for future machine learning approaches to efficiently solve QCQPs.
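The paper's contribution is a specific tripartite graph representation paired with message-passing GNNs; the networkx sketch below shows one plausible construction in that spirit, with variable nodes, constraint nodes (the objective treated as constraint 0), and per-nonzero quadratic-term nodes. The exact encoding, node features, and naming are our assumptions, not the paper's.

```python
import networkx as nx
import numpy as np

def qcqp_to_tripartite(Q_obj, c_obj, Q_cons, c_cons, b_cons):
    """Encode  min 1/2 x^T Q0 x + c0^T x  s.t.  1/2 x^T Qk x + ck^T x + bk <= 0
    as a graph with three node classes: variables, constraints, and
    quadratic-term nodes for the nonzeros of each Q."""
    G = nx.Graph()
    n = len(c_obj)
    for j in range(n):
        G.add_node(("var", j))
    blocks = [(0, Q_obj, c_obj, 0.0)] + [
        (k + 1, Qk, ck, bk)
        for k, (Qk, ck, bk) in enumerate(zip(Q_cons, c_cons, b_cons))]
    for k, Q, c, b in blocks:
        G.add_node(("con", k), bias=b)
        for j in range(n):                     # linear terms: con-var edges
            if c[j] != 0:
                G.add_edge(("con", k), ("var", j), weight=c[j])
        for i in range(n):                     # quadratic terms get own nodes
            for j in range(i, n):
                if Q[i, j] != 0:
                    q = ("quad", k, i, j)
                    G.add_node(q, weight=Q[i, j])
                    G.add_edge(q, ("var", i))
                    G.add_edge(q, ("var", j))
                    G.add_edge(q, ("con", k))
    return G

# min 1/2(x0^2 + x1^2)  s.t.  1/2 x0^2 + x1 - 1 <= 0
G = qcqp_to_tripartite(np.eye(2), np.zeros(2),
                       [np.array([[1.0, 0], [0, 0]])], [np.array([0, 1.0])], [-1.0])
print(G.number_of_nodes(), G.number_of_edges())
```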
URL: https://openreview.net/forum?id=GC2ZO6Asoa
---
Title: Learning Robust Representations for Visual Reinforcement Learning via Task-Relevant Mask Sampling
Abstract: Humans excel at isolating relevant information from noisy data to predict the behavior of dynamic systems, effectively disregarding non-informative, temporally-correlated noise. In contrast, existing visual reinforcement learning algorithms face challenges in generating noise-free predictions within high-dimensional, noise-saturated environments, especially when trained on world models featuring realistic background noise extracted from natural video streams. We propose Task-Relevant Mask Sampling (TRMS), a novel approach for identifying task-specific and reward-relevant masks. TRMS utilizes existing segmentation models as a masking prior, followed by a mask selector that dynamically identifies a subset of masks at each timestep, selecting those most likely to contribute to task-specific rewards. To mitigate the high computational cost associated with these masking priors, a lightweight student network is trained in parallel. This network learns to perform masking independently and replaces the Segment Anything Model (SAM)-based teacher network after a brief initial phase (<10-25% of total training). TRMS enhances the generalization capabilities of Soft Actor-Critic agents under distractions and achieves better performance on the RL-ViGen benchmark, which includes challenging variants of the DeepMind Control Suite, Dexterous Manipulation, and Quadruped Locomotion tasks.
URL: https://openreview.net/forum?id=2rxNDxHwtn
---
Title: A Mixture of Exemplars Approach for Efficient Out-of-Distribution Detection with Foundation Models
Abstract: One of the early weaknesses identified in deep neural networks trained for image classification tasks was their inability to provide low confidence predictions on out-of-distribution (OOD) data that was significantly different from the in-distribution (ID) data used to train them. Representation learning, where neural networks are trained in specific ways that improve their ability to detect OOD examples, has emerged as a promising solution. However, these approaches require long training times and can add additional overhead to detect OOD examples. Recent developments in Vision Transformer (ViT) foundation models---large networks trained on large and diverse datasets with self-supervised approaches---also show strong performance in OOD detection, and could address these challenges. This paper presents Mixture of Exemplars (MoLAR), an efficient approach to tackling OOD detection challenges that is designed to maximise the benefit of training a classifier with a high quality, frozen, pretrained foundation model backbone. MoLAR provides strong OOD performance when comparing only the similarity of OOD examples to the exemplars, a small set of images chosen to be representative of the dataset, leading to up to 30 times faster OOD detection inference than methods that achieve their best performance when the full ID dataset is used. In some cases, using only these exemplars actually improves performance with MoLAR. Extensive experiments demonstrate the improved OOD detection performance of MoLAR in comparison to comparable approaches in both supervised and semi-supervised settings, and code is available at https://anonymous.4open.science/r/molar-mixture-of-exemplars-4872/README.md.
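The exemplar idea reduces OOD scoring to a handful of similarity computations. The numpy sketch below shows the generic pattern, score by maximum cosine similarity to the exemplar set; MoLAR's exact scoring rule and exemplar selection are more refined, and the embeddings here are synthetic stand-ins.

```python
import numpy as np

def ood_score(query_emb, exemplar_embs):
    """OOD score = 1 - max cosine similarity to the exemplar set.

    query_emb: (D,) embedding from the frozen foundation-model backbone;
    exemplar_embs: (M, D) embeddings of the small representative set.
    Comparing against M exemplars instead of the full ID dataset is
    what makes inference fast.
    """
    q = query_emb / np.linalg.norm(query_emb)
    E = exemplar_embs / np.linalg.norm(exemplar_embs, axis=1, keepdims=True)
    return 1.0 - float((E @ q).max())

rng = np.random.default_rng(0)
exemplars = rng.normal(loc=1.0, size=(20, 64))   # stand-ins for ID exemplars
id_query = rng.normal(loc=1.0, size=64)
ood_query = rng.normal(loc=-1.0, size=64)
print(ood_score(id_query, exemplars))   # low: near the exemplar set
print(ood_score(ood_query, exemplars))  # high: far from the exemplar set
```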
URL: https://openreview.net/forum?id=xpKqnSJtE4
---
Title: Wasserstein Convergence of Score-based Generative Models under Semiconvexity and Discontinuous Gradients
Abstract: Score-based Generative Models (SGMs) approximate a data distribution by perturbing it with Gaussian noise and subsequently denoising it via a learned reverse diffusion process. These models excel at modeling complex data distributions and generating diverse samples, achieving state-of-the-art performance across domains such as computer vision, audio generation, reinforcement learning, and computational biology. Despite their empirical success, existing Wasserstein-2 convergence analyses typically assume strong regularity conditions, such as smoothness or strict log-concavity of the data distribution, that are rarely satisfied in practice. In this work, we establish the first non-asymptotic Wasserstein-2 convergence guarantees for SGMs targeting semiconvex distributions with potentially discontinuous gradients. Our upper bounds are explicit and sharp in key parameters, achieving optimal dependence of $O(\sqrt{d})$ on the data dimension $d$ and a convergence rate of order one. The framework accommodates a wide class of practically relevant distributions, including symmetric modified half-normal distributions, Gaussian mixtures, double-well potentials, and elastic net potentials. By leveraging semiconvexity without requiring smoothness assumptions on the potential, such as differentiability, our results substantially broaden the theoretical foundations of SGMs, bridging the gap between empirical success and rigorous guarantees in non-smooth, complex data regimes.
URL: https://openreview.net/forum?id=vS9iVRB7XF
---
Title: Language-assisted Feature Representation and Lightweight Active Learning For On-the-Fly Category Discovery
Abstract: Contemporary deep learning models are very successful at recognizing predetermined categories, but often struggle when confronted with novel ones, constraining their utility in the real world. Identifying this research gap, On-the-fly Category Discovery aims to enable machine learning systems trained on closed labeled datasets to promptly discern between novel and familiar categories of test images encountered in an online manner (one image at a time), while also clustering the different new classes as they are encountered. To address this challenging task, we propose SynC, a pragmatic yet robust framework that capitalizes on the presence of category names within the labeled datasets and the powerful knowledge base of Large Language Models to obtain unique feature representations for each class. It also dynamically updates the classifiers of both the seen and novel classes for improved class discriminability. An extended variant, SynC-AL, incorporates a lightweight active learning module to mitigate errors during inference for long-term model deployment. Extensive evaluations show that SynC and SynC-AL achieve state-of-the-art performance across a spectrum of classification datasets.
URL: https://openreview.net/forum?id=ZihFoM8K0j
---
Title: Getting aligned on representational alignment
Abstract: Biological and artificial information processing systems form representations of the world that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the similarity between the representations formed by these diverse systems? Do similarities in representations then translate into similar behavior? If so, then how can a system's representations be modified to better match those of another system? These questions pertaining to the study of \emph{representational alignment} are at the heart of some of the most promising research areas in contemporary cognitive science, neuroscience, and machine learning. In this Perspective, we survey the exciting recent developments in representational alignment research in the fields of cognitive science, neuroscience, and machine learning. Despite their overlapping interests, there is limited knowledge transfer between these fields, so work in one field ends up duplicated in another, and useful innovations are not shared effectively. To improve communication, we propose a unifying framework that can serve as a common language for research on representational alignment, and map several streams of existing work across fields within our framework. We also lay out open problems in representational alignment where progress can benefit all three of these fields. We hope that this paper will catalyze cross-disciplinary collaboration and accelerate progress for all communities studying and developing information processing systems.
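As one concrete instance of the similarity measures this Perspective surveys, here is a minimal numpy implementation of linear Centered Kernel Alignment (CKA; Kornblith et al., 2019), which compares two systems' responses to the same stimuli and is invariant to rotations and isotropic scaling of either representation. The toy data are our own.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representations.

    X: (n, p) and Y: (n, q) responses of two systems to the same n
    stimuli; returns a similarity in [0, 1], where 1 means the
    representations agree up to rotation and isotropic scaling.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
R, _ = np.linalg.qr(rng.normal(size=(20, 20)))   # random rotation
print(linear_cka(X, X @ R))                      # ~1.0: same geometry
print(linear_cka(X, rng.normal(size=(100, 20)))) # near 0: unrelated
```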
URL: https://openreview.net/forum?id=Hiq7lUh4Yn
---
Title: Explainable Graph Learning for Particle Accelerator Operations
Abstract: Particle accelerators are vital tools in physics, medicine, and industry, requiring precise tuning to ensure optimal beam performance. However, real-world deviations from idealized simulations make beam tuning a time-consuming and error-prone process. In this work, we propose an explanation-driven framework for providing actionable insight into beamline operations, with a focus on the injector beamline at the Continuous Electron Beam Accelerator Facility (CEBAF). We represent beamline configurations as heterogeneous graphs, where setting nodes represent elements that human operators can actively adjust during beam tuning, and reading nodes passively provide diagnostic feedback. To identify the most influential setting nodes responsible for differences between any two beamline configurations, our approach first predicts the resulting changes in reading nodes caused by variations in settings, and then learns importance scores that capture the joint influence of multiple setting nodes. Experimental results on real-world CEBAF injector data demonstrate the framework’s ability to generate interpretable insights that can assist human operators in beamline tuning and reduce operational overhead.
URL: https://openreview.net/forum?id=jnReRk2EX1
---
Title: GraphSnapShot: Graph Machine Learning Acceleration through Fast Arch, Storage, Caching and Retrieval
Abstract: Large-scale graph machine learning suffers from prohibitive I/O latency, memory bottlenecks, and redundant computation due to the complexity of multi-hop neighbor retrieval and dynamic topology updates. We present \textbf{GraphSnapShot: Graph Machine Learning Acceleration through Fast Arch, Storage, Caching and Retrieval}, a system that decouples graph storage layout from runtime cache management to maximize data reuse and access efficiency. GraphSnapShot introduces two key components: (1) \textbf{SEMHS}, a hop-aware storage layout that co-locates neighbors in contiguous disk slabs for efficient one-burst DMA access; and (2) \textbf{GraphSDSampler}, a multi-level variance-adaptive caching module that optimizes refresh policies based on gradient statistics. Together, they form a hybrid disk–cache–memory architecture that supports high-throughput training over billion-scale graphs. Experiments on ogbn-arxiv, ogbn-products, and ogbn-mag demonstrate that GraphSnapShot achieves up to \textbf{4.9×} loader throughput, \textbf{83.5\%} GPU memory savings, and \textbf{29.6\%} end-to-end training time reduction compared to baselines like DGL’s NeighborSampler and uniform samplers. These results establish GraphSnapShot as a scalable and efficient solution for dynamic graph learning at industrial scale.
URL: https://openreview.net/forum?id=NWjlEmY4qJ
---
Title: From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning
Abstract: Offline Reinforcement Learning (RL) aims to learn effective policies from a static dataset without requiring further agent-environment interactions. However, its practical adoption is often hindered by the need for explicit reward annotations, which can be costly to engineer or difficult to obtain retrospectively. To address this, we propose ReLOAD (Reinforcement Learning with Offline Reward Annotation via Distillation), a novel reward annotation framework for offline RL. Unlike existing methods that depend on complex alignment procedures, our approach adapts Random Network Distillation (RND) to generate intrinsic rewards from expert demonstrations using a simple yet effective embedding discrepancy measure. First, we train a predictor network to mimic a fixed target network's embeddings based on expert state transitions. Then, the prediction error between these networks serves as a reward signal for each transition in the static dataset. This mechanism provides a structured reward signal without requiring handcrafted reward annotations. We provide a formal theoretical construct that offers insights into how RND prediction errors effectively serve as intrinsic rewards by distinguishing expert-like transitions. Experiments on the D4RL benchmark demonstrate that ReLOAD enables robust offline policy learning and achieves performance competitive with traditional reward-annotated methods.
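To make the mechanism concrete, here is a small numpy sketch of the RND-style reward the abstract describes: a fixed random target network, a predictor distilled on expert states only (a closed-form least-squares fit stands in for the usual gradient training), and prediction error as the reward. Network shapes and data are toy assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
S, H = 8, 32   # state dim, embedding width

# Fixed, randomly initialized target network (never trained).
W1, W2 = rng.normal(size=(H, S)), rng.normal(size=(H, H)) / np.sqrt(H)
def target(states):                       # (N, S) -> (N, H)
    return np.maximum(states @ W1.T, 0.0) @ W2.T

# Distill a linear predictor on expert states only.
expert_states = rng.normal(loc=2.0, size=(5000, S))
coef, *_ = np.linalg.lstsq(expert_states, target(expert_states), rcond=None)

def reward(states):
    """Prediction error as intrinsic reward: low on expert-like states
    (covered by the distillation), high on states unlike the expert's."""
    return np.sum((states @ coef - target(states)) ** 2, axis=1)

print("expert-like:", reward(rng.normal(loc=2.0, size=(3, S))).round(1))
print("non-expert :", reward(rng.normal(loc=-2.0, size=(3, S))).round(1))
```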
URL: https://openreview.net/forum?id=F5K94JI2Jb
---
Title: FedHERO: A Federated Learning Approach for Node Classification Task on Heterophilic Graphs
Abstract: Graph neural networks (GNNs) have shown significant success in modeling graph data, and Federated Graph Learning (FGL) empowers clients to collaboratively train GNNs in a distributed manner while preserving data privacy. However, FGL faces unique challenges when the general neighbor distribution pattern of nodes varies significantly across clients. Specifically, FGL methods usually require that the graph data owned by all clients is homophilic to ensure similar neighbor distribution patterns of nodes. Such an assumption ensures that the learned knowledge is consistent across the local models from all clients. Therefore, these local models can be properly aggregated as a global model without undermining the overall performance. Nevertheless, when the neighbor distribution patterns of nodes vary across different clients (e.g., when clients hold graphs with different levels of heterophily), their local models may gain different and even conflicting knowledge from their node-level predictive tasks. Consequently, aggregating these local models usually leads to catastrophic performance deterioration on the global model. To address this challenge, we propose FedHERO, an FGL framework designed to harness and share insights from heterophilic graphs effectively. At the heart of FedHERO is a dual-channel GNN equipped with a structure learner, engineered to discern the structural knowledge encoded in the local graphs. With this specialized component, FedHERO enables the local model for each client to identify and learn patterns that are universally applicable across graphs with different patterns of node neighbor distributions. FedHERO not only enhances the performance of individual client models by leveraging both local and shared structural insights but also sets a new precedent in this field for effectively handling graph data with various node neighbor distribution patterns. We conduct extensive experiments to validate the superior performance of FedHERO against existing alternatives.
URL: https://openreview.net/forum?id=pHii7cWco7
---
Title: The Future of MLLM Prompting is Adaptive: A Comprehensive Experimental Evaluation of Prompt Engineering Methods for Robust Multimodal Performance
Abstract: Multimodal Large Language Models (MLLMs) are set to transform how machines process and generate human-like responses by integrating diverse modalities such as text, images, and code. Yet, effectively harnessing their capabilities hinges on optimal prompt engineering. In this study, we present a comprehensive experimental evaluation of seven prompt engineering methods applied to 13 open-source MLLMs over 24 tasks spanning Reasoning and Compositionality, Multimodal Understanding and Alignment, Complex Code Generation and Execution, and Knowledge Retrieval and Integration. Our approach stratifies models by parameter count into Small (< 4B), Medium (4B–10B), and Large (> 10B) categories and compares prompting techniques including Zero-Shot, One-Shot, Few-Shot, Chain-of-Thought, Analogical, Generated Knowledge, and Tree-of-Thought. Our experiments reveal that Large MLLMs excel in structured tasks such as code generation and execution, achieving accuracies as high as 96.88% under Few-Shot prompting, and in multimodal understanding and alignment (with relevance scores reaching 100% using Zero-Shot prompting); however, all models struggle with complex reasoning and abstract model understanding, often yielding accuracies below 60% and high hallucination rates. Notably, structured reasoning prompts (Chain-of-Thought, Analogical, Generated Knowledge, and Tree-of-Thought) frequently increased hallucination, up to 75% in small models, and led to longer response times (exceeding 20 seconds in Large MLLMs), while simpler prompting methods (One-Shot and Few-Shot) provided more concise and efficient outputs. Our findings underscore that no single prompting method uniformly optimizes all task types. Instead, adaptive prompting strategies that combine the strengths of example-based guidance with selective structured reasoning are essential to enhance robustness, efficiency, and factual accuracy in MLLMs. Our work provides critical insights and actionable recommendations for optimizing prompt engineering, paving the way for more reliable deployment of MLLMs in real-world applications ranging from AI-assisted coding and knowledge retrieval to multimodal content understanding.
URL: https://openreview.net/forum?id=B1L8HrjoA1
---
Title: NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context?
Abstract: The capability of large language models to handle long-context information plays a crucial role across various real-world applications. Existing methods for evaluating long-context abilities often rely either on real-world long texts, making it difficult to exclude the influence of models' inherent knowledge, or introduce large amounts of irrelevant filler content to artificially reach target lengths, reducing the relevance and effectiveness of assessments. To address these limitations, we introduce NeedleBench, a comprehensive synthetic framework designed to assess retrieval and reasoning performance in bilingual long-context tasks with adaptive context lengths (e.g., 32k, 128k, and beyond). NeedleBench systematically embeds key data points at varying depths to rigorously test models' capabilities in diverse settings. Tasks within NeedleBench are categorized into two distinct scenarios: information-sparse, characterized by minimal relevant details embedded within extensive irrelevant text to simulate simpler real-world retrieval tasks; and information-dense, implemented as the Ancestral Trace Challenge, where relevant information is continuously distributed throughout the context to simulate more complex real-world reasoning tasks. Our experiments show that, while recent reasoning models such as Deepseek-R1 and OpenAI's o3 have demonstrated strong performance on mathematical reasoning benchmarks, they still struggle to generalize their reasoning abilities and perform poorly on our information-dense tasks, frequently encountering difficulties with continuous retrieval and reasoning even at relatively shorter context lengths. Furthermore, we identify and characterize a phenomenon termed `under-thinking', wherein models prematurely conclude their reasoning processes despite the availability of relevant information. NeedleBench thus provides critical insights and targeted evaluation tools essential for understanding and improving the long-context capabilities of LLMs. All code and resources will be publicly available.
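The core mechanism, embedding key information at controlled depths in a long context, is easy to illustrate. The Python sketch below is our own toy version, not the benchmark's released code: it splices a "needle" sentence into filler text at a chosen relative depth.

```python
def insert_needle(haystack: str, needle: str, depth: float) -> str:
    """Splice `needle` into `haystack` at a relative depth
    (0.0 = beginning, 1.0 = end), on a sentence boundary."""
    sentences = haystack.strip().split(". ")
    pos = int(depth * len(sentences))
    return ". ".join(sentences[:pos] + [needle.rstrip(".")] + sentences[pos:])

filler = "The sky was grey over the harbour that morning. " * 2000
needle = "The secret code mentioned by the captain was 7-4-1."
for depth in (0.1, 0.5, 0.9):
    ctx = insert_needle(filler, needle, depth)
    print(depth, round(ctx.find("7-4-1") / len(ctx), 2))  # lands near `depth`
# `ctx` plus a question ("What was the secret code?") then goes to the
# model under evaluation; varying `depth` probes retrieval at all positions.
```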
URL: https://openreview.net/forum?id=cEvmIKsRw0
---
Title: Mitigating Variance Caused by Communication in Decentralized Multi-agent Deep Reinforcement Learning
Abstract: Communication can facilitate agents to gain a better understanding of the environment and to coordinate their behaviors in multi-agent deep reinforcement learning (MADRL). However, in certain applications, communication is not available during execution due to factors such as security concerns or limited resources. This paper focuses on a decentralized MADRL setting where communication is used only during training, but not during execution, enabling the learning of coordinated behaviors while retaining decentralized execution. While beneficial, communication can introduce uncertainty, potentially increasing the variance in the learning process of decentralized agents. We conduct the first theoretical analysis to study the variance that is caused by communication in policy gradients using actor-critic methods. Motivated by our theoretical analysis, we propose modular techniques that are designed based on our analytical findings to reduce the variance in policy gradients with communication. We incorporate these techniques into two existing algorithms developed for decentralized MADRL with communication and evaluate them on multiple multi-agent tasks in the StarCraft Multi-Agent Challenge and Traffic Junction domains. The results demonstrate that decentralized MADRL communication methods extended with our proposed techniques not only produce high-performing agents but also reduce the variance in policy gradients during training.
URL: https://openreview.net/forum?id=aqLsmBviga
---