Accepted papers
===============
Title: Explaining Caption-Image Interactions in CLIP Models with Second-Order Attributions
Authors: Lucas Moeller, Pascal Tilli, Thang Vu, Sebastian Padó
Abstract: Dual encoder architectures like CLIP models map two types of inputs into a shared embedding space and predict similarities between them. Despite their wide application, it is, however, not understood how these models compare their two inputs. Common first-order feature-attribution methods explain importances of individual features and can, thus, only provide limited insights into dual encoders, whose predictions depend on interactions between features. In this paper, we first derive a second-order method enabling the attribution of predictions by any differentiable dual encoder onto feature interactions between its inputs. Second, we apply our method to CLIP models and show that they learn fine-grained correspondences between parts of captions and regions in images. They match objects across input modes and also account for mismatches. This intrinsic visual-linguistic grounding ability, however, varies heavily between object classes and exhibits pronounced out-of-domain effects; we can identify individual errors as well as systematic failure categories. Code is publicly available: https://github.com/lucasmllr/exCLIP
URL: https://openreview.net/forum?id=HUUL19U7HP
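As a rough illustration of the idea (not the paper's derived attribution rule), the minimal PyTorch sketch below attributes a dual-encoder similarity score onto pairwise feature interactions via mixed second derivatives; the two toy MLP encoders and feature shapes are assumptions standing in for CLIP's towers.

```python
# Hedged sketch: attribute a dual-encoder similarity score onto pairwise
# feature interactions via the mixed second derivative d^2 s / (dx_i dy_j).
# The two small MLPs below stand in for CLIP's text/image towers (assumption).
import torch

torch.manual_seed(0)
d_txt, d_img, d_emb = 6, 8, 16
txt_enc = torch.nn.Sequential(torch.nn.Linear(d_txt, d_emb), torch.nn.Tanh())
img_enc = torch.nn.Sequential(torch.nn.Linear(d_img, d_emb), torch.nn.Tanh())

x = torch.randn(d_txt, requires_grad=True)  # caption-side input features
y = torch.randn(d_img, requires_grad=True)  # image-side input features

score = torch.nn.functional.cosine_similarity(txt_enc(x), img_enc(y), dim=0)

# First-order gradient w.r.t. the caption features, kept in the graph so it
# can be differentiated again w.r.t. the image features.
(grad_x,) = torch.autograd.grad(score, x, create_graph=True)
interactions = torch.stack(
    [torch.autograd.grad(g, y, retain_graph=True)[0] for g in grad_x]
)
print(interactions.shape)  # (d_txt, d_img): caption-feature x image-feature map
```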
---
Title: EMMA: End-to-End Multimodal Model for Autonomous Driving
Authors: Jyh-Jing Hwang, Runsheng Xu, Hubert Lin, Wei-Chih Hung, Jingwei Ji, Kristy Choi, Di Huang, Tong He, Paul Covington, Benjamin Sapp, Yin Zhou, James Guo, Dragomir Anguelov, Mingxing Tan
Abstract: We introduce EMMA, an End-to-end Multimodal Model for Autonomous driving. Built on a multimodal large language model foundation, EMMA directly maps raw camera sensor data into various driving-specific outputs, including planner trajectories, perception objects, and road graph elements. EMMA maximizes the utility of world knowledge from the pre-trained large language models, by representing all non-sensor inputs (e.g. navigation instructions and ego vehicle status) and outputs (e.g. trajectories and 3D locations) as natural language text. This approach allows EMMA to jointly process various driving tasks in a unified language space, and generate the outputs for each task using task-specific prompts. Empirically, we demonstrate EMMA’s effectiveness by achieving state-of-the-art performance in motion planning on nuScenes as well as competitive results on an in-house large-scale benchmark. EMMA also yields competitive results for camera-primary 3D object detection on the Waymo Open Dataset (WOD). We show that co-training EMMA with planner trajectories, object detection, and road graph tasks yields improvements across all three domains, highlighting EMMA’s potential as a generalist model for autonomous driving applications. We hope that our results will inspire research to further evolve the state of the art in autonomous driving model architectures.
URL: https://openreview.net/forum?id=kH3t5lmOU8
---
Title: Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
Authors: Junjie Wu, Tsz Ting Chung, Kai Chen, Dit-Yan Yeung
Abstract: Despite the outstanding performance in vision-language reasoning, Large Vision-Language Models (LVLMs) might generate hallucinated contents that do not exist in the given image. Most existing LVLM hallucination benchmarks are constrained to evaluate the object-related hallucinations. However, the potential hallucination on the relations between two objects, i.e., relation hallucination, still lacks investigation. To remedy that, we design a unified framework to measure object and relation hallucination in LVLMs simultaneously. The core idea of our framework is to evaluate hallucinations in (object, relation, object) triplets extracted from LVLMs’ responses, making it easily generalizable to various vision-language tasks. Based on our framework, we further introduce Tri-HE, a novel Triplet-level Hallucination Evaluation benchmark which can be used to study both object and relation hallucination at the same time. With comprehensive evaluations on Tri-HE, we observe that the relation hallucination issue is even more serious than object hallucination among existing LVLMs, highlighting a previously neglected problem towards reliable LVLMs. Moreover, based on our findings, we design a simple training-free approach that effectively mitigates hallucinations for LVLMs.
URL: https://openreview.net/forum?id=iNywrSPpvc
---
Title: Budgeted-Bandits with Controlled Restarts with Applications in Learning and Computing
Authors: Semih Cayci, Yilin Zheng, Atilla Eryilmaz
Abstract: Maximizing the cumulative reward of a sequence of tasks under a time budget has been ubiquitous in many applications in computing and machine learning. Often, tasks can have random completion times and the controller needs to learn the unknown statistics while making optimal decisions. In addition to the classic exploration-exploitation trade-off, it has been shown that a restarting strategy can boost the performance of the control algorithm by interrupting ongoing tasks at the expense of losing their reward. In this work, we consider a bandit setting where each decision takes a random completion time and yields a random (possibly correlated) reward at the end, both with unknown values at the time of decision. The goal of the decision-maker is to maximize the expected total reward subject to a stringent time budget $\tau$. As an additional control, we allow the decision-maker to interrupt an ongoing task and forgo its reward for a potentially more rewarding restart. Unlike previous works, we neither assume any prior knowledge of the system statistics nor limit the action space of restarting strategies to be finite. Under this general framework, we develop efficient bandit algorithms that find optimal arms and restart strategies with $O(\log(\tau))$ and $O(\sqrt{\tau\log(\tau)})$ regret for finite and continuous sets of restart times, respectively. Furthermore, through numerical studies, we verify the applicability of our algorithm in the diverse contexts of: (i) algorithm portfolios for SAT solvers; (ii) task scheduling in wireless networks; and (iii) hyperparameter tuning in neural network training.
URL: https://openreview.net/forum?id=lvb5qDAa4B
---
Title: RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design
Authors: Rishabh Anand, Chaitanya K. Joshi, Alex Morehead, Arian Rokkum Jamasb, Charles Harris, Simon V Mathis, Kieran Didi, Rex Ying, Bryan Hooi, Pietro Lio
Abstract: We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self consistency TM-score ≥ 0.45, at which two RNAs have the same global fold. Open-source code: https://github.com/rish-16/rna-backbone-design
URL: https://openreview.net/forum?id=wOc1Yx5s09
---
Title: Leveraging AutoML for Sustainable Deep Learning: A Multi-Objective HPO Approach on Deep Shift Neural Networks
Authors: Leona Hennig, Marius Lindauer
Abstract: Deep Learning (DL) has advanced various fields by extracting complex patterns from large datasets. However, the computational demands of DL models pose environmental and resource challenges. Deep Shift Neural Networks (DSNNs) present a solution by leveraging shift operations to reduce computational complexity at inference. Compared to common DNNs, DSNNs are still less well understood and less well optimized. By leveraging AutoML techniques, we provide valuable insights into the potential of DSNNs and how to design them in a better way. We focus on image classification, a core task in computer vision, especially in low-resource environments. Since we consider complementary objectives such as accuracy and energy consumption, we combine state-of-the-art multi-fidelity (MF) hyperparameter optimization (HPO) with multi-objective optimization to find a set of Pareto optimal trade-offs on how to design DSNNs. Our approach led to significantly better configurations of DSNNs regarding loss and emissions compared to default DSNNs. This includes simultaneously increasing performance by about 20% and reducing emissions, in some cases by more than 60%. Investigating the behavior of quantized networks in terms of both emissions and accuracy, our experiments reveal surprising model-specific trade-offs, yielding the greatest energy savings. For example, in contrast to common expectations, quantizing smaller portions of the network with low precision can be optimal with respect to energy consumption while retaining or improving performance. We corroborated these findings across multiple backbone architectures, highlighting important nuances in quantization strategies and offering an automated approach to balancing energy efficiency and model performance.
URL: https://openreview.net/forum?id=vk7b11DHcW
---
Title: Sparsity regularization via tree-structured environments for disentangled representations
Authors: Elliot Layne, Jason Hartford, Sebastien Lachapelle, Mathieu Blanchette, Dhanya Sridhar
Abstract: Many causal systems such as biological processes in cells can only be observed indirectly via measurements, such as gene expression. Causal representation learning---the task of correctly mapping low-level observations to latent causal variables---could advance scientific understanding by enabling inference of latent variables such as pathway activation. In this paper, we develop methods for inferring latent variables from multiple related datasets (environments) and tasks. As a running example, we consider the task of predicting a phenotype from gene expression, where we often collect data from multiple cell types or organisms that are related in known ways. The key insight is that the mapping from latent variables driven by gene expression to the phenotype of interest changes sparsely across closely related environments. To model sparse changes, we introduce Tree-Based Regularization (TBR), an objective that minimizes prediction error while regularizing closely related environments to learn similar predictors. We prove that under assumptions about the degree of sparse changes, TBR identifies the true latent variables up to some simple transformations. We evaluate the theory empirically with both simulations and ground-truth gene expression data. We find that TBR recovers the latent causal variables better than related methods across these settings, even when some assumptions of the theory are violated.
URL: https://openreview.net/forum?id=ZzUz0jo200
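To make the penalty concrete, here is a minimal sketch of a tree-based regularizer of the kind described above: per-environment linear predictors whose weights are pulled together along the edges of a known environment tree. The toy tree, dimensions, and plain L1 difference penalty are illustrative assumptions, not the paper's exact objective.

```python
# Hedged sketch: prediction loss plus a penalty that encourages predictors of
# closely related environments (tree edges) to change sparsely.
import torch

torch.manual_seed(0)
n_env, d_latent = 4, 10
# Toy environment tree given as (parent, child) edges, e.g. cell types related
# by a known lineage (assumption for illustration).
edges = [(0, 1), (0, 2), (2, 3)]

heads = torch.nn.ModuleList(torch.nn.Linear(d_latent, 1) for _ in range(n_env))
data = [(torch.randn(32, d_latent), torch.randn(32, 1)) for _ in range(n_env)]

lam = 0.1
pred_loss = sum(
    torch.nn.functional.mse_loss(heads[e](X), y) for e, (X, y) in enumerate(data)
)
tree_penalty = sum(
    (heads[a].weight - heads[b].weight).abs().sum() for a, b in edges
)
loss = pred_loss + lam * tree_penalty  # optimized jointly with the encoder
print(float(loss))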
---
Title: Cumulative Reasoning with Large Language Models
Authors: Yifan Zhang, Jingqin Yang, Yang Yuan, Andrew C Yao
Abstract: Recent advancements in large language models (LLMs) have shown remarkable progress, yet their ability to solve complex problems remains limited. In this work, we introduce Cumulative Reasoning (CR), a structured framework that enhances LLM problem-solving by emulating human-like iterative and cumulative thought processes. CR orchestrates LLMs in three distinct roles---Proposer, Verifier(s), and Reporter---to systematically decompose tasks, generate and validate intermediate reasoning steps, and compose them into a solution by building a dynamic Directed Acyclic Graph (DAG) of verified propositions. This approach substantially enhances problem-solving capabilities. We demonstrate CR’s advantage through several complex reasoning tasks: it outperforms existing methods in logical inference tasks with up to a 9.3% improvement, achieving 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, it achieves 98% accuracy, marking a 24% improvement over previous methods. In solving MATH problems, CR achieves a 4.2% increase from previous methods and a 43% relative improvement in the most challenging level 5 problems. When incorporating a code environment with CR, we further harness LLMs’ reasoning capabilities and outperform the Program of Thought (PoT) method by 38.8%. The code is available at https://github.com/iiis-ai/cumulative-reasoning.
URL: https://openreview.net/forum?id=grW15p4eq2
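The control flow described in the abstract can be pictured with a short sketch; `propose`, `verify`, and `report` below are hypothetical stand-ins for LLM calls, not the authors' API, and the DAG bookkeeping is deliberately simplified.

```python
# Hedged sketch of the Cumulative Reasoning loop: accumulate verified
# propositions in a DAG until the Reporter can produce a final answer.
def propose(question, dag):          # hypothetical LLM call (Proposer)
    return {"claim": f"step {len(dag)}", "premises": list(dag)[-2:]}

def verify(proposition, dag):        # hypothetical LLM call(s) (Verifier)
    return True

def report(question, dag):           # hypothetical LLM call (Reporter)
    return "final answer" if len(dag) >= 3 else None

def cumulative_reasoning(question, max_steps=10):
    dag = {}                          # claim -> list of premise claims (edges)
    for _ in range(max_steps):
        prop = propose(question, dag)
        if verify(prop, dag):         # only verified steps enter the DAG
            dag[prop["claim"]] = prop["premises"]
        answer = report(question, dag)
        if answer is not None:
            return answer, dag
    return None, dag

print(cumulative_reasoning("toy question"))
```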
---
Title: Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
Authors: Zora Che, Stephen Casper, Robert Kirk, Anirudh Satheesh, Stewart Slocum, Lev E McKinney, Rohit Gandikota, Aidan Ewart, Domenic Rosati, Zichu Wu, Zikui Cai, Bilal Chughtai, Yarin Gal, Furong Huang, Dylan Hadfield-Menell
Abstract: Evaluations of large language model (LLM) risks and capabilities are increasingly being incorporated into AI risk management and governance frameworks. Currently, most risk evaluations are conducted by designing inputs that elicit harmful behaviors from the system. However, this approach suffers from two limitations. First, input-output evaluations cannot fully evaluate realistic risks from open-weight models. Second, the behaviors identified during any particular input-output evaluation can only lower-bound the model's worst-possible-case input-output behavior. As a complementary method for eliciting harmful behaviors, we propose evaluating LLMs with model tampering attacks which allow for modifications to latent activations or weights. We pit state-of-the-art techniques for removing harmful LLM capabilities against a suite of 5 input-space and 6 model tampering attacks. In addition to benchmarking these methods against each other, we show that (1) model resilience to capability elicitation attacks lies on a low-dimensional robustness subspace; (2) the success rate of model tampering attacks can empirically predict and offer conservative estimates for the success of held-out input-space attacks; and (3) state-of-the-art unlearning methods can easily be undone within 16 steps of fine-tuning. Together, these results highlight the difficulty of suppressing harmful LLM capabilities and show that model tampering attacks enable substantially more rigorous evaluations than input-space attacks alone.
URL: https://openreview.net/forum?id=E60YbLnQd2
---
Title: Elucidating the Design Choice of Probability Paths in Flow Matching for Forecasting
Authors: Soon Hoe Lim, Yijin Wang, Annan Yu, Emma Hart, Michael W. Mahoney, Sherry Li, N. Benjamin Erichson
Abstract: Flow matching has recently emerged as a powerful paradigm for generative modeling and has been extended to probabilistic time series forecasting. However, the impact of the specific choice of probability path model on forecasting performance, particularly for high-dimensional spatio-temporal dynamics, remains under-explored. In this work, we demonstrate that forecasting spatio-temporal data with flow matching is highly sensitive to the selection of the probability path model. Motivated by this insight, we propose a novel probability path model designed to improve forecasting performance. Our empirical results across various dynamical system benchmarks show that our model achieves faster convergence during training and improved predictive performance compared to existing probability path models. Importantly, our approach is efficient during inference, requiring only a few sampling steps. This makes our proposed model practical for real-world applications and opens new avenues for probabilistic forecasting.
URL: https://openreview.net/forum?id=JApMDLwbLR
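For context, here is a minimal sketch of a standard conditional flow matching training step with the common affine (linear-interpolation) probability path; the paper's proposed path model differs, and the toy velocity network and data below are assumptions.

```python
# Hedged sketch: conditional flow matching with the common linear path
# x_t = (1 - t) * x0 + t * x1, whose target velocity is x1 - x0.
# The paper studies how the choice of this path affects forecasting.
import torch

torch.manual_seed(0)
dim = 32
vf = torch.nn.Sequential(torch.nn.Linear(dim + 1, 64), torch.nn.SiLU(),
                         torch.nn.Linear(64, dim))  # velocity field v(x_t, t)
opt = torch.optim.Adam(vf.parameters(), lr=1e-3)

for step in range(100):
    x1 = torch.randn(128, dim) * 2 + 1      # "data" samples (toy target)
    x0 = torch.randn(128, dim)              # base noise samples
    t = torch.rand(128, 1)
    xt = (1 - t) * x0 + t * x1              # sample along the probability path
    target = x1 - x0                        # conditional velocity for this path
    pred = vf(torch.cat([xt, t], dim=-1))
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```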
---
Title: Customizing Spider Silk: Generative Models with Mechanical Property Conditioning for Protein Engineering
Authors: Neeru Dubey, Elin Karlsson, Miguel A. Redondo, Johan Reimegård, Anna Rising, Hedvig Kjellstrom
Abstract: The remarkable mechanical properties of spider silk, including its tensile strength and extensibility, are primarily governed by the repeat regions of the proteins that constitute the fiber, the major ampullate spidroins (MaSps). However, establishing correlations between mechanical characteristics and repeat sequences remains challenging due to the intricate sequence–structure–function relationships of MaSps and the limited availability of annotated datasets. In this study, we present a novel computational framework for designing MaSp repeat sequences with customizable mechanical properties. To achieve this, we developed a lightweight GPT-based generative model by distilling the pre-trained ProtGPT2 protein language model. The distilled model was subjected to multi-level fine-tuning using curated subsets of the Spider Silkome dataset. Specifically, we adapted the model for MaSp repeat generation using 6,000 MaSp repeat sequences and further refined it via cross-validation on 592 repeats associated with experimentally determined fiber-level mechanical properties. Our model generates biologically plausible MaSp repeat regions tailored to specific mechanical properties, while also predicting those properties for given sequences. Validation includes sequence-level analysis, assessing physicochemical attributes, the expected distribution of key motifs, and secondary structure compositions. A correlation study using BLAST on the Spider Silkome dataset and a test set of MaSp repeats with known mechanical properties further confirmed the predictive accuracy of the model. This framework advances the rational design of spider silk-inspired biomaterials, offering a versatile tool for engineering protein sequences with tailored mechanical attributes.
URL: https://openreview.net/forum?id=37YSapXDK6
---
Title: Detecting Systematic Weaknesses in Vision Models along Predefined Human-Understandable Dimensions
Authors: Sujan Sai Gannamaneni, Rohil Prakash Rao, Michael Mock, Maram Akila, Stefan Wrobel
Abstract: Slice discovery methods (SDMs) are prominent algorithms for finding systematic weaknesses in DNNs. They identify top-k semantically coherent slices/subsets of data where a DNN-under-test has low performance. To be directly useful, slices should be aligned with human-understandable and relevant dimensions, which, for example, are defined by safety and domain experts as part of the operational design domain (ODD). While SDMs can be applied effectively on structured data, their application on image data is complicated by the lack of semantic metadata. To address these issues, we present an algorithm that combines foundation models for zero-shot image classification to generate semantic metadata with methods for combinatorial search to find systematic weaknesses in images. In contrast to existing approaches, ours identifies weak slices that are in line with predefined human-understandable dimensions. As the algorithm includes foundation models, its intermediate and final results may not always be exact. Therefore, we include an approach to address the impact of noisy metadata. We validate our algorithm on both synthetic and real-world datasets, demonstrating its ability to recover human-understandable systematic weaknesses. Furthermore, using our approach, we identify systematic weaknesses of multiple pre-trained and publicly available state-of-the-art computer vision DNNs.
URL: https://openreview.net/forum?id=yK9pvt4nBX
---
Title: Transformers trained on proteins can learn to attend to Euclidean distance
Authors: Isaac Ellmen, Constantin Schneider, Matthew I. J. Raybould, Charlotte Deane
Abstract: While conventional Transformers generally operate on sequence data, they can be used in conjunction with structure models, typically SE(3)-invariant or equivariant graph neural networks (GNNs), for 3D applications such as protein structure modelling. These hybrids typically involve either (1) preprocessing/tokenizing structural features as input for Transformers or (2) taking Transformer embeddings and processing them within a structural representation. However, there is evidence that Transformers can learn to process structural information on their own, such as the AlphaFold3 structural diffusion model. In this work we show that Transformers can function independently as structure models when passed linear embeddings of coordinates. We first provide a theoretical explanation for how Transformers can learn to filter attention as a 3D Gaussian with learned variance. We then validate this theory using both simulated 3D points and in the context of masked token prediction for proteins. Finally, we show that pre-training protein Transformer encoders with structure improves performance on multiple downstream tasks, yielding competitive performance with custom structural models. Together, this work provides a basis for using standard Transformers as hybrid structure-language models. The code is available at: https://github.com/oxpig/attending-to-distance.
URL: https://openreview.net/forum?id=mU59bDyqqv
---
Title: Long Context Transfer from Language to Vision
Authors: Peiyuan Zhang, Kaichen Zhang, Bo Li, Guangtao Zeng, Jingkang Yang, Yuanhan Zhang, Ziyue Wang, Haoran Tan, Chunyuan Li, Ziwei Liu
Abstract: Video sequences offer valuable temporal information, but existing large multimodal models (LMMs) fall short in understanding extremely long videos. Many works address this by reducing the number of visual tokens using visual resamplers. Alternatively, in this paper, we approach this problem from the perspective of the language model. By simply extrapolating the context length of the language backbone, we enable LMMs to comprehend orders of magnitude more visual tokens without any video training. We call this phenomenon \textit{long context transfer} and carefully ablate its properties. To effectively measure LMMs' ability to generalize to long contexts in the vision modality, we develop V-NIAH (Visual Needle-In-A-Haystack), a purely synthetic long vision benchmark inspired by the language model's NIAH test. Our proposed Long Video Assistant (LongVA) can process 2000 frames or over 200K visual tokens without additional complexities. With its extended context length, LongVA achieves state-of-the-art performance on Video-MME and MLVU among 7B-scale models by densely sampling more input frames.
URL: https://openreview.net/forum?id=30RAWQVGlx
---
Title: YoooP: You Only Optimize One Prototype per Class for Non-Exemplar Incremental Learning
Authors: Jiangtao Kong, Zhenyu Zong, Tianyi Zhou, Huajie Shao
Abstract: Incremental learning (IL) usually addresses catastrophic forgetting of old tasks when learning new tasks by replaying old tasks' raw data stored in memory, which can be limited by its size and the risk of privacy leakage. Recent non-exemplar IL methods store class centroids as prototypes and perturb them with high-dimensional Gaussian noise to generate synthetic data for replaying. Unfortunately, this approach has two major limitations. First, the boundary between embedding clusters around prototypes of different classes might be unclear, leading to serious catastrophic forgetting. Second, directly applying high-dimensional Gaussian noise produces nearly identical synthetic samples that fail to preserve the true data distribution, ultimately degrading performance. In this paper, we propose YoooP, a novel exemplar-free IL approach that can greatly outperform previous methods by only storing and replaying one prototype per class even without synthetic data replay. Instead of merely storing class centroids, YoooP optimizes each prototype by (1) shifting it to high-density regions within each class using an attentional mean-shift algorithm, and (2) optimizing its cosine similarity with class-specific embeddings to form compact, well-separated clusters. As a result, replaying only the optimized prototypes effectively reduces inter-class interference and maintains clear decision boundaries. Furthermore, we extend YoooP to YoooP+ by synthesizing replay data preserving the angular distribution between each class prototype and the class's real data in history, which cannot be obtained by high-dimensional Gaussian perturbation. YoooP+ effectively stabilizes and further improves YoooP without storing real data. Extensive experiments demonstrate the superiority of YoooP/YoooP+ over non-exemplar baselines in terms of different metrics. The code is released at https://github.com/Snowball0823/YoooP.git.
URL: https://openreview.net/forum?id=FYe66NLDkO
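A minimal sketch of the attentional mean-shift idea, placing a class prototype in a high-density region of that class's embedding cluster, is given below; the temperature, iteration count, and plain softmax weighting are assumptions rather than YoooP's exact procedure.

```python
# Hedged sketch: iteratively shift a class prototype toward high-density
# regions of the class's embedding cluster using attention (softmax) weights.
import torch

torch.manual_seed(0)
z = torch.nn.functional.normalize(torch.randn(200, 64), dim=1)  # class embeddings
proto = z.mean(dim=0)                                           # init: centroid
tau = 0.1                                                       # temperature (assumed)

for _ in range(20):
    w = torch.softmax(z @ proto / tau, dim=0)   # attention over class samples
    proto = torch.nn.functional.normalize((w[:, None] * z).sum(dim=0), dim=0)

# Replay then uses only `proto` (one vector per class) instead of raw data.
print(proto.shape)
```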
---
Title: Bayesian Optimization of Robustness Measures under Input Uncertainty: A Randomized Gaussian Process Upper Confidence Bound Approach
Authors: Yu Inatsu
Abstract: Bayesian optimization based on the Gaussian process upper confidence bound (GP-UCB) offers a theoretical guarantee for optimizing black-box functions. In practice, however, black-box functions often involve input uncertainty. To handle such cases, GP-UCB can be extended to optimize evaluation criteria known as robustness measures. However, GP-UCB-based methods for robustness measures require a trade-off parameter, $\beta$, which, as in the original GP-UCB, must be set sufficiently large to ensure theoretical validity. In this study, we propose randomized robustness measure GP-UCB (RRGP-UCB), a novel method that samples $\beta$ from a chi-squared-based probability distribution. This approach eliminates the need to explicitly specify $\beta$. Notably, the expected value of $\beta$ under this distribution is not excessively large. Furthermore, we show that RRGP-UCB provides tight bounds on the expected regret between the optimal and estimated solutions. Numerical experiments demonstrate the effectiveness of the proposed method.
URL: https://openreview.net/forum?id=FDzojiLSia
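To illustrate the randomized-$\beta$ idea on a toy 1D problem (not the paper's robustness-measure setting), the sketch below uses scikit-learn's GP regressor and samples $\beta$ as a constant plus a chi-squared draw; this exact sampling form is an assumption.

```python
# Hedged sketch: GP-UCB where the trade-off parameter beta is *sampled* from a
# chi-squared-based distribution at every round instead of being set large.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x          # unknown black-box
X_grid = np.linspace(-2, 2, 200).reshape(-1, 1)

X = rng.uniform(-2, 2, size=(3, 1)); y = f(X).ravel()
for t in range(20):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-3)
    gp.fit(X, y)
    mu, sigma = gp.predict(X_grid, return_std=True)
    beta = 2.0 + rng.chisquare(df=2)                   # randomized beta (assumed form)
    x_next = X_grid[np.argmax(mu + np.sqrt(beta) * sigma)]
    X = np.vstack([X, x_next]); y = np.append(y, f(x_next[0]))

print("best observed value:", y.max())
```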
---
Title: Toward Linearly Regularizing the Geometric Bottleneck of Linear Generalized Attention
Authors: Jiaxu Liu, Xinping Yi, Xiangyu Yin, Yuhang Song, Gaojie Jin, Xiaowei Huang
Abstract: Transformers excel across domains, yet their full self-attention carries a prohibitive $\mathcal{O}(n^2)$ cost for long sequences with length $n$. Existing \textit{efficient} attention methods either restrict the attention pattern (local/sparse attention) or approximate the softmax kernel with certain drawbacks. The former suffers from attention bottlenecks (over-squashing of long-range dependencies) and invalidates the use of global tokens in autoregressive tasks, while the latter often requires sequential processing that can degrade in accuracy when approximations fall short. In this work, we introduce the \textit{Bottleneck Regularized Linear Attention (BRL-Attention)}, uniting the strengths of pattern-based and kernel-based techniques to enable efficient, global information flow with linear complexity. BRL-Attention extends a local attention pattern with a small set of compressed tokens that serve as a global information reservoir, ensuring long-range interactions without quadratic cost. This bottleneck regularization strategy effectively alleviates the geometric attention bottleneck and retains full expressiveness; that is, it matches the sequence modeling capacity of full softmax attention while mitigating over-squashing across layers. Moreover, it integrates global tokens without breaking causal masking, making it applicable to both encoder-only and autoregressive decoder architectures. Extensive experiments on sequence and graph benchmarks demonstrate that BRL-Attention matches or surpasses the predictive performance of standard Transformers with full attention, while substantially reducing memory usage and computation time to levels comparable with linear sparse attention.
URL: https://openreview.net/forum?id=Vpyg3fqXbl
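A minimal sketch of the attention pattern being described, each position attending to a local window plus a small set of compressed global tokens, follows; the window size, number of global tokens, mean-pooling compression, and single-head non-causal setup are illustrative assumptions.

```python
# Hedged sketch: local-window attention augmented with a few compressed
# "reservoir" tokens that every position can read from, keeping cost low.
import torch

torch.manual_seed(0)
n, d, window, n_global = 64, 32, 4, 4
x = torch.randn(n, d)

# Compressed global tokens: here simply mean-pooled chunks of the sequence
# (an assumption; the paper's compression may differ).
globals_ = x.reshape(n_global, n // n_global, d).mean(dim=1)

q = x                                   # single head, no projections, for brevity
k = torch.cat([globals_, x], dim=0)     # keys = [global tokens, sequence]
v = k.clone()

# Boolean mask: every query sees the global tokens plus its local window.
idx = torch.arange(n)
local = (idx[:, None] - idx[None, :]).abs() <= window
mask = torch.cat([torch.ones(n, n_global, dtype=torch.bool), local], dim=1)

scores = (q @ k.T) / d**0.5
scores = scores.masked_fill(~mask, float("-inf"))
out = torch.softmax(scores, dim=-1) @ v
print(out.shape)  # (n, d)
```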
---
Title: Unmasking Trees for Tabular Data
Authors: Calvin McCarter
Abstract: Despite much work on advanced deep learning and generative modeling techniques for tabular data generation and imputation, traditional methods have continued to win on imputation benchmarks. We herein present UnmaskingTrees, a simple method for tabular imputation (and generation) employing gradient-boosted decision trees which are used to incrementally unmask individual features. On a benchmark for out-of-the-box performance on 27 small tabular datasets, UnmaskingTrees offers leading performance on imputation; state-of-the-art performance on generation given data with missingness; and competitive performance on vanilla generation given data without missingness. To solve the conditional generation subproblem, we propose a tabular probabilistic prediction method, BaltoBot, which fits a balanced tree of boosted tree classifiers. Unlike older methods, it requires no parametric assumption on the conditional distribution, accommodating features with multimodal distributions; unlike newer diffusion methods, it offers fast sampling, closed-form density estimation, and flexible handling of discrete variables. We finally consider our two approaches as meta-algorithms, demonstrating in-context learning-based generative modeling with TabPFN.
URL: https://openreview.net/forum?id=0AxbTF3Ouq
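The general flavor, boosted trees filling in one masked feature at a time conditioned on the others, can be sketched as below; the feature ordering, single pass, and use of scikit-learn's HistGradientBoostingRegressor (which accepts NaNs in its inputs) are simplifying assumptions rather than the UnmaskingTrees/BaltoBot procedure itself.

```python
# Hedged sketch: impute each feature with a boosted-tree model conditioned on
# the remaining (possibly still missing) features, one feature at a time.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
X_full = rng.normal(size=(500, 5))
X_full[:, 2] += 0.8 * X_full[:, 0]              # give feature 2 some structure
X = X_full.copy()
X[rng.random(X.shape) < 0.2] = np.nan           # 20% missingness

X_imp = X.copy()
for j in range(X.shape[1]):                     # unmask features one by one
    miss = np.isnan(X[:, j])
    if not miss.any():
        continue
    rest = np.delete(X_imp, j, axis=1)          # NaNs in inputs are handled natively
    model = HistGradientBoostingRegressor().fit(rest[~miss], X[~miss, j])
    X_imp[miss, j] = model.predict(rest[miss])

print(np.isnan(X_imp).sum(), "missing values remain")
```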
---
Title: Diffusion-RainbowPA: Improvements Integrated Preference Alignment for Diffusion-based Text-to-Image Generation
Authors: Haoyuan Sun, Bin Liang, Bo Xia, Jiaqi Wu, Yifei Zhao, Kai Qin, Yongzhe Chang, Xueqian Wang
Abstract: Although the rapidly increasing capabilities of text-to-image (T2I) models have profound implications across various industries, these models concurrently suffer from numerous shortcomings, necessitating effective strategies for alignment with human preference. Diffusion-DPO and SPO have emerged as robust approaches for aligning diffusion-based T2I models with human preference feedback. However, they tend to suffer from text-image misalignment, aesthetic overfitting, and low-quality generation. To tackle such matters, we improve the alignment paradigm from a tripartite perspective: calibration enhancement (Calibration Enhanced Preference Alignment), overfitting mitigation (Identical Preference Alignment, Jensen-Shannon Divergence Constraint), and performance optimization (Margin Strengthened Preference Alignment, SFT-like Regularization). Furthermore, combining these with the step-aware preference alignment paradigm, we propose Diffusion-RainbowPA, a suite of six improvements in total that collectively improve the alignment performance of Diffusion-DPO. Comprehensive evaluation and comparison demonstrate that Diffusion-RainbowPA outperforms current state-of-the-art methods. Ablation studies on the introduced components show that each contributes positively to alignment performance.
URL: https://openreview.net/forum?id=KY0TSY2bx8
---
Title: MOCK: an Algorithm for Learning Nonparametric Differential Equations via Multivariate Occupation Kernel Functions
Authors: Victor William Rielly, Kamel Lahouel, Ethan Lew, Nicholas Fisher, Vicky Geneva Haney, Michael Lee Wells, Bruno Michel Jedynak
Abstract: Learning a nonparametric system of ordinary differential equations from trajectories in a $d$-dimensional state space requires learning $d$ functions of $d$ variables. Explicit formulations often scale quadratically in $d$ unless additional knowledge about system properties, such as sparsity and symmetries, is available. In this work, we propose a linear approach, the multivariate occupation kernel method (MOCK), using the implicit formulation provided by vector-valued reproducing kernel Hilbert spaces. The solution for the vector field relies on multivariate occupation kernel functions associated with the trajectories and scales linearly with the dimension of the state space. We validate through experiments on a variety of simulated and real datasets ranging from 2 to 1024 dimensions, and provide an example with a divergence-free vector field. MOCK outperforms all other comparators on 3 of the 9 datasets on full trajectory prediction and 4 out of the 9 datasets on next-point prediction.
URL: https://openreview.net/forum?id=fjVIp2Z9RS
---
Title: Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors
Authors: Soumava Paul, Prakhar Kaushik, Alan Yuille
Abstract: In this work, we introduce a generative approach for pose-free (without camera parameters) reconstruction of 360 scenes from a sparse set of 2D images. Scene reconstruction from incomplete, pose-free observations is usually regularized with depth estimation or 3D foundational priors. While recent advances have enabled sparse-view reconstruction of large complex scenes (with a high degree of foreground and background detail) with known camera poses using view-conditioned generative priors, these methods cannot be directly adapted for the pose-free setting when ground-truth poses are not available during evaluation. To address this, we propose an image-to-image generative model designed to inpaint missing details and remove artifacts in novel view renders and depth maps of a 3D scene. We introduce context and geometry conditioning using Feature-wise Linear Modulation (FiLM) layers as a lightweight alternative to cross-attention and also propose a novel confidence measure for 3D Gaussian splat representations to allow for better detection of these artifacts. By progressively integrating these novel views in a Gaussian-SLAM-inspired process, we achieve a multi-view-consistent 3D representation. Evaluations on the MipNeRF360 and DL3DV-10K benchmark datasets demonstrate that our method surpasses existing pose-free techniques and performs competitively with state-of-the-art posed (precomputed camera parameters are given) reconstruction methods in complex 360 scenes. Our code and datasets will be open-sourced upon acceptance.
URL: https://openreview.net/forum?id=yp1CYo6R0r
---
Title: Abstraction for Bayesian Reinforcement Learning in Factored POMDPs
Authors: Rolf A. N. Starre, Sammie Katt, Mustafa Mert Çelikok, Marco Loog, Frans A Oliehoek
Abstract: Bayesian reinforcement learning provides an elegant solution to addressing the exploration-exploitation trade-off in Partially Observable Markov Decision Processes (POMDPs) when the environment’s dynamics and reward function are initially unknown. By maintaining a belief over these unknown components and the state, the agent can effectively learn the environment’s dynamics and optimize its policy. However, scaling Bayesian reinforcement learning methods to large problems remains a significant challenge. While prior work has leveraged factored models and online sample-based planning to address this issue, these approaches often retain unnecessarily complex models and factors within the belief space that have minimal impact on the optimal policy. While this complexity might be necessary for accurate model learning, in reinforcement learning, the primary objective is not to recover the ground truth model but to optimize the policy for maximizing the expected sum of rewards. Abstraction offers a way to reduce model complexity by removing factors that are less relevant to achieving high rewards. In this work, we propose and analyze the integration of abstraction with online planning in factored POMDPs. Our empirical results demonstrate two key benefits. First, abstraction reduces model size, enabling faster simulations and thus more planning simulations within a fixed runtime. Second, abstraction enhances performance even with a fixed number of simulations due to greater statistical strength. These results underscore the potential of abstraction to improve both the scalability and effectiveness of Bayesian reinforcement learning in factored POMDPs.
URL: https://openreview.net/forum?id=HHgdT6m9L9
---
Title: Text-to-Image Generation Via Energy-Based CLIP
Authors: Roy Ganz, Michael Elad
Abstract: Joint Energy Models (JEMs), while drawing significant research attention, have not been successfully scaled to real-world, high-resolution datasets. We present CLIP-JEM, a novel approach extending JEMs to the multimodal vision-language domain using CLIP, integrating both generative and discriminative objectives. For the generative one, we introduce an image-text joint-energy function based on Cosine similarity in the CLIP space, training CLIP to assign low energy to real image-caption pairs and high energy otherwise. For the discriminative one, we employ contrastive adversarial loss, extending the adversarial training objective to the multimodal domain. CLIP-JEM not only generates realistic images from text but also achieves competitive results on the compositionality benchmark, outperforming leading methods with fewer parameters. Additionally, we demonstrate the superior guidance capability of CLIP-JEM by enhancing CLIP-based generative frameworks and converting unconditional diffusion models to text-based ones. Lastly, we show that our model can serve as a more robust evaluation metric for text-to-image generative tasks than CLIP.
URL: https://openreview.net/forum?id=FBmWiJXIGk
---
Title: Unlocking Visual Secrets: Inverting Features with Diffusion Priors for Image Reconstruction
Authors: Sai Qian Zhang, Ziyun Li, Chuan Guo, Saeed Mahloujifar, Deeksha Dangwal, G. Edward Suh, Barbara De Salvo, Chiao Liu
Abstract: Inverting visual representations within deep neural networks (DNNs) presents a challenging and important problem in the field of security and privacy for deep learning. The main goal is to invert the features of an unidentified target image generated by a pre-trained DNN, aiming to reconstruct the original image. Feature inversion holds particular significance in understanding the privacy leakage inherent in contemporary split DNN execution techniques, as well as in various applications based on the extracted DNN features.
In this paper, we explore the use of diffusion models, a promising technique for image synthesis, to enhance feature inversion quality. We also investigate the potential of incorporating alternative forms of prior knowledge, such as textual prompts and cross-frame temporal correlations, to further improve the quality of inverted features. Our findings reveal that diffusion models can effectively leverage hidden information from the DNN features, resulting in superior reconstruction performance compared to previous methods.
This research offers valuable insights into how diffusion models can enhance privacy and security within applications that are reliant on DNN features.
URL: https://openreview.net/forum?id=j6MgbuBiGV
---
Title: Exploring the potential of Direct Feedback Alignment for Continual Learning
Authors: Sara Folchini, Viplove Arora, Sebastian Goldt
Abstract: Real-world applications of machine learning require robustness to shifts in the data distribution over time. A critical limitation of standard artificial neural networks trained with backpropagation (BP) is their susceptibility to catastrophic forgetting: they “forget” prior knowledge when trained on a new task, while biological neural networks tend to be more robust to catastrophic forgetting. While various algorithmic ways of mitigating catastrophic forgetting have been proposed, developing an optimization algorithm that is capable of learning continuously remains an open problem. Motivated by recent theoretical results, here we explore whether a biologically inspired learning algorithm like Direct Feedback Alignment (DFA) can mitigate catastrophic forgetting in artificial neural networks. We train fully-connected networks on several continual learning benchmarks using DFA and compare its performance to vanilla backpropagation, random features, and other continual learning algorithms. We find that an inherent bias of DFA, called “degeneracy breaking”, leads to low average forgetting on common continual learning benchmarks when using DFA in the Domain-Incremental and the Task-Incremental learning scenarios. We show how to control the trade-off between learning and forgetting with DFA, and relate different modes of using DFA to other methods in the field.
URL: https://openreview.net/forum?id=MRZQrn7JEG
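For background, here is a minimal sketch of a Direct Feedback Alignment update on a two-layer network: the output error is sent to the hidden layer through a fixed random matrix B instead of the transposed forward weights that backpropagation would use. The toy task, dimensions, and learning rate are assumptions.

```python
# Hedged sketch: Direct Feedback Alignment (DFA) update for a 2-layer network.
# Backprop would use W2.T @ e for the hidden error; DFA uses a fixed random B.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out, lr = 20, 64, 1, 0.05
W1 = rng.normal(scale=0.1, size=(d_h, d_in)); b1 = np.zeros(d_h)
W2 = rng.normal(scale=0.1, size=(d_out, d_h)); b2 = np.zeros(d_out)
B = rng.normal(scale=0.1, size=(d_h, d_out))             # fixed feedback matrix

X = rng.normal(size=(256, d_in))
t = (X[:, :2].sum(1, keepdims=True) > 0).astype(float)   # toy binary target

for epoch in range(200):
    h = np.tanh(X @ W1.T + b1)                # forward pass
    y = 1 / (1 + np.exp(-(h @ W2.T + b2)))    # sigmoid output
    e = y - t                                 # output error
    dh = (e @ B.T) * (1 - h**2)               # DFA: random feedback, not W2.T
    W2 -= lr * e.T @ h / len(X);  b2 -= lr * e.mean(0)
    W1 -= lr * dh.T @ X / len(X); b1 -= lr * dh.mean(0)

print("final loss:", float(((y - t) ** 2).mean()))
```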
---
Title: Large Action Models: From Inception to Implementation
Authors: Lu Wang, Fangkai Yang, Chaoyun Zhang, Junting Lu, Jiaxu Qian, Shilin He, Pu Zhao, Bo Qiao, He Huang, Si Qin, Qisheng Su, Jiayi Ye, Yudi Zhang, Jian-Guang Lou, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
Abstract: As AI continues to advance, there is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions. This evolution requires the transition from traditional Large Language Models (LLMs), which excel at generating textual responses, to Large Action Models (LAMs), designed for action generation and execution within dynamic environments. Enabled by agent systems, LAMs hold the potential to transform AI from passive language understanding to active task completion, marking a significant milestone in the progression toward artificial general intelligence.
In this paper, we present a comprehensive framework for developing LAMs, offering a systematic approach to their creation, from inception to deployment. We begin with an overview of LAMs, highlighting their unique characteristics and delineating their differences from LLMs. Using a Windows OS-based agent as a case study, we provide a detailed, step-by-step guide on the key stages of LAM development, including data collection, model training, environment integration, grounding, and evaluation. This generalizable workflow can serve as a blueprint for creating functional LAMs in various application domains. We conclude by identifying the current limitations of LAMs and discussing directions for future research and industrial deployment, emphasizing the challenges and opportunities that lie ahead in realizing the full potential of LAMs in real-world applications.
URL: https://openreview.net/forum?id=bYdKtf0Q31
---
Title: Diverse Condensed Data Generation via Class Preserving Distribution Matching
Authors: Dandan Guo, Zhuo Li, He Zhao, Mingyuan Zhou, Hongyuan Zha
Abstract: Large-scale datasets for training many real-world machine learning models pose significant computational resource challenges. One approach to mitigate this is via data condensation, which aims to learn a small dataset that still sufficiently captures the rich information in the original one. Most existing approaches learn the condensed dataset and task-related model parameters (e.g., classifier) in a bi-level meta-learning way. The recently proposed distribution matching (DM), however, avoids the expensive bi-level optimization but ignores task-related models. This work proposes a novel class preserving DM framework consisting of two key components. The first one is responsible for capturing the original data distribution of each class based on energy distance, which encourages diversity in the generated synthetic data. The other is a classifier-critic constraint, which forces the learned synthetic samples to fit pre-trained task-related models, such as an off-the-shelf classifier. Designing the optimization loss in this way, we can generate more diverse and class preserving distilled data without the bi-level optimization. Extensive experiments reveal that our method can produce more effective condensed data for downstream tasks with less training cost and can also be successfully applied to de-biased dataset condensation.
URL: https://openreview.net/forum?id=QOrzmDQYou
---
New submissions
===============
Title: Continual Pre-training of MoEs: How robust is your router?
Abstract: Sparsely-activated Mixture of Experts (MoE) transformers are promising architectures for foundation models. Compared to dense transformers that require the same amount of floating point operations (FLOPs) per forward pass, MoEs benefit from improved sample efficiency at training time and achieve much stronger performance. Many closed-source and open-source frontier language models have thus adopted an MoE architecture. Naturally, practitioners will want to extend the capabilities of these models with large amounts of newly collected data without completely re-training them. Prior work has shown that a simple combination of replay and learning rate re-warming and re-decaying can enable the continual pre-training (CPT) of dense decoder-only transformers with minimal performance degradation compared to full re-training. In the case of decoder-only MoE transformers, however, it is unclear how the routing algorithm will impact continual pre-training performance: 1) do the MoE transformer's routers exacerbate forgetting relative to a dense model?; 2) do the routers maintain a balanced load on previous distributions after CPT?; 3) are the same strategies applied to dense models sufficient to continually pre-train MoE LLMs? In what follows, we conduct a large-scale ($>2$B parameter switch and DeepSeek MoE LLMs trained for $600$B tokens) empirical study across four MoE transformers to answer these questions. Our results establish a surprising robustness to distribution shifts for MoEs using both Sinkhorn-Balanced and Z-and-Aux-loss-balanced routing algorithms, even in MoEs continually pre-trained without replay. Moreover, we show that MoE LLMs maintain their sample efficiency (relative to a FLOP-matched dense model) during CPT and that they can match the performance of a fully re-trained MoE at a fraction of the cost.
URL: https://openreview.net/forum?id=dR7C1K71Rs
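For reference, a minimal sketch of the learning-rate re-warming and re-decaying schedule referred to above for the continual pre-training phase; the warmup length, peak value, and cosine shape are generic choices, not the paper's hyperparameters.

```python
# Hedged sketch: when continuing pre-training on new data, re-warm the learning
# rate linearly to a peak and then re-decay it (cosine) over the CPT tokens.
import math

def cpt_lr(step, total_steps, warmup_steps=1000, peak=3e-4, floor=3e-5):
    if step < warmup_steps:                       # re-warming
        return peak * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))  # re-decaying

for s in [0, 500, 1000, 50_000, 100_000]:
    print(s, f"{cpt_lr(s, total_steps=100_000):.2e}")
```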
---
Title: LanPaint: Training-Free Diffusion Inpainting with Asymptotically Exact and Fast Conditional Sampling
Abstract: Diffusion models excel at joint pixel sampling for image generation but lack efficient training-free methods for partial conditional sampling (e.g., inpainting with known pixels). Prior work typically formulates this as an intractable inverse problem, relying on coarse variational approximations, heuristic losses requiring expensive backpropagation, or slow stochastic sampling. These limitations preclude: (1) accurate distributional matching in inpainting results, (2) efficient inference modes without gradient, (3) compatibility with fast ODE-based samplers. To address these limitations, we propose LanPaint: a training-free, asymptotically exact partial conditional sampling methods for ODE-based and rectified flow diffusion models. By leveraging carefully designed Langevin dynamics, LanPaint enables fast, backpropagation-free Monte Carlo sampling. Experiments demonstrate that our approach achieves superior performance with precise partial conditioning and visually coherent inpainting across diverse tasks.
URL: https://openreview.net/forum?id=JPC8JyOUSW
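To illustrate the general mechanism (not LanPaint itself), here is a toy sketch of conditional sampling with Langevin dynamics: known coordinates are clamped to their observed values while unknown coordinates are updated using the score of a simple analytic density. The 2D Gaussian target and step size are assumptions.

```python
# Hedged sketch: Langevin updates on the *unknown* coordinates only, with the
# known coordinates held fixed -- the basic mechanism behind training-free
# conditional sampling (here on an analytic 2D Gaussian, not a diffusion model).
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
prec = np.linalg.inv(cov)
score = lambda x: -prec @ x                      # grad log N(0, cov)

x_obs = 1.5                                      # observed (known) coordinate x[0]
x = np.array([x_obs, 0.0])
eps = 1e-2
samples = []
for t in range(5000):
    g = score(x)
    x[1] += 0.5 * eps * g[1] + np.sqrt(eps) * rng.normal()   # update unknown coord
    x[0] = x_obs                                             # clamp known coord
    samples.append(x[1])

# Should concentrate near the true conditional mean cov[1,0]/cov[0,0] * x_obs = 1.2
print(np.mean(samples[1000:]))
```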
---
Title: Fine-Grained Alignment and Noise Refinement for Compositional Text-to-Image Generation
Abstract: Text-to-image generative models have made significant advancements in recent years; however, accurately capturing intricate details in textual prompts, such as entity missing, attribute binding errors, and incorrect relationships, remains a formidable challenge. In response, we present an innovative, training-free method that directly addresses these challenges by incorporating tailored objectives to account for textual constraints. Unlike layout-based approaches that enforce rigid structures and limit diversity, our proposed approach offers a more flexible arrangement of the scene by imposing only the constraints extracted from the text, without any unnecessary additions. These constraints are formulated as losses—entity missing, entity mixing, attribute binding, and spatial relationships—integrated into a unified loss that is applied in the first generation stage. Furthermore, we introduce a feedback-driven system for fine-grained initial noise refinement. This system integrates a verifier that evaluates the generated image, identifies inconsistencies, and provides corrective feedback. Leveraging this feedback, our refinement method first targets the unmet constraints by refining the faulty attention maps caused by initial noise, through the optimization of selective losses associated with these constraints. Subsequently, our unified loss function is reapplied in the second generation phase. Experimental results demonstrate that our method, relying solely on our proposed objective functions, significantly enhances compositionality, achieving a 24% improvement in human evaluation and a 25% gain in spatial relationships. Furthermore, our fine-grained noise refinement proves effective, boosting performance by up to 5%.
URL: https://openreview.net/forum?id=E4lCW97Avm
---
Title: Decentralized Projection-free Online Upper-Linearizable Optimization with Applications to DR-Submodular Optimization
Abstract: We introduce a novel framework for decentralized projection-free optimization, extending projection-free methods to a broader class of upper-linearizable functions. Our approach leverages decentralized optimization techniques with the flexibility of upper-linearizable function frameworks, effectively generalizing traditional DR-submodular function optimization. We obtain the regret of $O(T^{1-\theta/2})$ with communication complexity of $O(T^{\theta})$ and number of linear optimization oracle calls of $O(T^{2\theta})$ for decentralized upper-linearizable function optimization, for any $0\le \theta \le 1$. This approach allows for the first results for monotone up-concave optimization with general convex constraints and non-monotone up-concave optimization with general convex constraints. Further, the above results for first order feedback are extended to zeroth order, semi-bandit, and bandit feedback.
URL: https://openreview.net/forum?id=bZ5WD2HUQr
---
Title: Constrained Reinforcement Learning with Smoothed Log Barrier Function
Abstract: Deploying reinforcement learning (RL) in real-world systems often requires satisfying strict safety constraints during both training and deployment, which simple reward shaping typically fails to enforce. Existing constrained RL algorithms frequently face several major challenges, including instabilities during training and overly conservative policies.
To overcome these limitations, we propose CSAC-LB (Constrained Soft Actor-Critic with Log Barrier), a model-free, sample-efficient, off-policy algorithm that requires no pre-training. CSAC-LB integrates a linear smoothed log barrier function into the actor’s objective, providing a numerically stable, non-vanishing gradient that enables the agent to quickly recover from unsafe states while avoiding the instability of traditional interior-point methods. To further enhance safety and mitigate the underestimation of constraint violations, we employ a pessimistic double-critic architecture for the cost function, taking the maximum of two cost Q-networks to conservatively guide the policy.
Through extensive experiments on challenging constrained control tasks, we demonstrate that CSAC-LB significantly outperforms baselines by consistently achieving high returns while strictly adhering to safety constraints. Our results establish CSAC-LB as a robust and stable solution for applying RL to safety-critical domains.
URL: https://openreview.net/forum?id=Amh95oURaE
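A minimal sketch of a linearly smoothed (extended) log barrier of the kind described: the usual $-(1/t)\log(-z)$ barrier for safely negative constraint values, continued linearly once $z$ crosses $-1/t^2$ so the gradient stays finite and non-vanishing. The exact form and threshold used by CSAC-LB may differ.

```python
# Hedged sketch: log barrier on a constraint value z = cost - budget (z <= 0 is
# safe), with a linear extension beyond z = -1/t**2 so gradients stay finite
# and non-vanishing even when the constraint is violated.
import numpy as np

def smoothed_log_barrier(z, t=5.0):
    z = np.asarray(z, dtype=float)
    thresh = -1.0 / t**2
    barrier = -np.log(-np.minimum(z, thresh)) / t          # interior (safe) region
    linear = t * z - np.log(1.0 / t**2) / t + 1.0 / t      # linear extension
    return np.where(z <= thresh, barrier, linear)

for z in [-1.0, -0.1, -0.04, 0.0, 0.5]:
    print(z, round(float(smoothed_log_barrier(z)), 3))
```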
---
Title: CATS: Cross-Modal Autoencoding for Time Series Summarization
Abstract: Despite the rapid advancement in multimodal deep learning and generative AI, automatic description of time series remains a challenging problem, highly relevant in the industrial, financial, and medical domains, weather forecasting, and other areas. Summarization of characteristic patterns and trends in time series can facilitate data analytics and enable a flexible user experience. Yet, existing studies have not seen definitive successes so far, largely due to the scarcity of labeled data.
With the recent popularity of large language models, attempts have been made to apply them to time series modeling. However, their performance is often suboptimal, not to mention their large carbon footprint. Other LLM limitations, such as slow inference and the need to use them online or deploy them on large GPUs, are often unacceptable in practice due to cybersecurity and data privacy compliance restrictions. To this end, we propose Cross-modal Autoencoding for Time series Summarization (CATS), a compact model trained using a novel cross-modal autoencoding method, faithfully capturing relevant properties of the input despite limited training data. We empirically demonstrate the effectiveness of CATS on real-world industrial data and an additional financial dataset.
URL: https://openreview.net/forum?id=qlJ9Oj6wZm
---
Title: Synthesizing Moving People with 3D Control
Abstract: In this paper, we present a diffusion model-based methodology for animating people from a single image for a given target 3D motion sequence. Our approach has two core components: a) learning priors about invisible parts of the human body and clothing, and b) rendering novel body poses with proper clothing and texture. For the first part, we learn an in-filling diffusion model to hallucinate unseen parts of a person given a single image. We train this model in texture map space, which makes it more sample-efficient since it is invariant to pose and viewpoint. Second, we develop a diffusion-based rendering pipeline, which is controlled by 3D human poses. This produces realistic renderings of novel poses of the person, including clothing, hair, and plausible in-filling of unseen regions. This disentangled approach allows our method to generate a sequence of images that are faithful to the target motion in terms of 3D pose and to the input image in terms of visual similarity. In addition, the 3D control allows the person to be rendered along various synthetic camera trajectories. Our experiments show that our method is more resilient than prior methods in generating prolonged motions and varied, challenging, and complex poses.
URL: https://openreview.net/forum?id=rYmgSoRWWg
---
Title: Local K-Similarity Constraint for Federated Learning with Label Noise
Abstract: Federated learning on clients with noisy labels is a challenging problem, as such clients can infiltrate the global model, impacting the overall generalizability of the system. Existing methods proposed to handle noisy clients assume that a sufficient number of clients with clean labels are available, which can be leveraged to learn a robust global model while dampening the impact of noisy clients. This assumption fails when a high number of heterogeneous clients contain noisy labels, making the existing approaches ineffective. In such scenarios, it is important to locally regularize the clients before communication with the global model, to ensure the global model isn't corrupted by noisy clients. While pre-trained self-supervised models can be effective for local regularization, existing centralized approaches relying on pretrained initialization are impractical in a federated setting due to the large size of these models. To that end, we propose a regularization objective for client models that decouples the pre-trained and classification models by enforcing similarity between close data points within the client.
We leverage the representation space of a self-supervised pretrained model to evaluate the closeness among examples. This regularization, when applied with standard objective function for the downstream task in standard noisy federated settings, significantly improves performance, outperforming existing state-of-the-art federated methods in multiple computer vision and medical image classification benchmarks. Unlike other techniques that rely on self-supervised pretrained initialization, our method does not require the pretrained model and classifier backbone to share the same architecture, making it architecture-agnostic.
URL: https://openreview.net/forum?id=ekdnWWSZT5
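A minimal sketch of this kind of local consistency penalty: within a client batch, find each example's k nearest neighbors in a frozen self-supervised embedding space and penalize disagreement between the classifier's predictions on neighbors. The value of k, the cosine similarity measure, and the KL form of the penalty are assumptions, not the paper's exact objective.

```python
# Hedged sketch: encourage the client's classifier to give similar predictions
# to examples that are close in a frozen self-supervised representation space.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, d_ssl, n_cls, k = 32, 128, 10, 3
ssl_feats = F.normalize(torch.randn(B, d_ssl), dim=1)   # frozen SSL embeddings
logits = torch.randn(B, n_cls, requires_grad=True)      # classifier outputs

sim = ssl_feats @ ssl_feats.T
sim.fill_diagonal_(-1.0)                                 # exclude self
nn_idx = sim.topk(k, dim=1).indices                      # k nearest neighbors

log_p = F.log_softmax(logits, dim=1)
p_nn = F.softmax(logits[nn_idx], dim=2).mean(dim=1)      # avg neighbor prediction
consistency = F.kl_div(log_p, p_nn, reduction="batchmean")

# total client loss = cross-entropy on (possibly noisy) labels + lam * consistency
print(float(consistency))
```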
---
Title: Classification of high-dimensional data with spiked covariance matrix structure
Abstract: We study the classification problem for high-dimensional data with $n$ observations on $p$ features where the $p \times p$ covariance matrix $\Sigma$ exhibits a spiked eigenvalue structure and the vector $\zeta$, given by the difference between the {\em whitened} mean vectors, is sparse. We analyze an adaptive classifier (adaptive with respect to the sparsity $s$) that first performs dimension reduction on the feature vectors prior to classification in the dimensionally reduced space: the classifier whitens the data, then screens the features by keeping only those corresponding to the $s$ largest coordinates of $\zeta$, and finally applies Fisher linear discriminant analysis to the selected features. Leveraging recent results on entrywise matrix perturbation bounds for covariance matrices, we show that the resulting classifier is Bayes optimal whenever $n \rightarrow \infty$ and $s \sqrt{n^{-1} \ln p} \rightarrow 0$. Finally, experimental results on real and synthetic data indicate that the classifier is competitive with state-of-the-art methods while also selecting a smaller number of features.
paragraph.
URL: https://openreview.net/forum?id=6bQDtTbaQs
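The classification procedure described above (whiten, screen the $s$ largest coordinates of the whitened mean difference, then apply Fisher's rule) can be sketched in a few lines of NumPy; the ridge term, synthetic data, and variable names below are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def fit_sparse_whitened_lda(X0, X1, s):
        """Whiten, keep the s largest whitened mean-difference coordinates,
        then classify with Fisher's rule on the selected features (sketch)."""
        mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
        X = np.vstack([X0 - mu0, X1 - mu1])
        Sigma = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # small ridge for stability
        w, V = np.linalg.eigh(Sigma)
        W = V @ np.diag(w ** -0.5) @ V.T            # symmetric whitening transform Sigma^{-1/2}
        zeta = W @ (mu1 - mu0)                      # whitened mean difference
        S = np.argsort(-np.abs(zeta))[:s]           # screen: keep s largest coordinates
        def predict(Xnew):
            # In the whitened space the covariance is identity, so Fisher's rule
            # reduces to thresholding the projection onto zeta restricted to S.
            Z = (Xnew - (mu0 + mu1) / 2) @ W.T
            return (Z[:, S] @ zeta[S] > 0).astype(int)
        return predict

    # toy usage on synthetic data with a sparse whitened mean difference
    rng = np.random.default_rng(0)
    X0 = rng.normal(size=(200, 50))
    X1 = rng.normal(size=(200, 50)) + np.r_[np.ones(5), np.zeros(45)]
    predict = fit_sparse_whitened_lda(X0, X1, s=5)
    print(predict(X1[:10]).mean())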
---
Title: The Cost of Replicability in Active Learning
Abstract: Active learning aims to reduce the number of labeled data points required by machine learning algorithms by selectively querying labels from initially unlabeled data. Ensuring replicability, where an algorithm produces consistent outcomes across different runs, is essential for the reliability of machine learning models but often increases sample complexity. This report investigates the cost of replicability in active learning using two classical disagreement-based methods: the CAL and A\textsuperscript{2} algorithms. Leveraging random thresholding techniques, we propose two replicable active learning algorithms: one for realizable learning of finite hypothesis classes, and another for the agnostic setting. Our theoretical analysis shows that while enforcing replicability increases label complexity, CAL and A\textsuperscript{2} still achieve substantial label savings under this constraint. These findings provide key insights into balancing efficiency and stability in active learning.
URL: https://openreview.net/forum?id=ZsqJu9eITd
---
Title: Vejde: A Framework for Inductive Deep Reinforcement Learning Based on Factor Graph Color Refinement
Abstract: We present and evaluate Vejde, a framework that combines data abstraction, graph learning, and reinforcement learning to produce inductive policy functions for decision problems with richly structured states, such as object classes and relations.
Markov decision process states are represented as databases of facts about entities, and Vejde converts each state to a bipartite graph, which is mapped to latent states through neural message passing. The factored representation of both states and actions allows Vejde agents to handle problems of varying size and structure. We tested Vejde agents on eight problem domains defined in RDDL, with ten problem instances each, where policies were trained using both supervised and reinforcement learning. To test policy generalization, we separate problem instances into two sets, one for training and the other solely for testing. Test results on unseen instances for the Vejde agents were compared to MLP agents trained on each problem instance, as well as the online planning algorithm Prost.
Our results show that Vejde policies on average generalize to the test instances without a significant loss in score. Additionally, the inductive agents received scores on unseen test instances that on average were close to those of the instance-specific MLP agents.
URL: https://openreview.net/forum?id=EFSZmL1W1Z
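As a rough illustration of the state-to-graph conversion described above, the sketch below turns a database of (predicate, entity, ...) facts into a bipartite graph with entity nodes on one side and fact nodes on the other; the fact encoding and attribute names are illustrative, not Vejde's actual data model.

    import networkx as nx

    def facts_to_bipartite(facts):
        """Convert an MDP state given as a database of facts into a bipartite
        graph: entity nodes on one side, one node per fact on the other."""
        g = nx.Graph()
        for i, (predicate, *entities) in enumerate(facts):
            fact_node = ("fact", i)
            g.add_node(fact_node, label=predicate, bipartite=1)
            for pos, ent in enumerate(entities):
                g.add_node(("entity", ent), bipartite=0)
                g.add_edge(("entity", ent), fact_node, position=pos)
        return g

    state = [("on", "box1", "table"), ("clear", "box2"), ("adjacent", "r1", "r2")]
    graph = facts_to_bipartite(state)
    print(graph.number_of_nodes(), graph.number_of_edges())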
---
Title: CoDeGraph: Consistent Anomaly Detection via Image-Level Graph Representation for Zero-Shot Industrial Inspection
Abstract: Zero-shot image anomaly classification (AC) and anomaly segmentation (AS) play a crucial role in industrial quality control, where defects must be detected without prior training data. Current representation-based approaches rely on comparing patch features with nearest neighbors in unlabeled test images. However, these methods fail when faced with consistent anomalies—similar defects that consistently appear across multiple images—leading to poor AC/AS performance. We present Consistent-Anomaly Detection Graph (CoDeGraph), a novel algorithm that addresses this challenge by identifying and filtering consistent anomalies from similarity computations. Our key insight is that for industrial images, normal patches exhibit stable, gradually increasing similarity to other test images, whereas consistent-anomaly patches show abrupt spikes after exhausting a limited set of images with similar matches. We term this phenomenon ``neighbor-burnout'' and engineer a robust system to exploit it. CoDeGraph constructs an image-level graph, with images as nodes and edges linking those with shared consistent-anomaly patterns, using community detection to identify and filter out consistent-anomaly patches. To provide a theoretical explanation for this phenomenon, we develop a model grounded in Extreme Value Theory that explains why our approach is effective. Experimental results on MVTec AD using the ViT-L-14-336 backbone show 98.3\% AUROC for AC and AS performance of 66.8\% (+4.2\%) F1 and 68.1\% (+5.4\%) AP over state-of-the-art zero-shot methods. Additional experiments with the DINOv2 backbone further enhance segmentation, achieving a 69.1\% (+6.5\%) F1 and a 71.9\% (+9.2\%) AP, demonstrating the robustness of our approach across different architectures.
URL: https://openreview.net/forum?id=o2MRb5QZ34
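The image-level graph and community-detection step can be pictured roughly as below, assuming the neighbor-burnout heuristic has already produced counts of suspect patch matches shared between image pairs; the statistic, threshold, and community method here are illustrative stand-ins for the paper's actual procedure.

    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    def build_image_graph(shared_counts, min_shared=3):
        """shared_counts[(i, j)] = number of suspect patches of image i whose
        nearest matches fall in image j (illustrative statistic)."""
        g = nx.Graph()
        for (i, j), c in shared_counts.items():
            if i != j and c >= min_shared:
                g.add_edge(i, j, weight=c)
        return g

    def consistent_anomaly_groups(shared_counts):
        g = build_image_graph(shared_counts)
        # communities of images linked by shared consistent-anomaly patterns
        return [set(c) for c in greedy_modularity_communities(g, weight="weight")]

    counts = {(0, 1): 5, (1, 0): 4, (0, 2): 6, (3, 4): 1}
    print(consistent_anomaly_groups(counts))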
---
Title: Pref-GUIDE: Continual Policy Learning from Real-Time Human Feedback via Preference-Based Learning
Abstract: Training reinforcement learning agents with human feedback is crucial when task objectives are difficult to specify through dense reward functions. While prior methods rely on offline trajectory comparisons to elicit human preferences, such data is unavailable in online learning scenarios where agents must adapt on the fly. Recent approaches address this by collecting real-time scalar feedback to guide agent behavior and train reward models for continued learning after human feedback becomes unavailable. However, scalar feedback is often noisy and inconsistent, limiting the accuracy and generalization of learned rewards. We propose Pref-GUIDE, a framework that transforms real-time scalar feedback into preference-based data to improve reward model learning for continual policy training. Pref-GUIDE Individual mitigates temporal inconsistency by comparing agent behaviors within short windows and filtering ambiguous feedback. Pref-GUIDE Voting further enhances robustness by aggregating reward models across a population of users to form consensus preferences. Across three challenging environments, Pref-GUIDE significantly outperforms scalar-feedback baselines, with the voting variant exceeding even expert-designed dense rewards. By reframing scalar feedback as structured preferences with population feedback, Pref-GUIDE offers a scalable and principled approach for harnessing human input in online reinforcement learning.
URL: https://openreview.net/forum?id=dWGUwidXDm
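A minimal sketch of turning real-time scalar feedback into preference pairs, comparing behavior segments within a short window and filtering near-tied (ambiguous) comparisons; the window size and margin are illustrative hyperparameters rather than the paper's settings.

    def scalar_to_preferences(segments, feedback, window=5, margin=0.2):
        """segments: list of behavior segments; feedback: scalar score per segment.
        Returns (preferred, rejected) index pairs from nearby comparisons."""
        pairs = []
        for i in range(len(segments)):
            for j in range(i + 1, min(i + 1 + window, len(segments))):
                diff = feedback[i] - feedback[j]
                if abs(diff) < margin:        # ambiguous feedback: skip
                    continue
                pairs.append((i, j) if diff > 0 else (j, i))
        return pairs

    scores = [0.1, 0.8, 0.75, 0.2, 0.9]
    print(scalar_to_preferences(list(range(5)), scores))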
---
Title: Conditional Policy Generator for Dynamic Constraint Satisfaction and Optimization
Abstract: Leveraging machine learning methods to solve constraint satisfaction problems has shown promise, but such methods are mostly limited to a static situation where the problem description is completely known and fixed from the beginning. In this work we present a new approach to constraint satisfaction and optimization in dynamically changing environments, particularly when variables in the problem are statistically independent. We frame it as a reinforcement learning problem and introduce a conditional policy generator by borrowing the idea of class-conditional generative adversarial networks (GANs). Assuming that the problem includes both static and dynamic constraints, the former are used in a reward formulation to guide policy training such that the policy learns to map a noise prior to a probability distribution over solutions satisfying the static constraints, analogous to a generator in GANs. Dynamic constraints, on the other hand, are encoded as different class labels and fed together with the input noise. The policy is then simultaneously updated, in a supervised manner, to maximize the likelihood of correct classification given the dynamic conditions. We empirically demonstrate a proof-of-principle experiment with a multi-modal constraint satisfaction problem and compare the unconditional and conditional cases.
URL: https://openreview.net/forum?id=IHyEiLvGaa
---
Title: PICore: Physics-Informed Unsupervised Coreset Selection for Data Efficient Neural Operator Training
Abstract: Neural operators, which learn mappings between function spaces, offer a powerful paradigm for solving partial differential equations (PDEs) that cannot be solved analytically. However, there are two main bottlenecks in training neural operators: they require a significant amount of training data to learn these mappings, and this data needs to be labeled, which can only be done via expensive simulations with numerical solvers. To alleviate both of these issues simultaneously, we propose PICore, an unsupervised coreset selection framework that identifies the most informative training samples without requiring access to ground-truth PDE solutions. PICore leverages a physics-informed loss to select unlabeled inputs by their potential contribution to operator learning. After selecting a compact subset of inputs, only those samples are simulated using numerical solvers to generate labels, reducing annotation costs. We then train the neural operator on the reduced labeled dataset, significantly decreasing training time as well. Across four diverse PDE benchmarks and multiple coreset selection strategies, PICore achieves up to a 78% average increase in training efficiency relative to supervised coreset selection methods with minimal changes in accuracy.
URL: https://openreview.net/forum?id=l0VSewTJCI
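A hedged sketch of the selection step: score unlabeled inputs by a physics-informed residual and keep only the top fraction for expensive numerical simulation. The operator and residual_fn below are placeholders, not PICore's actual components.

    import torch

    def select_coreset(inputs, operator, residual_fn, budget=0.1):
        """inputs: (N, ...) tensor of unlabeled PDE inputs. operator: candidate
        neural operator. residual_fn: physics-informed residual of the
        predicted solution (placeholder). Returns indices to label."""
        with torch.no_grad():
            preds = operator(inputs)
            scores = residual_fn(preds, inputs)            # (N,) residual magnitudes
        k = max(1, int(budget * len(inputs)))
        return torch.topk(scores, k).indices               # most informative samples

    # toy usage with a dummy operator and a stand-in residual
    operator = torch.nn.Sequential(torch.nn.Linear(64, 64))
    residual_fn = lambda u, a: (u - a).pow(2).mean(dim=1)  # illustrative residual
    a = torch.randn(100, 64)
    idx = select_coreset(a, operator, residual_fn)
    print(idx.shape)    # these indices would be simulated with the numerical solver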
---
Title: Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design
Abstract: Self-supervised learning (SSL) plays a central role in molecular representation learning. Yet, many recent innovations in masking-based pretraining are introduced as heuristics and lack principled evaluation, obscuring which design choices are genuinely effective. This work casts the entire pretrain–finetune workflow into a unified probabilistic framework, enabling a transparent comparison and deeper understanding of masking strategies. Building on this formalism, we conduct a study of three core design dimensions: masking distribution, prediction target, and encoder architecture, under rigorously controlled settings. We further employ information-theoretic measures to assess the informativeness of pretraining signals and connect them to empirically benchmarked downstream performance. Our findings reveal a surprising insight: sophisticated masking distributions offer no consistent benefit over uniform sampling for common node-level prediction tasks. Instead, the choice of prediction target and its synergy with the encoder architecture are far more critical. Specifically, shifting to semantically richer targets yields substantial downstream improvements, particularly when paired with expressive Graph Transformer encoders. These insights offer practical guidance for developing more effective SSL methods for molecular graphs.
URL: https://openreview.net/forum?id=TE4vcYWRcc
---
Title: A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints
Abstract: Federated Learning has gained increasing attention for its ability to enable multiple nodes to collaboratively train machine learning models without sharing their raw data. At the same time, Generative AI—particularly Generative Adversarial Networks (GANs)—has achieved remarkable success across a wide range of domains, such as healthcare, security, and image generation. However, training generative models typically requires large datasets and significant computational resources, which are often unavailable in real-world settings. Acquiring such resources can be costly and inefficient, especially when many underutilized devices—such as IoT devices and edge devices—with varying capabilities remain idle. Moreover, obtaining large datasets is challenging due to privacy concerns and copyright restrictions, as most devices are unwilling to share their data. To address these challenges, we propose a novel approach for decentralized GAN training that enables the utilization of distributed data and underutilized, low-capability devices while not sharing data in its raw form. Our approach is designed to tackle key challenges in decentralized environments, combining KLD-weighted Clustered Federated Learning to address the issues of data heterogeneity and multi-domain datasets, with Heterogeneous U-Shaped split learning to tackle the challenge of device heterogeneity under strict data sharing constraints—ensuring that no labels or raw data, whether real or synthetic, are ever shared between nodes. Experimental results show that our approach delivers consistent and significant improvements across key performance metrics, achieving 1.1×–2.2× higher image generation scores and an average 10% boost in classification metrics (up to 50% in multi-domain non-IID settings), at much lower latency compared to several benchmarks.
URL: https://openreview.net/forum?id=rpbL7pfPYH
---
Title: Characterizing the Training Dynamics of Private Fine-tuning with Langevin diffusion
Abstract: We show, based on both theoretical and empirical results, that differentially private full fine-tuning (DP-FFT) can distort pre-trained backbone features. We identify the cause of the distortion as the misalignment between the pre-trained backbone and the randomly initialized linear head. We prove that a sequential fine-tuning strategy can mitigate the feature distortion: first-linear-probing-then-fine-tuning (DP-LP-FFT). A new approximation scheme allows us to derive approximate upper and lower bounds on the training loss of DP-LP and DP-FFT, in a simple but canonical setting of 2-layer neural networks with ReLU activation. Experiments on real-world datasets and architectures are consistent with our theoretical insights. We also derive new upper bounds for 2-layer linear networks without the approximation. Moreover, our theory suggests a trade-off of privacy budget allocation in multi-phase fine-tuning methods like DP-LP-FFT.
URL: https://openreview.net/forum?id=LwT8aDv502
---
Title: Learning Time-Series Representations by Hierarchical Uniformity-Tolerance Latent Balancing
Abstract: We propose TimeHUT, a novel method for learning time-series representations by hierarchical uniformity-tolerance balancing of contrastive representations. Our method uses two distinct losses to learn strong representations with the aim of striking an effective balance between uniformity and tolerance in the embedding space. First, TimeHUT uses a hierarchical setup to learn both instance-wise and temporal information from input time-series. Next, we integrate a temperature scheduler within the vanilla contrastive loss to balance the uniformity and tolerance characteristics of the embeddings. Additionally, a hierarchical angular margin loss enforces instance-wise and temporal contrasts, creating geometric margins between positive and negative pairs of temporal sequences. This approach improves the coherence of positive pairs and their separation from the negatives, enhancing the capture of temporal dependencies within a time-series sample. We evaluate our approach on a wide range of tasks, namely the 128 UCR and 30 UEA datasets for univariate and multivariate classification, as well as the Yahoo and KPI datasets for anomaly detection. The results demonstrate that TimeHUT outperforms prior methods by considerable margins on classification, while obtaining competitive results for anomaly detection. Finally, detailed sensitivity and ablation studies are performed to evaluate different components and hyperparameters of our method.
URL: https://openreview.net/forum?id=NTmVEAiyB5
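The temperature-scheduling idea can be illustrated with a standard NT-Xent contrastive loss whose temperature follows a simple cosine schedule over training; the schedule, loss form, and constants below are illustrative assumptions, not TimeHUT's exact formulation.

    import math
    import torch
    import torch.nn.functional as F

    def scheduled_temperature(step, total_steps, t_min=0.07, t_max=0.5):
        """Cosine schedule from t_max down to t_min (illustrative)."""
        cos = 0.5 * (1 + math.cos(math.pi * step / total_steps))
        return t_min + (t_max - t_min) * cos

    def nt_xent(z1, z2, temperature):
        """Standard NT-Xent loss over two views z1, z2 of shape (B, D)."""
        z = F.normalize(torch.cat([z1, z2]), dim=1)
        sim = z @ z.t() / temperature
        sim.fill_diagonal_(float("-inf"))                 # exclude self-similarity
        b = z1.shape[0]
        targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)])
        return F.cross_entropy(sim, targets)

    z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
    loss = nt_xent(z1, z2, scheduled_temperature(step=100, total_steps=1000))
    print(loss.item())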
---
Title: Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning
Abstract: We propose Rec-R1, a general reinforcement learning framework that bridges large language models (LLMs) with recommendation systems through closed-loop optimization. Unlike prompting and supervised fine-tuning (SFT), Rec-R1 directly optimizes LLM generation using feedback from a fixed, black-box recommendation model—without relying on synthetic SFT data from proprietary models like GPT-4o. This avoids the substantial cost and effort required for data distillation. To verify the effectiveness of Rec-R1, we evaluate Rec-R1 on two representative tasks: product search and sequential recommendation. Experimental results demonstrate that Rec-R1 not only consistently outperforms prompting- and SFT-based methods, but also achieves remarkable gains over strong discriminative baselines, even when used with simple retrievers like BM25. More impressively, Rec-R1 preserves the general-purpose capabilities of the LLM, in contrast to SFT, which often impairs instruction-following and reasoning. These findings suggest Rec-R1 as a promising foundation for continual task-specific adaptation without catastrophic forgetting.
URL: https://openreview.net/forum?id=YBRU9MV2vE
---
Title: LOGLO-FNO: Efficient Learning of Local and Global Features in Fourier Neural Operators
Abstract: Modeling high-frequency information is a critical challenge in scientific machine learning. For instance, fully turbulent flow simulations of the Navier-Stokes equations at Reynolds numbers 3500 and above can generate high-frequency signals due to swirling fluid motions caused by eddies and vortices. Faithfully modeling such signals using neural networks depends on the accurate reconstruction of moderate to high frequencies. However, it is well known that deep neural nets exhibit the so-called spectral or frequency bias towards learning low-frequency components. Meanwhile, Fourier Neural Operators (FNOs) have emerged as a popular class of data-driven models in recent years for solving Partial Differential Equations (PDEs) and for surrogate modeling in general. Although impressive results have been achieved on several PDE benchmark problems, FNOs often perform poorly in learning non-dominant frequencies characterized by local features. This limitation stems from the spectral bias inherent in neural networks and the explicit exclusion of high-frequency modes in FNOs and their variants. Therefore, to mitigate these issues and improve FNO's spectral learning capabilities to represent a broad range of frequency components, we propose two key architectural enhancements: (i) a parallel branch performing local spectral convolutions and (ii) a high-frequency propagation module. Moreover, we propose a novel frequency-sensitive loss term based on radially binned spectral errors. The introduction of the parallel branch for local convolutions reduces the number of trainable parameters by up to 50% while achieving the accuracy of the baseline FNO that relies solely on global convolutions. In addition, our findings demonstrate that the proposed model improves stability over longer rollouts. Experiments on three challenging PDE problems in fluid mechanics and biological pattern formation, together with a qualitative and spectral analysis of predictions, show the effectiveness of our method over state-of-the-art neural operator baselines.
URL: https://openreview.net/forum?id=MQ1dRdHTpi
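A rough PyTorch sketch of a frequency-sensitive loss built from radially binned spectral errors, in the spirit of the loss term described above; the binning, normalization, and equal per-bin weighting are illustrative choices, not the paper's exact definition.

    import torch

    def radially_binned_spectral_loss(pred, target, n_bins=16):
        """pred, target: (B, H, W) fields. Penalizes power-spectrum errors
        averaged within radial wavenumber bins (illustrative form)."""
        power = torch.fft.fft2(pred - target).abs() ** 2             # (B, H, W)
        _, H, W = power.shape
        ky = torch.fft.fftfreq(H, device=pred.device).view(H, 1)
        kx = torch.fft.fftfreq(W, device=pred.device).view(1, W)
        radius = torch.sqrt(kx ** 2 + ky ** 2)                       # (H, W) wavenumber radius
        edges = torch.linspace(0.0, float(radius.max()), n_bins + 1, device=pred.device)
        bin_idx = torch.bucketize(radius, edges[1:-1])               # values in [0, n_bins)
        loss = 0.0
        for b in range(n_bins):
            mask = bin_idx == b
            if mask.any():
                loss = loss + power[:, mask].mean()                  # per-bin spectral error
        return loss / n_bins

    pred, target = torch.randn(4, 64, 64), torch.randn(4, 64, 64)
    print(radially_binned_spectral_loss(pred, target).item())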
---
Title: A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs
Abstract: Existing benchmarks for assessing the spatio-temporal understanding and reasoning abilities of video language models are susceptible to score inflation due to the presence of shortcut solutions based on superficial visual or textual cues. This paper mitigates the challenges in accurately assessing model performance by introducing the Minimal Video Pairs (MVP) benchmark, a simple shortcut-aware video QA benchmark for assessing the physical understanding of video language models. The benchmark comprises 55K high-quality multiple-choice video QA examples focusing on physical world understanding. Examples are curated from nine video data sources, spanning first-person egocentric and exocentric videos, robotic interaction data, and cognitive science intuitive physics benchmarks. To mitigate shortcut solutions that rely on superficial visual or textual cues and biases, each sample in MVP has a minimal-change pair — a visually similar video accompanied by an identical question but an opposing answer. To answer a question correctly, a model must provide correct answers for both examples in the minimal-change pair; as such, models that solely rely on visual or textual biases would achieve below random performance. Human performance on MVP is 92.9%, while the best open-source state-of-the-art video-language model achieves 40.2% compared to random performance at 25%.
URL: https://openreview.net/forum?id=gvFgNJcSw1
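The paired scoring rule is straightforward to state in code: a model is credited only when it answers both examples of a minimal-change pair correctly. The data layout and predictor below are illustrative.

    def paired_accuracy(pairs, predict):
        """pairs: list of ((video_a, question, answer_a), (video_b, question, answer_b))
        minimal-change pairs; predict(video, question) returns a choice."""
        correct = 0
        for (va, q, aa), (vb, _, ab) in pairs:
            if predict(va, q) == aa and predict(vb, q) == ab:
                correct += 1
        return correct / max(1, len(pairs))

    # toy usage with a trivial predictor that always answers "A"
    pairs = [(("v1", "Which way does the ball roll?", "A"),
              ("v1_flipped", "Which way does the ball roll?", "B"))]
    print(paired_accuracy(pairs, lambda v, q: "A"))   # 0.0: a biased model fails the pair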
---
Title: Privacy Risks and Preservation Methods in Explainable Artificial Intelligence: A Scoping Review
Abstract: Explainable Artificial Intelligence (XAI) has emerged as a pillar of Trustworthy AI and aims to bring transparency in complex models that are opaque by nature. Despite the benefits of incorporating explanations in models, there is an urgent need to address the privacy concerns of providing this additional information to end users. In this article, we conduct a scoping review of existing literature to elicit details on the conflict between privacy and explainability. Using the standard methodology for scoping reviews, we extracted 57 articles from 1,943 studies published from January 2019 to December 2024. The review addresses three research questions to present readers with more understanding of the topic: (1) what are the privacy risks of releasing explanations in AI systems? (2) what current methods have researchers employed to achieve privacy preservation in XAI systems? (3) what constitutes a privacy-preserving explanation? Based on the knowledge synthesized from the selected studies, we categorize the privacy risks and preservation methods in XAI and propose the characteristics of privacy-preserving explanations to aid researchers and practitioners in understanding the requirements of privacy-compliant XAI. Lastly, we identify the challenges in balancing privacy with other system desiderata and provide recommendations for achieving privacy-preserving XAI. We expect that this review will shed light on the complex relationship between privacy and explainability, both being fundamental principles of Trustworthy AI.
URL: https://openreview.net/forum?id=q9nykJfzku
---
Title: Benchmarking LLM Guardrails in Handling Multilingual Toxicity
Abstract: With the ubiquity of Large Language Models (LLMs), guardrails have become crucial to detect and defend against toxic content. However, with the increasing pervasiveness of LLMs in multilingual scenarios, their effectiveness in handling multilingual toxic inputs remains unclear. In this work, we introduce a comprehensive multilingual test suite, spanning seven datasets and over ten languages, to benchmark the performance of state-of-the-art guardrails. We also investigate the resilience of guardrails against recent jailbreaking techniques, and assess the impact of in-context safety policies and language resource availability on guardrails' performance. Our findings show that existing guardrails are still ineffective at handling multilingual toxicity and lack robustness against jailbreaking prompts. This work aims to identify the limitations of guardrails and to build more reliable and trustworthy LLMs in multilingual scenarios. \textit{\textcolor{red}{Warning: This paper contains potentially harmful examples.}}
URL: https://openreview.net/forum?id=4FTnsccHAV
---
Title: On Calibration of Multilingual Question Answering LLMs
Abstract: Multilingual pre-trained Large Language Models (LLMs) are incredibly effective at Question Answering (QA), a core task in Natural Language Understanding, achieving high accuracies on several multilingual benchmarks. However, little is known about how well their confidences are calibrated. In this paper, we comprehensively benchmark the calibration of several multilingual LLMs (MLLMs) on a variety of QA tasks. We perform extensive experiments, spanning encoder-only, encoder-decoder, and decoder-only QA models (with sizes ranging from 110M to 7B parameters) and diverse languages, including both high- and low-resource ones. We study different dimensions of calibration in in-distribution, out-of-distribution, and cross-lingual transfer settings, and investigate strategies to improve it, including post-hoc methods and regularized fine-tuning. For decoder-only LLMs such as LLaMA-2, we additionally find that in-context learning improves confidence calibration on multilingual data.
We also conduct several ablation experiments to study the effect of language distances, language corpus size, and model size on calibration, and how multilingual models compare with their monolingual counterparts for diverse tasks and languages. Our experiments suggest that multilingual QA models are poorly calibrated for languages other than English, and that incorporating a small set of cheaply translated multilingual samples during fine-tuning/calibration effectively enhances calibration performance.
URL: https://openreview.net/forum?id=4klghu2PTj
---
Title: Stacking Variational Bayesian Monte Carlo
Abstract: Approximate Bayesian inference for models with computationally expensive, black-box likelihoods poses a significant challenge, especially when the posterior distribution is complex. Many inference methods struggle to explore the parameter space efficiently under a limited budget of likelihood evaluations. Variational Bayesian Monte Carlo (VBMC) is a sample-efficient method that addresses this by building a local surrogate model of the log-posterior. However, its conservative exploration strategy, while promoting stability, can cause it to miss important regions of the posterior, such as distinct modes or long tails.
In this work, we introduce Stacking Variational Bayesian Monte Carlo (S-VBMC), a method that overcomes this limitation by constructing a robust, global posterior approximation from multiple independent VBMC runs. Our approach merges these local approximations through a principled and inexpensive post-processing step that leverages VBMC's mixture posterior representation and per-component evidence estimates. Crucially, S-VBMC requires no additional likelihood evaluations and is naturally parallelisable, fitting seamlessly into existing inference workflows. We demonstrate its effectiveness on two synthetic problems designed to challenge VBMC's exploration and two real-world applications from computational neuroscience, showing substantial improvements in posterior approximation quality across all cases.
URL: https://openreview.net/forum?id=M2ilYAJdPe
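A minimal NumPy sketch of the stacking idea: concatenate the mixture components of independent VBMC runs and reweight each run's components by its exponentiated evidence estimate. This is an illustrative reading of the post-processing step, not the exact S-VBMC estimator.

    import numpy as np

    def stack_vbmc_runs(runs):
        """runs: list of dicts with keys 'means' (K, D), 'covs' (K, D, D),
        'weights' (K,), and 'log_evidence' (scalar) from independent VBMC runs.
        Returns a single merged mixture posterior."""
        log_z = np.array([r["log_evidence"] for r in runs])
        run_w = np.exp(log_z - log_z.max())
        run_w /= run_w.sum()                               # relative evidence weights
        means = np.concatenate([r["means"] for r in runs])
        covs = np.concatenate([r["covs"] for r in runs])
        weights = np.concatenate([w * r["weights"] for w, r in zip(run_w, runs)])
        weights /= weights.sum()
        return {"means": means, "covs": covs, "weights": weights}

    # toy usage with two fake runs
    run = lambda lz: {"means": np.random.randn(3, 2), "covs": np.stack([np.eye(2)] * 3),
                      "weights": np.ones(3) / 3, "log_evidence": lz}
    merged = stack_vbmc_runs([run(-10.0), run(-11.5)])
    print(merged["weights"].round(3))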
---
Title: BatchCP: Forecasting Time-Series Data That Have Change Points
Abstract: Many methods for time-series forecasting are known in classical statistics, such as autoregression, moving averages, and exponential smoothing. Some novel, recent approaches for time-series forecasting based on deep learning have already shown very promising results. However, time series often have change points, which can degrade the prediction performance substantially. This paper extends existing frameworks by detecting and including those change points. We show that our method, called BatchCP, performs as well as standard frameworks when there are no change points and considerably better when there are change points. More generally, we show that the batch size provides an effective and surprisingly simple way to deal with change points in modern forecasting architectures, such as DeepAR, Transformers, and TFTs.
URL: https://openreview.net/forum?id=GP2KNnb64z
---
Title: Fine-Tuning Diffusion Generative Models via Rich Preference Optimization
Abstract: We introduce Rich Preference Optimization (RPO), a novel pipeline that leverages rich feedback signals to improve the curation of preference pairs for fine-tuning text-to-image diffusion models. Traditional methods, like Diffusion-DPO, often rely solely on reward model labeling, which can be opaque, offer limited insights into the rationale behind preferences, and are prone to issues such as reward hacking or overfitting. In contrast, our approach begins with generating detailed critiques of synthesized images, from which we extract reliable and actionable image editing instructions. By implementing these instructions, we create refined images, resulting in synthetic, informative preference pairs that serve as enhanced tuning datasets. We demonstrate the effectiveness of our pipeline and the resulting datasets in fine-tuning state-of-the-art diffusion models.
URL: https://openreview.net/forum?id=9J3Na7cKek
---
Title: An Efficient Sparse Fine-Tuning with Low Quantization Error via Neural Network Pruning
Abstract: Fine-tuning is an important step in adapting foundation models such as large language models to downstream tasks. To make this step more accessible to users with limited computational budgets, it is crucial to develop fine-tuning methods that are memory and computationally efficient. Sparse Fine-tuning (SpFT) and Low-rank adaptation (LoRA) are two frameworks that have emerged for addressing this problem and have been adopted widely in practice. In this work, we develop a new SpFT framework, based on ideas from neural network pruning. At a high level, we first identify ``important'' neurons/nodes using feature importance metrics from network pruning (specifically, we use the structural pruning method), and then perform fine-tuning by restricting to weights involving these neurons. Experiments on common language tasks show our method improves SpFT’s memory efficiency by 20–50% while matching the accuracy of state-of-the-art methods such as LoRA and its variants.
URL: https://openreview.net/forum?id=w3b67v5EzD
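A rough PyTorch sketch of the overall recipe: rank neurons by an importance proxy (here, simple row-wise weight norms standing in for a structural-pruning score) and fine-tune only the weights attached to the selected neurons by masking gradients; the metric and masking granularity are illustrative assumptions, not the paper's method.

    import torch
    import torch.nn as nn

    def restrict_to_important_neurons(linear: nn.Linear, keep_ratio=0.2):
        """Select output neurons by row-wise weight norm and zero out gradients
        for the weights of all other neurons during fine-tuning."""
        importance = linear.weight.detach().norm(dim=1)          # one score per output neuron
        k = max(1, int(keep_ratio * importance.numel()))
        keep = torch.topk(importance, k).indices
        mask = torch.zeros_like(linear.weight)
        mask[keep] = 1.0
        linear.weight.register_hook(lambda g: g * mask)          # sparse updates only
        if linear.bias is not None:
            bias_mask = torch.zeros_like(linear.bias)
            bias_mask[keep] = 1.0
            linear.bias.register_hook(lambda g: g * bias_mask)

    layer = nn.Linear(512, 512)
    restrict_to_important_neurons(layer, keep_ratio=0.1)
    layer(torch.randn(8, 512)).sum().backward()
    print((layer.weight.grad.abs().sum(dim=1) > 0).sum().item())  # ~51 rows receive gradient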
---
Title: Holographic Quantum Neural Networks
Abstract: We introduce Holographic Quantum Neural Networks (HQNNs), a novel quantum machine learning architecture that leverages principles from holographic encoding and tensor networks to efficiently process high-dimensional quantum data. By embedding neural network operations within a holographic framework, HQNNs naturally implement multi-scale feature extraction while providing inherent error correction capabilities. We mathematically formalize the HQNN structure and prove its advantages in representational capacity, showing that HQNNs require only $\mathcal{O}(N_{\text{log}}\log N_{\text{log}})$ physical qubits to process $N_{\text{log}}$-qubit logical input states while tolerating error rates up to a threshold of $1-\frac{2}{z}$, where $z$ is the tensor network coordination number. Furthermore, we demonstrate how the geometric structure of HQNNs enables efficient learning of quantum data with hierarchical features, offering a promising approach for quantum machine learning in the noisy intermediate-scale quantum (NISQ) era and beyond.
URL: https://openreview.net/forum?id=YqOAvq3yCM
---
Title: Consistency Aware Robust Learning under Noisy Labels
Abstract: Deep neural networks (DNNs) often struggle with noisy supervision, a common challenge in real-world datasets where high-quality annotations are scarce. While DNNs tend to memorize noisy labels, the human brain excels at learning in noisy environments by modulating sensitivity to errors based on their magnitude and consistency. Inspired by this, we propose Consistency-Aware Robust Learning (CARoL), which maintains a memory of past predictions and errors to quantify consistency and guide the learning process. CARoL employs a principled mechanism to distinguish clean from noisy samples and modulates the rate of adaptation based on prediction consistency. Furthermore, it integrates multiple learning pathways to fully utilize the dataset, adapting to sample characteristics as training progresses. Our empirical evaluation shows that CARoL achieves high precision in noisy label detection, enhances robustness, and performs reliably under severe noise, highlighting the potential of biologically inspired approaches for robust learning.
URL: https://openreview.net/forum?id=pZulfLkARr
---
Title: The Timeline of Meta-Reinforcement Learning - From the beginnings to the Adaptive Agent
Abstract: Humans are highly effective at utilizing prior knowledge to adapt to novel tasks, a capability standard machine learning models struggle to replicate due to their reliance on task-specific training. Meta-learning, or ‘learning to learn’, overcomes this limitation by allowing models to acquire transferable knowledge from various tasks, enabling rapid adaptation to new challenges with minimal data. This survey presents a clear mathematical paradigm of meta-learning together with a formalization of common performance measures, distinguishes it from transfer learning and multi-task learning, and utilizes it to derive the meta-reinforcement learning paradigm. A timeline of landmark meta-reinforcement learning developments, from the earliest successes MAML and RL2 to the Adaptive Agent, is provided along with the corresponding paradigms and training schemes. In this way, this work offers a comprehensive foundation for understanding meta-learning and meta-reinforcement learning, before giving an outlook on the latest developments and the connection of meta-learning to the path towards general intelligence.
URL: https://openreview.net/forum?id=TG1QDSqTP1
---
Title: Is There a Better Source Distribution than Gaussian? Exploring Source Distributions for Image Flow Matching
Abstract: Flow matching has emerged as a powerful generative modeling approach with flexible source distribution choices. While Gaussian distributions are commonly used, the potential for better alternatives in high-dimensional data generation remains largely unexplored. In this paper, we propose a novel 2D simulation that captures high-dimensional geometric properties under an interpretable 2D setting, enabling us to analyze the learning dynamics of flow matching during training. Based on this analysis, we derive several key insights about flow matching behavior: (1) density approximation paradoxically degrades performance due to mode discrepancy, (2) directional alignment suffers from path entanglement when overly concentrated, (3) the Gaussian's omnidirectional coverage ensures robust learning, and (4) norm misalignment incurs substantial learning costs. Building on these insights, we propose a practical framework that combines norm-aligned training with directionally-pruned sampling. This approach maintains the robust omnidirectional supervision essential for stable flow learning, while eliminating initializations in data-sparse regions during inference. Importantly, our pruning strategy can be applied to any flow matching model trained with a Gaussian source, providing immediate performance gains without the need for retraining. Empirical evaluations demonstrate consistent improvements in both generation quality and sampling efficiency. Our findings provide practical insights and guidelines for source distribution design and introduce a readily applicable technique for improving existing flow matching models.
URL: https://openreview.net/forum?id=sev0GtV1fc
---
Title: Batch Entanglement Detection in Parameterized Qubit States using Classical Bandit Algorithms
Abstract: Entanglement is a key property of quantum states that acts as a resource for a wide range of tasks in quantum computing. Entanglement detection is a key conceptual and practical challenge. Without adaptive or joint measurements, entanglement detection is constrained by no-go theorems (Lu et al., 2016), necessitating full state tomography. Batch entanglement detection refers to the problem of identifying all entangled states from amongst a set of $K$ unknown states, which finds applications in quantum information processing. We devise a method to perform batch entanglement detection by performing measurements derived from a single-parameter family of entanglement witnesses from Zhu et al. (2010), followed by a thresholding bandit algorithm on the measurement data. The proposed method can perform batch entanglement detection conclusively when the unknown states are drawn from a practically well-motivated class of two-qubit states $\mathcal{F}$ that includes depolarised Bell states, Bell diagonal states, etc. Our key novelty lies in drawing a connection between batch entanglement detection and a Thresholding Bandit problem in classical Multi-Armed Bandits (MAB). The connection to the MAB problem also enables us to derive theoretical guarantees on the measurement/sample complexity of the proposed technique. We demonstrate the performance of the proposed method through numerical simulations and an experimental implementation. More broadly, this paper highlights the potential for employing classical machine learning techniques for quantum entanglement detection.
URL: https://openreview.net/forum?id=0v27eMBVZ0
---
Title: Quantifying Context Bias in Domain Adaptation for Object Detection
Abstract: Domain adaptation for object detection (DAOD) has become essential to counter performance degradation caused by distribution shifts between training and deployment domains. However, a critical factor influencing DAOD—context bias resulting from learned foreground-background (FG–BG) associations—has remained underexplored. In this work, we present the first comprehensive empirical and causal analysis specifically targeting context bias in DAOD. We address three key questions regarding FG-BG associations in object detection: (a) are FG-BG associations encoded during the training, (b) is there a causal relationship between FG-BG associations and detection performance, and (c) is there an effect of FG-BG association on DAOD. To examine how models capture FG–BG associations, we analyze class-wise and feature-wise performance degradation using background masking and feature perturbation, measured via change in accuracies (defined as drop rate). To explore the causal role of FG–BG associations, we apply do-calculus on FG–BG pairs guided by class activation mapping (CAM). To quantify the causal influence of FG–BG associations across domains, we propose a novel metric—domain association gradient—defined as the ratio of drop rate to maximum mean discrepancy (MMD). Through systematic experiments involving background masking, feature-level perturbations, and CAM, we reveal that convolution-based object detection models encode FG–BG associations. These associations substantially impact detection performance, particularly under domain shifts where background information significantly diverges. Our results demonstrate that context bias not only exists but causally undermines the generalization capabilities of object detection models across domains. Furthermore, we validate these findings across multiple models and datasets, including state-of-the-art architectures such as ALDI++. This study highlights the necessity of addressing context bias explicitly in DAOD frameworks, providing insights that pave the way for developing more robust and generalizable object detection systems.
URL: https://openreview.net/forum?id=YRU0A0nraG
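The proposed domain association gradient can be sketched as a drop rate divided by a feature-level MMD; the RBF kernel, bandwidth, and accuracy inputs below are illustrative, not the paper's exact estimator.

    import torch

    def rbf_mmd2(x, y, sigma=1.0):
        """Biased (V-statistic) estimate of squared RBF MMD between x (N, D) and y (M, D)."""
        k = lambda a, b: torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
        return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

    def domain_association_gradient(acc_full, acc_bg_masked, src_feats, tgt_feats):
        """Ratio of drop rate (relative accuracy drop under background masking)
        to the feature-level MMD between domains (illustrative form)."""
        drop_rate = (acc_full - acc_bg_masked) / max(acc_full, 1e-8)
        return drop_rate / rbf_mmd2(src_feats, tgt_feats).clamp_min(1e-8)

    src = torch.randn(200, 128)
    tgt = torch.randn(200, 128) + 0.5
    print(domain_association_gradient(0.72, 0.55, src, tgt).item())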
---
Title: Mastering SAM Prompts: A Large-Scale Empirical Study in Segmentation Refinement
Abstract: Segment Anything Model (SAM) has emerged as a prevalent tool empowering advances in vision tasks from instance segmentation, panoptic segmentation, to interactive segmentation. Leveraging powerful zero-shot capabilities enabled by visual prompts such as masks placed on the image, SAM has been shown to significantly improve performance on these tasks. Yet, a poor prompt can worsen SAM performance, risking consequences such as misdiagnoses, autonomous driving failures, or manufacturing defects. However, recent studies on visual SAM prompting remain limited, cover only a small fraction of potential prompt configurations, adopt ad-hoc evaluation strategies, and come with limited or even no rigorous analysis of the statistical significance of prompt configurations. To address this gap, we undertake the first large-scale empirical study comprehensively evaluating the impact of SAM prompt configurations on segmentation refinement. This includes 2,688 prompt configurations, including points, boxes, and masks with diverse augmentations, on four initial segmentation models for a total of 10,752 evaluations. From these results, we draw statistically significant insights along with practical guidelines for prompt design. In particular, we recommend including a bounding box, which raises AP@50-95 by 0.320, and advise against using a coarse mask, which lowers AP@50-95 by 0.133 across all four models. We showcase that our recommended prompt configuration enables SAM to outperform leading refinement methods on multiple benchmark datasets.
URL: https://openreview.net/forum?id=cWcTQMpqv6
---
Title: Where are we with calibration under dataset shift in image classification?
Abstract: We conduct an extensive study on the state of calibration under real-world dataset shift for image classification. Our work provides important insights on the choice of post-hoc and in-training calibration techniques, and yields practical guidelines for all practitioners interested in robust calibration under shift. We compare various post-hoc calibration methods, and their interactions with common in-training calibration strategies (e.g., label smoothing), across a wide range of natural shifts, on eight different classification tasks across several imaging domains. We find that: (i) simultaneously applying entropy regularisation and label smoothing yields the best calibrated raw probabilities under dataset shift, (ii) post-hoc calibrators exposed to a small amount of semantic out-of-distribution data (unrelated to the task) are most robust under shift, (iii) recent calibration methods specifically aimed at increasing calibration under shifts do not necessarily offer significant improvements over simpler post-hoc calibration methods, (iv) improving calibration under shifts often comes at the cost of worsening in-distribution calibration. Importantly, these findings hold for randomly initialised classifiers, as well as for those finetuned from foundation models, the latter being consistently better calibrated compared to models trained from scratch. Finally, we conduct an in-depth analysis of ensembling effects, finding that (i) applying calibration prior to ensembling (instead of after) is more effective for calibration under shifts, (ii) for ensembles, OOD exposure deteriorates the ID-shifted calibration trade-off, (iii) ensembling remains one of the most effective methods to improve calibration robustness and, combined with finetuning from foundation models, yields best calibration results overall.
URL: https://openreview.net/forum?id=1NYKXlRU2H
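Finding (i) of the ensembling analysis, calibrating before ensembling, can be sketched with standard temperature scaling fitted on held-out logits for each member and probability averaging afterwards; the optimizer settings and toy data are illustrative, not the paper's experimental setup.

    import torch
    import torch.nn.functional as F

    def fit_temperature(logits, labels, steps=200, lr=0.05):
        """Fit a single temperature on held-out logits (standard post-hoc scaling)."""
        log_t = torch.zeros(1, requires_grad=True)
        opt = torch.optim.Adam([log_t], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = F.cross_entropy(logits / log_t.exp(), labels)
            loss.backward()
            opt.step()
        return log_t.exp().item()

    def calibrated_ensemble(member_logits, member_temps):
        """Calibrate each member first, then average the probabilities."""
        probs = [F.softmax(l / t, dim=-1) for l, t in zip(member_logits, member_temps)]
        return torch.stack(probs).mean(dim=0)

    val_logits, val_labels = torch.randn(512, 10) * 3, torch.randint(0, 10, (512,))
    t = fit_temperature(val_logits, val_labels)
    print(round(t, 3))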
---
Title: Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning
Abstract: Large Language Models (LLMs) display striking surface fluency yet systematically fail at tasks requiring symbolic reasoning, arithmetic accuracy, and logical consistency. This paper offers a structural diagnosis of such failures, revealing a persistent gap between \textit{comprehension} and \textit{competence}. Through controlled experiments and architectural analysis, we demonstrate that LLMs often articulate correct principles without reliably applying them—a failure rooted not in knowledge access, but in computational execution. We term this phenomenon the computational \textit{split-brain syndrome}, where instruction and action pathways are geometrically and functionally dissociated. This core limitation recurs across domains, from mathematical operations to relational inferences, and explains why model behavior remains brittle even under idealized prompting. We argue that LLMs function as powerful pattern completion engines, but lack the architectural scaffolding for principled, compositional reasoning. Our findings delineate the boundary of current LLM capabilities and motivate future models with metacognitive control, principle lifting, and structurally grounded execution. This diagnosis also clarifies why mechanistic interpretability findings may reflect training-specific pattern coordination rather than universal computational principles, and why the geometric separation between instruction and execution pathways suggests limitations in neural introspection and mechanistic analysis.
URL: https://openreview.net/forum?id=Gz5HMiJLqv
---
Title: AEAP: A Reinforcement Learning Actor Ensemble Algorithm with Adaptive Pruning
Abstract: Actor ensemble reinforcement learning methods have shown promising performance on dense-reward continuous control tasks. However, they exhibit three primary limitations: (1) diversity collapse when using a shared replay buffer, often necessitating carefully tuned regularization terms;
(2) computational overhead from maintaining multiple actors; and (3) analytically intractable policy gradients when using stochastic policies in ensembles, requiring approximations that may compromise performance. To address this third limitation, we restrict the ensemble to deterministic policies and propose Actor Ensemble with Adaptive Pruning (AEAP), a multi-actor deterministic policy gradient algorithm that tackles the remaining limitations through a two-stage approach. First, to alleviate diversity collapse, AEAP employs dual-randomized actor selection that decorrelates exploration and learning by randomly choosing different actors for both environment interaction and policy update. This approach also removes reliance on explicit regularization. Second, when convergence to homogeneous policies still occurs over time, computational efficiency is further achieved through adaptive dual-criterion pruning, which progressively removes underperforming or redundant actors based on critic-estimated value and action-space similarity. Although AEAP introduces four additional hyperparameters compared to TD3 (a baseline single-actor deterministic policy gradient algorithm), we provide two domain-agnostic parameter configurations that perform robustly across environments without requiring tuning.
AEAP achieves superior or competitive asymptotic performance compared to baselines across six dense-reward MuJoCo tasks. On sparse-reward Fetch benchmarks, AEAP outperforms deterministic policy gradient methods but falls short of SAC (a baseline stochastic policy gradient algorithm) on one of three tasks. When compared to fixed-size multi-actor baselines, AEAP reduces wall-clock time without sacrificing performance, establishing it as an efficient and reliable actor ensemble variant.
URL: https://openreview.net/forum?id=I5ymMVdmaR
---
Title: Are Language Model Embeddings Sufficient for Bayesian Optimization?
Abstract: Bayesian Optimization is ubiquitous in experimental design and black-box optimization for improving search efficiency. However, most existing approaches rely on regression models which are limited to fixed search spaces and structured, tabular input features. This paper explores the use of LLM embeddings over string inputs for in-context regression in Bayesian Optimization. Our results show that representing inputs as strings enables general-purpose regression across diverse domains, including synthetic, combinatorial, and hyperparameter optimization. Furthermore, our approach achieves optimization performance comparable to state-of-the-art Gaussian Process-based methods such as Google Vizier, and demonstrates potential for broader and more flexible applications.
URL: https://openreview.net/forum?id=sqnKQ96Uu9
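A hedged sketch of the workflow: embed observed and candidate configurations as strings, fit a regressor on the embeddings, and pick the next trial with a UCB-style acquisition. The embed function below is a toy stand-in for an LLM embedding call, and the Gaussian-process surrogate is just one of several regressors one could use.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    def propose_next(candidates, observed, scores, embed, kappa=2.0):
        """embed: maps a string to a fixed-size vector (placeholder for an LLM
        embedding). Fits a surrogate on embeddings and returns the UCB-max candidate."""
        X = np.array([embed(s) for s in observed])
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, scores)
        Xc = np.array([embed(s) for s in candidates])
        mu, sd = gp.predict(Xc, return_std=True)
        return candidates[int(np.argmax(mu + kappa * sd))]

    embed = lambda s: np.array([len(s), sum(map(ord, s)) % 97], dtype=float)  # toy stand-in
    observed = ["lr=0.1,bs=32", "lr=0.01,bs=64"]
    scores = [0.62, 0.74]
    candidates = ["lr=0.001,bs=64", "lr=0.05,bs=128"]
    print(propose_next(candidates, observed, scores, embed))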
---
Title: Minimizing Self-Intersections of 3-dimensional Immersions of 5-dimensional Cubical Surfaces with Reinforcement Learning
Abstract: A closed cubical surface is a 2-dimensional cubical complex (analogous to a simplicial complex but with a cubical structure) such that each of its points has an open neighborhood homeomorphic to a disk. Aveni et al. proved that, up to isomorphism, 2690 connected closed cubical surfaces can be built from the faces of a $5$-cube (sometimes called a penteract) and gave a classification of closed orientable and non-orientable cubical surfaces. It is well known that non-orientable surfaces (of any kind) cannot be embedded in $\mathbb{R}^3$; their immersion will always have some self-intersection, and in the context of cubical surfaces this also seems to be the case for some orientable surfaces. Therefore, given a cubical surface it is natural to ask: What is the smallest number of self-intersections it can have for any immersion in $\mathbb{R}^3$ using perspective projection and without deforming the $5$-cube? Given an initial immersion, can we calculate a sequence of $5$-dimensional rotations or perspective projections step-wise minimizing self-intersections efficiently? These questions are addressed using Reinforcement Learning, and animation sequences are created to visualize the minimization strategies found by the agent.
URL: https://openreview.net/forum?id=aT6VkKtM5U
---
Title: Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models
Abstract: Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities for capturing and reasoning over multimodal inputs. However, these models are prone to parametric knowledge conflicts, which arise from inconsistencies of represented knowledge between their vision and language components. In this paper, we formally define the problem of cross-modality parametric knowledge conflict and present a systematic approach to detect, interpret, and mitigate them. We introduce a pipeline that identifies conflicts between visual and textual answers, showing a persistently high conflict rate across modalities in recent LVLMs regardless of the model size. We further investigate how these conflicts interfere with the inference process and propose a contrastive metric to discern the conflicting samples from the others. Building on these insights, we develop a novel dynamic contrastive decoding method that removes undesirable logits inferred from the less confident modality components based on answer confidence. For models that do not provide logits, we also introduce two prompt-based strategies to mitigate the conflicts. Our methods achieve promising improvements in accuracy on both the ViQuAE and InfoSeek datasets. Specifically, with LLaVA-34B, our proposed dynamic contrastive decoding improves average accuracy by 2.24%.
URL: https://openreview.net/forum?id=uVjdsfo0Mq
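A minimal sketch of confidence-based contrastive decoding between two modality-conditioned logit vectors: the branch with lower answer confidence is down-weighted and subtracted; the confidence measure and adjustment rule are illustrative, not the paper's exact method.

    import torch
    import torch.nn.functional as F

    def dynamic_contrastive_decode(logits_vis, logits_txt):
        """logits_*: (V,) next-token logits from a vision-conditioned pass and a
        text-only pass. Returns adjusted logits for decoding."""
        conf_vis = F.softmax(logits_vis, dim=-1).max()
        conf_txt = F.softmax(logits_txt, dim=-1).max()
        if conf_vis >= conf_txt:
            strong, weak, alpha = logits_vis, logits_txt, 1 - conf_txt
        else:
            strong, weak, alpha = logits_txt, logits_vis, 1 - conf_vis
        return strong - alpha * F.log_softmax(weak, dim=-1)   # penalize the weak branch

    vis, txt = torch.randn(32000), torch.randn(32000)
    print(int(dynamic_contrastive_decode(vis, txt).argmax()))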
---
Title: Multiplayer Information Asymmetric Bandits in Metric Spaces
Abstract: In this paper we study the Lipschitz bandit problem applied to the multiplayer information-asymmetric setting studied in \cite{chang2022online, chang2023optimal}. More specifically, we consider information asymmetry in rewards, in actions, or in both. We adopt the CAB algorithm given in \cite{kleinberg2004nearly}, which uses a fixed discretization, to give regret bounds of the same order (in the dimension of the action space) in all three problem settings. We also adopt the zooming algorithm of \cite{kleinberg2008multi}, which uses an adaptive discretization, and apply it to information asymmetry in rewards and information asymmetry in actions.
URL: https://openreview.net/forum?id=QoY4tnAU5g
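For intuition, the fixed-discretization (CAB-style) idea in the single-player case amounts to running a standard bandit algorithm over uniformly spaced arms; the sketch below uses UCB over bins and is only an illustration, not the multiplayer information-asymmetric algorithm itself.

    import numpy as np

    def cab_fixed_discretization(reward_fn, horizon=5000, n_bins=20):
        """UCB over a fixed uniform discretization of the action space [0, 1]."""
        counts = np.zeros(n_bins)
        means = np.zeros(n_bins)
        centers = (np.arange(n_bins) + 0.5) / n_bins
        total = 0.0
        for t in range(1, horizon + 1):
            ucb = means + np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
            ucb[counts == 0] = np.inf                 # play each bin at least once
            arm = int(np.argmax(ucb))
            r = reward_fn(centers[arm])
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]
            total += r
        return total / horizon

    rng = np.random.default_rng(0)
    lipschitz_reward = lambda x: 1 - abs(x - 0.3) + 0.1 * rng.normal()
    print(round(cab_fixed_discretization(lipschitz_reward), 3))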
---
Title: Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Abstract: Network pruning focuses on algorithms that aim to reduce a given model's computational cost by removing a subset of its parameters while having minimal impact on performance. Throughout the last decade, the most widely used pruning paradigm has been pruning and re-training, which nowadays is inconvenient due to the vast amount of pre-trained models, which are, in any case, too expensive to re-train. In this paper, we exploit functional information from dense pre-trained models, i.e., their input activations, to obtain sparse models that maximize the activations' alignment with respect to their corresponding dense models. Hence, we propose \algname, a \emph{top-up} algorithm that can be used on top of any given pruning algorithm for LLMs, which modifies the block-wise and row-wise sparsity, exploiting information from both the dense model and its sparse version to maximize the \emph{neuron alignment} among activations. Different from existing methods, our approach adaptively selects the best hyperparameters for the block-wise and row-wise sparsity ratios w.r.t. the model and the desired sparsity, and requires \emph{no re-training}. We test our method over $\sim$300 test cases with four LLM families, three sparsity ratios, and ten language tasks (three language modeling and seven zero-shot datasets), showing how it consistently outperforms the latest state-of-the-art methods in terms of performance-runtime trade-off.
URL: https://openreview.net/forum?id=uPyNaNqFK2
---
Title: Analysis of generalization capacities of Neural Ordinary Differential Equations
Abstract: Neural ordinary differential equations (neural ODEs) represent a widely used class of deep learning models characterized by continuous depth. Understanding the generalization error bound is important to evaluate how well a model is expected to perform on new, unseen data. Earlier work in this direction considered the linear case for the dynamics function (the function that models the evolution of the state variables) of neural ODEs (Marion, 2023). Other related work gives a bound for neural controlled ODEs (Bleistein & Guilloux, 2023) that depends on the sampling gap. We consider a class of neural ODEs with a general nonlinear dynamics function, Lipschitz with respect to the state variables, in both the time-dependent and time-independent cases. We observe that the solution of the neural ODE is of bounded variation if the dynamics function is Lipschitz continuous with respect to the hidden state. We derive a generalization bound for time-dependent and time-independent neural ODEs and show the effect of overparameterization and the domain bound on the generalization error bound. To the best of our knowledge, this is the first generalization bound for neural ODEs with a general nonlinear dynamics function.
URL: https://openreview.net/forum?id=CxW6TF1rOF
---
Title: Incorporating Interventional Independence Improves Robustness against Interventional Distribution Shift
Abstract: We consider the problem of learning robust discriminative representations of latent variables that are causally related to each other via a directed graph. In addition to passively collected observational data, the training dataset also includes interventional data obtained through targeted interventions on some of these latent variables to learn representations that are robust against the resulting interventional distribution shifts. However, existing approaches treat interventional data like observational data, even when the underlying causal model is known, and ignore the independence relations that arise from these interventions. Since these approaches do not fully exploit the causal relational information resulting from interventions, they learn representations that produce large disparities in predictive performance on observational and interventional data. This performance disparity worsens when the number of interventional data samples available for training is limited. In this paper, (1) we first identify a strong correlation between this performance disparity and adherence of the representations to the statistical independence conditions induced by the underlying causal model during interventions. (2) For linear models, we derive sufficient conditions on the proportion of interventional data in the training dataset, for which enforcing statistical independence between representations corresponding to the intervened node and its non-descendants during interventions lowers the test-time error on interventional data. Combining these insights, (3) we propose RepLIn, a training algorithm to explicitly enforce this statistical independence during interventions. We demonstrate the utility of RepLIn on a synthetic dataset and on real image and text datasets on facial attribute classification and toxicity detection, respectively. Our experiments show that RepLIn is scalable with the number of nodes in the causal graph and is suitable to improve the robustness of representations against interventional distribution shifts of both continuous and discrete latent variables compared to the ERM baselines.
URL: https://openreview.net/forum?id=kXfcEyNIrf
---
Title: StructFormer: Document Structure-based Masked Attention and its impact on Language Model Pre-Training
Abstract: Most state-of-the-art techniques for Language Models (LMs) today rely on transformer-based architectures and their ubiquitous attention mechanism. However, the quadratic growth in computational requirements with longer input sequences confines Transformers to handling short passages. Recent efforts have aimed to address this limitation by introducing selective attention mechanisms, notably local and global attention. While sparse attention mechanisms, akin to full attention in being Turing-complete, have been theoretically established, their practical impact on pre-training remains unexplored. This study focuses on empirically assessing the influence of global attention on BERT pre-training.
The primary steps involve creating an extensive corpus of structure-aware text from arXiv data, alongside a text-only counterpart. We pre-train on these two datasets, investigate shifts in attention patterns, and assess their implications for downstream tasks. Our analysis underscores the significance of incorporating document structure into LMs, demonstrating its capacity to help models excel at more abstract tasks, such as document understanding.
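As an illustration of how document structure can translate into a masked-attention pattern, the sketch below builds a generic local-window mask with global attention for structural tokens (e.g. section headings). The windowing scheme and the `is_heading` input are assumptions for illustration, not the paper's exact masking.

```python
import torch

def structure_attention_mask(is_heading: torch.Tensor, window: int = 64) -> torch.Tensor:
    """Boolean attention mask combining a local sliding window with global
    attention for structural tokens. `is_heading` is a (seq_len,) bool tensor
    marking tokens that belong to structural elements. Returns a
    (seq_len, seq_len) mask where True means attention is allowed."""
    n = is_heading.shape[0]
    idx = torch.arange(n)
    local = (idx[:, None] - idx[None, :]).abs() <= window  # sliding-window attention
    global_rows = is_heading[:, None].expand(n, n)         # heading tokens attend everywhere
    global_cols = is_heading[None, :].expand(n, n)         # every token attends to headings
    return local | global_rows | global_cols
```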
URL: https://openreview.net/forum?id=SATuB4XEMa
---
Title: Doubly residual transitions for deep variational state-space models
Abstract: Sequential data modeling often relies on capturing underlying dynamics through Variational Recurrent State-Space Models (VRSSMs), yet the architecture of transition functions in these models remains underexplored. Here we investigate highway layers as latent transitions in VRSSMs, leveraging their trainable gating mechanisms that allow flexible combination of raw and transformed representations. Through extensive empirical evaluation across multiple datasets, we demonstrate that highway transitions consistently outperform standard multi-layer perceptron (MLP) baselines. Our results show that highway-based VRSSMs achieve better validation performance while demonstrating enhanced robustness to hyperparameter choices. The findings highlight how established neural network techniques can significantly impact probabilistic sequential modeling when applied in new contexts. We recommend that practitioners incorporate highway connections in their modeling toolbox for VRSSMs, as they provide a simple yet effective architectural enhancement for capturing temporal dependencies in sequential data.
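For concreteness, a highway-style latent transition of the kind described above can be sketched as follows; this is the standard highway-layer form, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HighwayTransition(nn.Module):
    """Highway-style transition: a learned gate mixes the raw latent state
    with a transformed one, giving a trainable residual-like combination."""

    def __init__(self, dim: int):
        super().__init__()
        self.transform = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
        self.gate = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        t = self.transform(h)             # candidate transformed state
        g = torch.sigmoid(self.gate(h))   # per-dimension mixing gate
        return g * t + (1.0 - g) * h      # gated mix of transformed and raw state
```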
URL: https://openreview.net/forum?id=snRypaYihO
---
Title: Meta-Learning Adaptive Loss Functions
Abstract: Loss function learning is a new meta-learning paradigm that aims to automate the essential task of designing a loss function for a machine learning model. Existing techniques for loss function learning have shown promising results, often improving a model's training dynamics and final inference performance. However, a significant limitation of these techniques is that the loss functions are meta-learned in an offline fashion, where the meta-objective only considers the very first few steps of training, which is a significantly shorter time horizon than the one typically used for training deep neural networks. This causes significant bias towards loss functions that perform well at the very start of training but perform poorly at the end of training. To address this issue we propose a new loss function learning technique for adaptively updating the loss function online after each update to the base model parameters. The experimental results show that our proposed method consistently outperforms the cross-entropy loss and offline loss function learning techniques on a diverse range of neural network architectures and datasets.
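The sketch below illustrates one way such an online update can be organized: a single differentiable base-model step under the learned loss, followed by an update of the loss network against an ordinary task loss on held-out data. The interface of `loss_net` (mapping predictions and targets to per-example losses) and the single-step inner update are assumptions for illustration, not the authors' exact procedure.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def online_loss_meta_step(model, loss_net, x, y, x_val, y_val, inner_lr, opt_loss):
    """One illustrative meta-step for online loss-function adaptation."""
    params = dict(model.named_parameters())

    # Inner step: one SGD update under the learned loss, kept differentiable so
    # gradients can flow back into loss_net's parameters.
    inner_loss = loss_net(functional_call(model, params, (x,)), y).mean()
    grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
    updated = {k: p - inner_lr * g for (k, p), g in zip(params.items(), grads)}

    # Outer step: an ordinary task loss of the updated model on held-out data
    # drives the update of the learned loss function.
    opt_loss.zero_grad()
    meta_loss = F.cross_entropy(functional_call(model, updated, (x_val,)), y_val)
    meta_loss.backward()
    opt_loss.step()
```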
URL: https://openreview.net/forum?id=o0ODnN0xz8
---
Title: Generalized Smooth Stochastic Variational Inequalities: Almost Sure Convergence and Convergence Rates
Abstract: This paper focuses on solving a stochastic variational inequality (SVI) problem under relaxed smoothness assumption for a class of structured non-monotone operators. The SVI problem has attracted significant interest in the machine learning community due to its immediate application to adversarial training and multi-agent reinforcement learning. In many such applications, the resulting operators do not satisfy the smoothness assumption. To address this issue, we focus on a weaker generalized smoothness assumption called $\alpha$-symmetric. Under $p$-quasi sharpness and $\alpha$-symmetric assumptions on the operator, we study clipped projection (gradient descent-ascent) and clipped Korpelevich (extragradient) methods. For these clipped methods, we provide the first almost-sure convergence results without making any assumptions on the boundedness of either the stochastic operator or the stochastic samples. We also provide the first in-expectation unbiased convergence rate results for these methods under a relaxed smoothness assumption.
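The abstract does not spell out the updates; one standard form of a clipped projection step for a stochastic operator $F$ over a feasible set $Z$, with projection $\Pi_Z$, step size $\gamma_k$, clipping level $c$, and stochastic sample $\xi_k$, is (illustrative notation, not necessarily the paper's exact scheme)

$$z_{k+1} = \Pi_Z\!\left(z_k - \gamma_k \min\!\left\{1, \frac{c}{\|F(z_k, \xi_k)\|}\right\} F(z_k, \xi_k)\right),$$

and the clipped Korpelevich (extragradient) variant applies such a clipped step twice, once for extrapolation and once for the actual update.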
URL: https://openreview.net/forum?id=EjqSpbUBWU
---
Title: Efficient Knowledge Distillation via Salient Feature Masking
Abstract: Traditional Knowledge Distillation (KD) transfers all outputs from a teacher model to a student model, often introducing knowledge redundancy. This redundancy dilutes critical information, leading to degraded student model performance. To address this, we propose Salient Feature Masking for Knowledge Distillation (SFKD), where only the most informative features are selectively distilled, enhancing student performance. Our approach is grounded in the Information Bottleneck (IB) principle, where focusing on features with higher mutual information with the input leads to more effective distillation. SFKD integrates with existing KD variants and enhances the transfer of ``dark knowledge''. It consistently improves image classification accuracy across diverse models, including ConvNeXt and ViT, achieving gains of 5.44\% on CIFAR-100 and 3.57\% on ImageNet-1K. When combined with current KD methods, SFKD outperforms state-of-the-art results by 1.47\%.
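As a rough illustration of salient-feature masking, the sketch below keeps only the highest-saliency teacher feature channels (saliency approximated by mean absolute activation) before matching the student to the teacher. The saliency proxy, the MSE matching term, and the `keep_ratio` parameter are illustrative assumptions, not SFKD's actual criterion.

```python
import torch
import torch.nn.functional as F

def masked_distillation_loss(student_feat: torch.Tensor,
                             teacher_feat: torch.Tensor,
                             keep_ratio: float = 0.5) -> torch.Tensor:
    """Distill only the most salient teacher feature channels.

    Inputs are pooled features of shape (batch, channels). The lowest-saliency
    channels are masked out of the distillation target."""
    saliency = teacher_feat.abs().mean(dim=0)              # per-channel saliency score
    k = max(1, int(keep_ratio * saliency.numel()))
    mask = torch.zeros_like(saliency)
    mask[torch.topk(saliency, k).indices] = 1.0            # keep top-k channels only
    return F.mse_loss(student_feat * mask, teacher_feat * mask)
```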
URL: https://openreview.net/forum?id=HJCyk6vBYL
---
Title: Learning few-step posterior samplers by unfolding and distillation of diffusion models
Abstract: Diffusion models (DMs) have emerged as powerful image priors in Bayesian computational imaging. Two primary strategies have been proposed for leveraging DMs in this context: Plug-and-Play methods, which are zero-shot and highly flexible but rely on approximations; and specialized conditional DMs, which achieve higher accuracy and faster inference for specific tasks through supervised training. In this work, we introduce a novel framework that integrates deep unfolding and model distillation to transform a DM image prior into a few-step conditional model for posterior sampling. A central innovation of our approach is the unfolding of a Markov chain Monte Carlo (MCMC) algorithm—specifically, the recently proposed LATINO Langevin sampler (Spagnoletti et al., 2025)—representing the first known instance of deep unfolding applied to a Monte Carlo sampling scheme. We demonstrate our proposed unfolded and distilled samplers through extensive experiments and comparisons with the state of the art, where they achieve excellent accuracy and computational efficiency, while retaining the flexibility to adapt to variations in the forward model at inference time.
URL: https://openreview.net/forum?id=oGCfD8YKN2
---
Title: Accelerated Training on Low-Power Edge Devices
Abstract: Training on edge devices poses several challenges as these devices are generally resource-constrained, especially in terms of power. State-of-the-art techniques at the device level reduce the GPU frequency to enforce power constraints, leading to a significant increase in training time. To accelerate training, we propose to jointly adjust the system and application parameters (in our case, the GPU frequency and the batch size of the training task) while adhering to the power constraints on devices. We introduce a novel cross-layer methodology that combines predictions of batch size efficiency and device profiling to achieve the desired optimization. Our evaluation on real hardware shows that our method outperforms current baselines built on state-of-the-art techniques, reducing the training time by up to $2.3\times$ with results very close to optimal. Our measurements also indicate a substantial reduction in the overall energy used for the training process. These gains are achieved without any reduction in the performance of the trained model.
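The joint system/application search can be pictured as a small constrained selection problem over profiled configurations; the sketch below assumes a hypothetical list of profiling records with throughput and power estimates per (GPU frequency, batch size) pair, which is not the paper's actual interface.

```python
def choose_config(profiles, power_limit_w):
    """Pick the (gpu_freq, batch_size) configuration with the highest predicted
    training throughput that stays within the device power limit.

    `profiles` is an assumed structure: a list of dicts with keys 'gpu_freq',
    'batch_size', 'throughput' (samples/s) and 'power' (W), obtained from
    device profiling and batch-size efficiency prediction."""
    feasible = [p for p in profiles if p['power'] <= power_limit_w]
    if not feasible:
        raise ValueError("no configuration satisfies the power limit")
    return max(feasible, key=lambda p: p['throughput'])
```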
URL: https://openreview.net/forum?id=cGjQ41jBEn
---
Title: Choose Your Model Size: Any Compression of Large Language Models Without Re-Computation
Abstract: The adoption of Foundation Models in resource-constrained environments remains challenging due to their large size and inference costs. A promising way to overcome these limitations is post-training compression, which aims to balance reduced model size against performance degradation. This work presents Any Compression via Iterative Pruning (ACIP), a novel algorithmic approach to determine a compression-performance trade-off from a single stochastic gradient descent run. To achieve parameter efficiency, we use an SVD-reparametrization of linear layers and iteratively prune their singular values with a sparsity-inducing penalty. Importantly, the pruning order of the parameters is used to derive a global score map that allows compressing a model to any target size without re-computation. We evaluate ACIP on a large selection of open-weight LLMs and downstream tasks, demonstrating state-of-the-art results compared to existing factorization-based compression methods. We also show that ACIP seamlessly complements common quantization-based compression techniques.
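A minimal sketch of the SVD-reparametrization idea is given below: a linear layer is rewritten as $W = U\,\mathrm{diag}(s)\,V^\top$ with a trainable singular-value vector $s$, and an L1 penalty on $s$ (added to the training loss) drives values toward zero, inducing a pruning order. Class and method names are illustrative; this is not the ACIP implementation.

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    """Bias-free linear layer reparameterized through its SVD with trainable
    singular values that can be shrunk and pruned."""

    def __init__(self, weight: torch.Tensor):
        super().__init__()
        u, s, vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("u", u)      # (out_features, r), frozen
        self.register_buffer("vh", vh)    # (r, in_features), frozen
        self.s = nn.Parameter(s.clone())  # trainable singular values

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T with W = U diag(s) Vh
        return ((x @ self.vh.T) * self.s) @ self.u.T

    def sparsity_penalty(self) -> torch.Tensor:
        # L1 penalty encouraging singular values to vanish (added to the training loss)
        return self.s.abs().sum()
```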
URL: https://openreview.net/forum?id=Y6hdYf8tsg
---
Title: Decoupling Generalizability and Membership Privacy Risks in Neural Networks
Abstract: A deep learning model usually has to sacrifice some utility when it acquires other abilities or characteristics, and privacy preservation has such a trade-off relationship with utility. The loss disparity between various defense approaches implies the potential to decouple generalizability and privacy risks so as to maximize the privacy gain. In this paper, we identify that a model's generalization and privacy risks reside in different regions of deep neural network architectures. Based on these observations, we propose the Privacy-Preserving Training Principle (PPTP) to protect model components from privacy risks while minimizing the loss in generalizability. Through extensive evaluations, our approach shows significantly better preservation of model generalizability while enhancing privacy protection.
URL: https://openreview.net/forum?id=fWndEb4ltW
---
Title: Gradient GA: Gradient Genetic Algorithm For Drug Molecular Design
Abstract: Molecular discovery has brought great benefit to the chemical industry. Various molecular design techniques have been developed to identify molecules with desirable properties. Traditional optimization methods, such as genetic algorithms, continue to achieve state-of-the-art results across various molecular design benchmarks. However, these techniques rely solely on undirected random exploration, which hinders both the quality of the final solution and the convergence speed.
To address this limitation, we propose a novel approach called Gradient Genetic Algorithm (Gradient GA), which incorporates gradient information from the objective function into genetic algorithms. Instead of random exploration, each proposed sample iteratively progresses toward an optimal solution by following the gradient direction. We achieve this by designing a differentiable objective function parameterized by a neural network and utilizing the Discrete Langevin Proposal to enable gradient guidance in discrete molecular spaces.
Experimental results demonstrate that our method significantly improves both convergence speed and solution quality, outperforming cutting-edge techniques. The proposed method has shown up to a $25\%$ improvement in the Top 10 score over the vanilla genetic algorithm. The code is publicly available at https://anonymous.4open.science/r/GradientGA-DC45.
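A heavily simplified, gradient-guided proposal in the spirit of the Discrete Langevin Proposal is sketched below: each position of a one-hot encoded candidate is resampled from a categorical distribution tilted by the gradient of a differentiable surrogate objective. The exact tilting, acceptance step, and molecular encoding used by Gradient GA are not shown; the function name and the `alpha` parameter are illustrative.

```python
import torch
import torch.nn.functional as F

def gradient_guided_proposal(x_onehot: torch.Tensor, surrogate, alpha: float = 0.1):
    """Propose a new discrete candidate guided by surrogate gradients.

    x_onehot: float one-hot tensor of shape (length, vocab).
    surrogate: differentiable scorer mapping (length, vocab) -> scalar."""
    x = x_onehot.clone().requires_grad_(True)
    (grad,) = torch.autograd.grad(surrogate(x), x)       # d score / d one-hot entries
    # Tilt each position's categorical by the gradient, with a penalty for
    # moving away from the current symbol (simplified DLP-style logits).
    logits = 0.5 * grad - (1.0 - x_onehot) / (2.0 * alpha)
    probs = torch.softmax(logits, dim=-1)
    new_idx = torch.multinomial(probs, num_samples=1).squeeze(-1)
    return F.one_hot(new_idx, num_classes=x_onehot.shape[-1]).float()
```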
URL: https://openreview.net/forum?id=kFKcktAeEG
---
Title: FedLOE: Federated Domain Generalization via Locally Overfit Ensemble
Abstract: In federated learning (FL), clients typically access data from just one distribution. Ideally, the learned models would generalize to out-of-distribution (OOD) data, i.e., domain generalization (DG). However, centralized DG methods cannot easily be adapted to the domain-separation context, and prior federated DG methods perform poorly when the number of clients is large. To address these challenges, we revisit the classic mixture-of-experts (MoE) idea by viewing each client as an expert on its own dataset. From this perspective, simple federated averaging can be seen as a type of iterative MoE, where the amount of local training determines the strength of each expert.
In contrast to the usual FL communication-performance trade-off, we theoretically demonstrate in linear cases, and empirically validate in deep models, that reducing communication frequency can effectively enhance DG performance, surpassing centralized counterparts (e.g., $+4.34\%$ on PACS). Building on this, we further propose an additional MoE strategy to combine the client-specific classifier heads via standard DG objectives. Our proposed FedLOE method can be viewed as an intermediate approach between FedAVG and one-time ensembling. It demonstrates both theoretical soundness and empirical effectiveness. Moreover, FedLOE requires fewer communication rounds, highlighting its practical efficiency and scalability.
URL: https://openreview.net/forum?id=W4T9sK6Gai
---
Title: Adaptive Group Robust Ensemble Knowledge Distillation
Abstract: Neural networks can learn spurious correlations in the data, often leading to performance degradation for underrepresented subgroups. Studies have demonstrated that the disparity is amplified when knowledge is distilled from a complex teacher model to a relatively ``simple'' student model. Prior work has shown that ensemble deep learning methods can improve the performance of the worst-case subgroups; however, it is unclear if this advantage carries over when distilling knowledge from an ensemble of teachers, especially when the teacher models are debiased. This study demonstrates that traditional ensemble knowledge distillation can significantly drop the performance of the worst-case subgroups in the distilled student model even when the teacher models are debiased. To overcome this, we propose Adaptive Group Robust Ensemble Knowledge Distillation (AGRE-KD), a simple ensembling strategy to ensure that the student model receives knowledge beneficial for unknown underrepresented subgroups. Leveraging an additional biased model, our method selectively chooses teachers whose knowledge would better improve the worst-performing subgroups by upweighting the teachers with gradient directions deviating from the biased model. Our experiments on several datasets demonstrate the superiority of the proposed ensemble distillation technique and show that it can even outperform classic model ensembles based on majority voting.
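The teacher-weighting idea described above can be sketched directly from the abstract: teachers whose gradient directions deviate more from the biased model's gradient receive larger weights. The cosine-based deviation measure and the normalization below are illustrative choices, not necessarily AGRE-KD's exact formula.

```python
import torch
import torch.nn.functional as F

def teacher_weights(teacher_grads, biased_grad):
    """Weight teachers by how much their gradient directions deviate from the
    biased model's gradient. Inputs are flattened 1-D gradient vectors; the
    returned weights sum to 1."""
    deviations = torch.stack([
        1.0 - F.cosine_similarity(g, biased_grad, dim=0) for g in teacher_grads
    ])                                    # in [0, 2]; larger = more deviation
    return deviations / deviations.sum()
```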
URL: https://openreview.net/forum?id=G2BEBaKd8Y
---