Weekly TMLR digest for Mar 30, 2025


TMLR

Mar 30, 2025, 12:00:12 AM
to tmlr-annou...@googlegroups.com

Accepted papers
===============


Title: Simulation-based Bayesian Inference from Privacy Protected Data

Authors: Yifei Xiong, Nianqiao Ju, Sanguo Zhang

Abstract: Many modern statistical analysis and machine learning applications require training models on sensitive user data. Under a formal definition of privacy protection, differentially private algorithms inject calibrated noise into the confidential data or during the data analysis process to produce privacy-protected datasets or queries. However, restricting access to only privatized data during statistical analysis makes it computationally challenging to make valid statistical inferences. In this work, we propose simulation-based inference methods from privacy-protected datasets. In addition to sequential Monte Carlo approximate Bayesian computation, we adopt neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.
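
As a hedged illustration of the core idea, here is a minimal rejection-ABC sketch in which the differential-privacy mechanism is simulated as part of the generative model, so inference accounts for the injected noise. The Bernoulli data model, Laplace mechanism, tolerance, and epsilon are illustrative assumptions, not the authors' sequential Monte Carlo or neural-density-estimation pipelines.

```python
import numpy as np

rng = np.random.default_rng(0)

def privatize(summary, epsilon, sensitivity=0.01):
    # Laplace mechanism: noise calibrated to sensitivity / epsilon
    # (the mean of 100 binary values has sensitivity 1/100).
    return summary + rng.laplace(scale=sensitivity / epsilon)

def abc_rejection(observed_private, epsilon, n_draws=5000, tol=0.05):
    # Simulate (confidential data -> private query) end to end, so the
    # privacy noise is part of the generative model rather than ignored.
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform()                    # prior draw
        data = rng.binomial(1, theta, size=100)  # confidential data
        if abs(privatize(data.mean(), epsilon) - observed_private) < tol:
            accepted.append(theta)
    return np.array(accepted)

obs = privatize(rng.binomial(1, 0.3, size=100).mean(), epsilon=1.0)
posterior = abc_rejection(obs, epsilon=1.0)
```

Accepted draws approximate the posterior over theta given only the privatized query, which is the correction for privacy-induced bias the abstract refers to.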

URL: https://openreview.net/forum?id=SB7JzhDG45

---

Title: Illustrated Landmark Graphs for Long-horizon Policy Learning

Authors: Christopher Watson, Arjun Krishna, Rajeev Alur, Dinesh Jayaraman

Abstract: Applying learning-based approaches to long-horizon sequential decision-making tasks requires a human teacher to carefully craft reward functions or curate demonstrations to elicit desired behaviors. To simplify this, we first introduce an alternative form of task specification, the Illustrated Landmark Graph (ILG), which represents the task as a directed graph where each vertex corresponds to a region of the state space (a landmark) and each edge represents an easier-to-achieve sub-task. A landmark in the ILG is conveyed to the agent through a few illustrative examples grounded in the agent’s observation space. Second, we propose ILG-Learn, a human-in-the-loop algorithm that interleaves planning over the ILG and sub-task policy learning. ILG-Learn adaptively plans through the ILG by relying on the human teacher’s feedback to estimate the success rates of learned policies. We conduct experiments on long-horizon block stacking and point maze navigation tasks, and find that our approach achieves considerably higher success rates (~50% improvement) compared to hierarchical reinforcement learning and imitation learning baselines. Additionally, we highlight how the flexibility of the ILG specification allows the agent to learn a sequence of sub-tasks that is better suited to its limited capabilities.

URL: https://openreview.net/forum?id=0AOUWC4ss8

---

Title: Adaptive Incentive Design for Markov Decision Processes with Unknown Rewards

Authors: Haoxiang Ma, Shuo Han, Ahmed Hemida, Charles A. Kamhoua, Jie Fu

Abstract: Incentive design, also known as model design or environment design for Markov decision processes (MDPs), refers to a class of problems in which a leader can incentivize a follower by modifying the follower's reward function, in anticipation that the follower's optimal policy in the resulting MDP will be desirable for the leader's objective. In this work, we propose gradient-ascent algorithms to compute the leader's optimal incentive design despite the lack of knowledge about the follower's reward function.
First, we formulate the incentive design problem as a bi-level optimization problem and demonstrate that, by the softmax temporal consistency between the follower's policy and value function, the bi-level optimization problem can be reduced to single-level optimization, for which a gradient-based algorithm can be developed to optimize the leader's objective. We establish several key properties of incentive design in MDPs and prove the convergence of the proposed gradient-based method.
Next, we show that the gradient terms can be estimated from observations of the follower's best response policy, enabling the use of a stochastic gradient-ascent algorithm to compute a locally optimal incentive design without knowing or learning the follower's reward function. Finally, we analyze the conditions under which an incentive design remains optimal for two different rewards which are policy invariant. The effectiveness of the proposed algorithm is demonstrated using a small probabilistic transition system and a stochastic gridworld.
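
The softmax temporal consistency the abstract relies on can be sketched with entropy-regularized (soft) value iteration on a toy tabular MDP. The random MDP, temperature tau, and zero incentive below are illustrative assumptions, not the paper's algorithm; they only show how the follower's policy becomes a smooth (hence differentiable) function of the incentivized reward.

```python
import numpy as np

def soft_value_iteration(R, P, gamma=0.9, tau=1.0, iters=500):
    # R: (S, A) follower rewards (base reward plus any leader incentive);
    # P: (S, A, S) transition probabilities.
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * P @ V                          # (S, A)
        V = tau * np.log(np.exp(Q / tau).sum(axis=1))  # soft (log-sum-exp) max
    # Softmax temporal consistency: pi(a|s) proportional to exp((Q - V) / tau)
    pi = np.exp((Q - V[:, None]) / tau)
    return pi / pi.sum(axis=1, keepdims=True), V

rng = np.random.default_rng(0)
S, A = 4, 2
P = rng.dirichlet(np.ones(S), size=(S, A))  # random transition kernel
R = rng.normal(size=(S, A))                 # follower's base reward
incentive = np.zeros((S, A))                # the leader's decision variable
pi, V = soft_value_iteration(R + incentive, P)
```

Because pi depends smoothly on the incentive, the leader's bi-level problem reduces to single-level gradient ascent, as the abstract describes.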

URL: https://openreview.net/forum?id=Rwf31BYTAU

---

Title: Influence Learning in Complex Systems

Authors: Elena Congeduti, Roberto Rocchetta, Frans A. Oliehoek

Abstract: High sample complexity hampers the successful application of reinforcement learning methods, especially in real-world problems where simulating complex dynamics is computationally demanding. Influence-based abstraction (IBA) was proposed to mitigate this issue by breaking down the global model of large-scale distributed systems, such as traffic control problems, into small local sub-models. Each local model includes only a few state variables and a representation of the influence exerted by the external portion of the system. This approach allows converting a complex simulator into local lightweight simulators, enabling more effective applications of planning and reinforcement learning methods. However, the effectiveness of IBA critically depends on the ability to accurately approximate the influence of each local model. While there are a few examples showing promising results in benchmark problems, the question of whether this approach is feasible in more practical scenarios remains open. In this work, we take steps towards addressing this question by conducting an extensive empirical study of learning models for influence approximations in various realistic domains, and evaluating how these models generalize over long horizons. We find that learning the influence is often a manageable learning task, even for complex and large systems. Additionally, we demonstrate the efficacy of the approximation models for long-horizon problems. By using short trajectories, we can learn accurate influence approximations for much longer horizons.

URL: https://openreview.net/forum?id=tUnyInYbjK

---

Title: An Information Theoretic Approach to Machine Unlearning

Authors: Jack Foster, Kyle Fogarty, Stefan Schoepf, Zack Dugue, Cengiz Oztireli, Alexandra Brintrup

Abstract: To comply with AI and data regulations, the need to forget private or copyrighted information from trained machine learning models is increasingly important. The key challenge in unlearning is forgetting the necessary data in a timely manner, while preserving model performance. In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten. We explore unlearning from an information theoretic perspective, connecting the influence of a sample to the information gain a model receives by observing it. From this, we derive a simple but principled zero-shot unlearning method based on the geometry of the model. Our approach takes the form of minimising the gradient of a learned function with respect to a small neighbourhood around a target forget point. This induces a smoothing effect, causing forgetting by moving the boundary of the classifier. We explore the intuition behind why this approach can jointly unlearn forget samples while preserving general model performance through a series of low-dimensional experiments. We perform extensive empirical evaluation of our method over a range of contemporary benchmarks, verifying that our method is competitive with state-of-the-art performance under the strict constraints of zero-shot unlearning.
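
A hedged, low-dimensional sketch of the geometric idea: minimize the input-gradient norm in a small neighbourhood of the forget point so the classifier flattens locally and the decision boundary moves away from it. The logistic model, neighbourhood radius, and finite-difference optimizer are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(w, x):
    return 1.0 / (1.0 + np.exp(-x @ w))    # toy logistic classifier

def input_grad_norm2(w, x):
    p = model(w, x)
    return ((p * (1 - p) * w) ** 2).sum()  # ||d model / d x||^2, analytic here

def smoothing_loss(w, neighbourhood):
    # Average input-gradient norm over points near the forget sample:
    # driving this toward zero induces the local smoothing effect.
    return float(np.mean([input_grad_norm2(w, x) for x in neighbourhood]))

def unlearn(w, x_forget, radius=0.1, lr=0.5, steps=50):
    pts = x_forget + radius * rng.normal(size=(16, x_forget.size))
    w = w.copy()
    for _ in range(steps):
        eps = 1e-5
        grad = np.array([  # finite-difference gradient w.r.t. the weights
            (smoothing_loss(w + eps * e, pts)
             - smoothing_loss(w - eps * e, pts)) / (2 * eps)
            for e in np.eye(w.size)
        ])
        w -= lr * grad
    return w, pts

w0 = np.array([1.0, -2.0, 0.5])
x_forget = np.array([0.5, 0.2, -0.1])
w1, pts = unlearn(w0, x_forget)
```

After the update the model's output varies less around the forget point, which is the mechanism by which forgetting occurs without touching the retained data.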

URL: https://openreview.net/forum?id=t1utIThKHD

---

Title: Emergent representations in networks trained with the Forward-Forward algorithm

Authors: Niccolo Tosato, Lorenzo Basile, Emanuele Ballarin, Giuseppe De Alteriis, Alberto Cazzaniga, Alessio Ansuini

Abstract: The Backpropagation algorithm has often been criticised for its lack of biological realism. In an attempt to find a more biologically plausible alternative, the recently introduced Forward-Forward algorithm replaces the forward and backward passes of Backpropagation with two forward passes. In this work, we show that the internal representations obtained by the Forward-Forward algorithm can organise into category-specific ensembles exhibiting high sparsity -- composed of a low number of active units. This situation is reminiscent of what has been observed in cortical sensory areas, where neuronal ensembles are suggested to serve as the functional building blocks for perception and action. Interestingly, while this sparse pattern does not typically arise in models trained with standard Backpropagation, it can emerge in networks trained with Backpropagation on the same objective proposed for the Forward-Forward algorithm.
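
For context, the per-layer objective of the Forward-Forward algorithm (Hinton, 2022) that the abstract refers to can be sketched as follows; the threshold theta and the toy activations are illustrative values.

```python
import numpy as np

def goodness(h):
    # "Goodness" of a layer's activations: the sum of their squares.
    return (h ** 2).sum(axis=-1)

def ff_layer_loss(h_pos, h_neg, theta=2.0):
    # Two forward passes: push goodness above theta on positive (real)
    # data and below theta on negative data, via a logistic loss.
    pos = np.log1p(np.exp(-(goodness(h_pos) - theta))).mean()
    neg = np.log1p(np.exp(goodness(h_neg) - theta)).mean()
    return pos + neg

h_pos = np.ones((4, 10))         # strongly active units on positive data
h_neg = np.zeros((4, 10))        # silent units on negative data
well_separated = ff_layer_loss(h_pos, h_neg)
collapsed = ff_layer_loss(h_neg, h_pos)  # roles swapped
```

Because the objective rewards a few strongly active units on positive data, it plausibly favours the sparse, category-specific ensembles the paper reports.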

URL: https://openreview.net/forum?id=JhYbGiFn3Y

---

Title: What’s Left After Distillation? How Knowledge Transfer Impacts Fairness and Bias

Authors: Aida Mohammadshahi, Yani Ioannou

Abstract: Knowledge Distillation is a commonly used Deep Neural Network (DNN) compression method, which often maintains overall generalization performance. However, we show that even for balanced image classification datasets, such as CIFAR-100, Tiny ImageNet and ImageNet, as many as 41% of the classes are statistically significantly affected by distillation when comparing class-wise accuracy (i.e. class bias) between a teacher/distilled student or distilled student/non-distilled student model. Changes in class bias are not necessarily an undesirable outcome when considered outside of the context of a model’s usage. Using two common fairness metrics, Demographic Parity Difference (DPD) and Equalized Odds Difference (EOD) on models trained with the CelebA, Trifeature, and HateXplain datasets, our results suggest that increasing the distillation temperature improves the distilled student model’s fairness, and the distilled student fairness can even surpass the fairness of the teacher model at high temperatures. Additionally, we examine individual fairness, ensuring similar instances receive similar predictions. Our results confirm that higher temperatures also improve the distilled student model’s individual fairness. This study highlights the uneven effects of distillation on certain classes and its potentially significant role in fairness, emphasizing that caution is warranted when using distilled models for sensitive application domains.
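
The temperature knob studied here is the one in the standard distillation loss of Hinton et al.; a minimal sketch, with illustrative logits:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T - (z / T).max(axis=-1, keepdims=True)  # numerically stable
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened outputs; the T^2
    # factor keeps gradient magnitudes comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T ** 2) * (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()

teacher = np.array([[5.0, 1.0, -2.0]])
student = np.array([[4.0, 2.0, -1.0]])
```

Raising T flattens the teacher distribution, transferring more of its "dark knowledge" about non-target classes, which is the mechanism the fairness results above probe.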

URL: https://openreview.net/forum?id=xBbj46Y2fN

---

Title: Robust Symbolic Regression for Dynamical System Identification

Authors: Ramzi Dakhmouche, Ivan Lunati, Hossein Gorji

Abstract: Real-world complex systems often lack high-fidelity physical descriptions and are typically subject to partial observability. Learning the dynamics of such systems is a challenging and ubiquitous problem, encountered in diverse critical applications which require interpretability and qualitative guarantees. Our paper addresses this problem in the case of sparsely observed probability distribution flows governed by ODEs. Specifically, we devise a white-box approach, dubbed Symbolic Distribution Flow Learner (SDFL), leveraging symbolic search with a Wasserstein-based loss function, resulting in a robust model-recovery scheme which naturally lends itself to coping with partial observability.
Additionally, we furnish the proposed framework with theoretical guarantees on the number of snapshots required to achieve a certain level of fidelity in model discovery.
We illustrate the performance of the proposed scheme on the prototypical problem of Kuramoto networks and a standard benchmark of single-cell RNA-sequencing trajectory data. The numerical experiments demonstrate the competitive performance of SDFL in comparison to the state of the art.

URL: https://openreview.net/forum?id=ZfPbCFZQbx

---

Title: Show or Tell? Effectively prompting Vision-Language Models for semantic segmentation

Authors: Niccolò Avogaro, Thomas Frick, Mattia Rigotti, Andrea Bartezzaghi, Filip Janicki, A. Cristiano I. Malossi, Konrad Schindler, Roy Assaf

Abstract: Large Vision-Language Models (VLMs) are increasingly being regarded as foundation models that can be instructed to solve diverse tasks by prompting, without task-specific training.
We examine the seemingly obvious question: \emph{how to effectively prompt VLMs for semantic segmentation}.
To that end, we systematically evaluate the segmentation performance of several recent models guided by either text or visual prompts on the out-of-distribution MESS dataset collection.
We introduce a scalable prompting scheme, \emph{few-shot prompted semantic segmentation}, inspired by open-vocabulary segmentation and few-shot learning.
It turns out that VLMs lag far behind specialist models trained for a specific segmentation task, by about 30\% on average on the Intersection-over-Union metric.
Moreover, we find that text prompts and visual prompts are complementary: each one of the two modes fails on many examples that the other one can solve.
Our analysis suggests that being able to anticipate the most effective prompt modality can lead to an 11\% improvement in performance.
Motivated by our findings, we propose PromptMatcher, a remarkably simple training-free baseline that combines text and visual prompts, achieving state-of-the-art results on few-shot prompted semantic segmentation: it outperforms the best text-prompted VLM by 2.5\% and the top visual-prompted VLM by 3.5\%.
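
A hedged sketch of the evaluation ingredients: mean Intersection-over-Union over classes, plus the per-example oracle choice between the two prompt modalities that the analysis quantifies. The tiny masks below are illustrative.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    # Mean IoU over classes present in either the prediction or the target.
    scores = []
    for c in range(num_classes):
        union = np.logical_or(pred == c, target == c).sum()
        if union:
            inter = np.logical_and(pred == c, target == c).sum()
            scores.append(inter / union)
    return float(np.mean(scores))

target = np.array([[0, 0], [1, 1]])
text_pred = np.array([[0, 0], [1, 1]])    # the text prompt solves this example
visual_pred = np.array([[1, 1], [0, 0]])  # the visual prompt fails it
# Oracle modality selection: keep the better prediction per example.
oracle = max(mean_iou(text_pred, target, 2), mean_iou(visual_pred, target, 2))
```

Because the two modalities fail on different examples, the per-example oracle exceeds either modality alone, which motivates combining them as PromptMatcher does.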

URL: https://openreview.net/forum?id=0yPWtbR3MC

---

Title: Probabilistic neural operators for functional uncertainty quantification

Authors: Christopher Bülte, Philipp Scholl, Gitta Kutyniok

Abstract: Neural operators aim to approximate the solution operator of a system of differential equations purely from data. They have shown immense success in modeling complex dynamical systems across various domains. However, the occurrence of uncertainties inherent in both model and data has so far rarely been taken into account, a critical limitation in complex, chaotic systems such as weather forecasting. In this paper, we introduce the probabilistic neural operator (PNO), a framework for learning probability distributions over the output function space of neural operators. PNO extends neural operators with generative modeling based on strictly proper scoring rules, integrating uncertainty information directly into the training process. We provide a theoretical justification for the approach and demonstrate improved performance in quantifying uncertainty across different domains and with respect to different baselines. Furthermore, PNO requires minimal adjustment to existing architectures, shows improved performance for most probabilistic prediction tasks, and leads to well-calibrated predictive distributions and adequate uncertainty representations even for long dynamical trajectories. Implementing our approach into large-scale models for physical applications can lead to improvements in corresponding uncertainty quantification and extreme event identification, ultimately leading to a deeper understanding of the prediction of such surrogate models.

URL: https://openreview.net/forum?id=gangoPXSRw

---

Title: Decision-Focused Surrogate Modeling for Mixed-Integer Linear Optimization

Authors: Shivi Dixit, Rishabh Gupta, Qi Zhang

Abstract: Mixed-integer optimization is at the core of many online decision-making systems that demand frequent updates of decisions in real time. However, due to their combinatorial nature, mixed-integer linear programs (MILPs) can be difficult to solve, rendering them often unsuitable for time-critical online applications. To address this challenge, we develop a data-driven approach for constructing surrogate optimization models in the form of linear programs (LPs) that can be solved much more efficiently than the corresponding MILPs. We train these surrogate LPs in a decision-focused manner such that for different model inputs, they achieve the same or close to the same optimal solutions as the original MILPs. One key advantage of the proposed method is that it allows the incorporation of all of the original MILP’s linear constraints, which significantly increases the likelihood of obtaining feasible predicted solutions. Results from two computational case studies indicate that this decision-focused surrogate modeling approach is highly data-efficient and provides very accurate predictions of the optimal solutions. In these examples, the resulting surrogate LPs outperform state-of-the-art neural-network-based optimization proxies.

URL: https://openreview.net/forum?id=A6tOXkkE4Z

---

Title: A Vector Bernstein Inequality for Self-Normalized Martingales

Authors: Ingvar Ziemann

Abstract: We prove a Bernstein inequality for vector-valued self-normalized martingales. We first give an alternative perspective of the corresponding sub-Gaussian bound due to Abbasi-Yadkori et al. via a PAC-Bayesian argument with Gaussian priors. By instantiating this argument to priors drawn uniformly over well-chosen ellipsoids, we obtain a Bernstein bound.
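
For context, the sub-Gaussian self-normalized bound of Abbasi-Yadkori et al. that the note revisits can be stated (from memory of the standard result; see the paper for the exact conditions) as:

```latex
% S_t = \sum_{s \le t} \eta_s X_s with conditionally R-sub-Gaussian noise
% \eta_s, and \bar{V}_t = V + \sum_{s \le t} X_s X_s^\top for some V \succ 0.
% Then, with probability at least 1 - \delta, simultaneously for all t \ge 0:
\|S_t\|_{\bar{V}_t^{-1}}^2 \;\le\; 2 R^2 \log\!\left(
    \frac{\det(\bar{V}_t)^{1/2} \det(V)^{-1/2}}{\delta} \right)
```

The note's Bernstein refinement replaces the sub-Gaussian parameter with variance-dependent quantities by choosing the PAC-Bayesian prior over well-chosen ellipsoids rather than a Gaussian.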

URL: https://openreview.net/forum?id=4ZJjr9YbBw

---

Title: Long-context LLMs Struggle with Long In-context Learning

Authors: Tianle Li, Ge Zhang, Quy Duc Do, Xiang Yue, Wenhu Chen

Abstract: Large Language Models (LLMs) have made significant strides in handling long sequences; some models, such as Gemini, can even process millions of tokens. However, their performance evaluation has largely been confined to metrics like perplexity and synthetic tasks, which may not fully capture their true abilities in more challenging, real-world scenarios. We introduce a benchmark (LongICLBench) for long in-context learning in extreme-label classification using six datasets with 28 to 174 classes and input lengths from 2K to 50K tokens. Our benchmark requires LLMs to comprehend the entire input and recognize the massive label space in order to make correct predictions. We evaluate 15 long-context LLMs and find that they perform well on less challenging classification tasks with smaller label spaces and shorter demonstrations. However, they struggle with more challenging tasks like Discovery, with 174 labels, suggesting a gap in their ability to process long, context-rich sequences. Further analysis reveals a bias towards labels presented later in the sequence and a need for improved reasoning over multiple pieces of information. Our study shows that long-context understanding and reasoning remain challenging for existing LLMs. We believe LongICLBench can serve as a more realistic evaluation of future long-context LLMs.
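
Schematically, extreme-label in-context learning concatenates one (or more) demonstrations per class before the query, which is why prompts grow to tens of thousands of tokens as the label space grows. A minimal sketch with hypothetical class names:

```python
def build_icl_prompt(demos, query):
    # demos: (text, label) pairs spanning the full label space; long
    # in-context learning concatenates every demonstration before the
    # query, so prompt length scales with the number of classes.
    blocks = [f"Text: {t}\nLabel: {l}" for t, l in demos]
    blocks.append(f"Text: {query}\nLabel:")
    return "\n\n".join(blocks)

# One demonstration for each of 174 hypothetical classes, as in Discovery.
demos = [(f"example for class {c}", f"class_{c}") for c in range(174)]
prompt = build_icl_prompt(demos, "a new input to classify")
```

The model must attend across the entire prompt to recover the label inventory, which is exactly where the benchmark finds current long-context LLMs struggling.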

URL: https://openreview.net/forum?id=Cw2xlg0e46

---

Title: Meta-learning Population-based Methods for Reinforcement Learning

Authors: Johannes Hog, Raghu Rajan, André Biedenkapp, Noor Awad, Frank Hutter, Vu Nguyen

Abstract: Reinforcement learning (RL) algorithms are highly sensitive to their hyperparameter settings. Recently, numerous methods have been proposed to dynamically optimize these hyperparameters. One prominent approach is Population-Based Bandits (PB2), which uses time-varying Gaussian processes (GP) to dynamically optimize hyperparameters with a population of parallel agents. Despite its strong overall performance, PB2 experiences slow starts due to the GP initially lacking sufficient information. To mitigate this issue, we propose four different methods that utilize meta-data from various environments. These approaches are novel in that they adapt meta-learning methods to accommodate the time-varying setting. Among these approaches, MultiTaskPB2, which uses meta-learning for the surrogate model, stands out as the most promising approach. It outperforms PB2 and other baselines in both anytime and final performance across two RL environment families.

URL: https://openreview.net/forum?id=d9htascfP8

---

Title: Posterior Sampling for Reinforcement Learning on Graphs

Authors: Arnaud Robert, Aldo A. Faisal, Ciara Pike-Burke

Abstract: Many Markov Decision Processes (MDPs) exhibit structure in their state and action spaces that is not exploited. We consider the case where the structure can be modelled using a directed acyclic graph (DAG) composed of nodes and edges. In this case, each node has a state, and the state transition dynamics are influenced by the states and actions at its parent nodes.
We propose an MDP framework, \emph{Directed Acyclic Markov Decision Process} (DAMDP) that formalises this problem, and we develop algorithms to perform planning and learning.
Crucially, DAMDPs retain many of the benefits of MDPs: we show that Dynamic Programming can find the optimal policy in known DAMDPs. We also demonstrate how to perform Reinforcement Learning in DAMDPs when the transition probabilities and the reward function are unknown. To this end, we derive a posterior sampling-based algorithm that leverages the graph structure to boost learning efficiency. Moreover, we obtain a theoretical bound on the Bayesian regret of this algorithm, which directly shows the efficiency gain from considering the graph structure. We conclude by empirically demonstrating that, by harnessing the DAMDP structure, our algorithm outperforms traditional posterior sampling for Reinforcement Learning in both a maximum flow problem and a real-world wind farm optimisation task.

URL: https://openreview.net/forum?id=kd6CfmdPfX

---

Title: Efficient Multi-Agent Cooperation Learning through Teammate Lookahead

Authors: Feng Chen, Xinwei Chen, Rong-Jun Qin, Cong Guan, Lei Yuan, Zongzhang Zhang, Yang Yu

Abstract: Cooperative Multi-Agent Reinforcement Learning (MARL) is a rapidly growing research field that has achieved outstanding results across a variety of challenging cooperation tasks. However, existing MARL algorithms typically overlook the concurrent updates of teammate agents. An agent learns from data gathered while cooperating with one set of (current) teammates, but then practices with another set of (updated) teammates. This phenomenon, termed ``teammate delay'', leads to a discrepancy between the agent's learning objective and the actual evaluation scenario, which can degrade learning stability and efficiency.
In this paper, we tackle this challenge by introducing a lookahead strategy that enables agents to learn to cooperate with predicted future teammates, allowing the explicit awareness of concurrent teammate updates. This lookahead strategy is designed to seamlessly integrate with existing policy-gradient-based MARL methods, enhancing their performance without significant modifications to their underlying structures. The extensive experiments demonstrate the effectiveness of this approach, showing that the lookahead strategy can enhance the cooperation learning efficiency and achieve superior performance over the state-of-the-art MARL algorithms.

URL: https://openreview.net/forum?id=CeNNIQ8GJf

---

Title: A limitation on black-box dynamics approaches to Reinforcement Learning

Authors: Brieuc Pinon, Raphael Jungers, Jean-Charles Delvenne

Abstract: We prove a fundamental limitation on the computational efficiency of a large class of Reinforcement Learning (RL) methods. This limitation applies to model-free RL methods as well as some model-based methods, such as AlphaZero. We provide a formalism that describes this class and present a family of RL problems provably intractable for these methods. Conversely, the problems in the family can be efficiently solved by toy methods. We identify several types of algorithms proposed in the literature that can avoid our limitation, including algorithms that construct an inverse dynamics model, and planning algorithms that leverage an explicit model of the dynamics.

URL: https://openreview.net/forum?id=wPHVijYksq

---

Title: Salsa Fresca: Angular Embeddings and Pre-Training for ML Attacks on Learning With Errors

Authors: Samuel Stevens, Emily Wenger, Cathy Yuanchen Li, Niklas Nolte, Eshika Saxena, Francois Charton, Kristin E. Lauter

Abstract: Learning with Errors (LWE) is a hard math problem underlying recently standardized post-quantum cryptography (PQC) systems for key exchange and digital signatures. Prior work proposed new machine learning (ML)-based attacks on LWE problems with small, sparse secrets, but these attacks require millions of LWE samples to train on and take days to recover secrets. We propose three key methods---better preprocessing, angular embeddings and model pre-training---to improve these attacks, speeding up preprocessing by $25\times$ and improving model sample efficiency by $10\times$. We demonstrate for the first time that pre-training improves and reduces the cost of ML attacks on LWE. Our architecture improvements enable scaling to larger-dimension LWE problems: this work is the first instance of ML attacks recovering sparse binary secrets in dimension $n=1024$, the smallest dimension used in practice for homomorphic encryption applications of LWE where sparse binary secrets are proposed, albeit for larger modulus $q$. Our ML-based approach is the only attack which has successfully recovered secrets for these parameters.
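
A hedged sketch of the LWE problem itself (not of the attack): each sample is a pair (a_i, b_i) with b_i = <a_i, s> + e_i mod q, where e_i is small Gaussian error and s is a sparse binary secret, the regime the paper targets. The toy dimension below is far smaller than the n=1024 setting reported above.

```python
import numpy as np

rng = np.random.default_rng(0)

def lwe_samples(n, m, q, secret, sigma=3.2):
    # Each row: b_i = <a_i, s> + e_i (mod q) with small Gaussian error e_i.
    A = rng.integers(0, q, size=(m, n))
    e = np.rint(rng.normal(0.0, sigma, size=m)).astype(int)
    return A, (A @ secret + e) % q

n, m, q = 64, 32, 3329                              # toy parameters
secret = np.zeros(n, dtype=int)
secret[rng.choice(n, size=3, replace=False)] = 1    # sparse binary secret
A, b = lwe_samples(n, m, q, secret)
```

Recovering s from many such (A, b) pairs is the hardness assumption behind the post-quantum schemes, and the supervised task the ML attacks learn to solve.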

URL: https://openreview.net/forum?id=w4nd5695sq

---

Title: Memory-Modular Classification: Learning to Generalize with Memory Replacement

Authors: Dahyun Kang, Ahmet Iscen, Eunchan Jo, Sua Choi, Minsu Cho, Cordelia Schmid

Abstract: We propose a novel memory-modular learner for image classification that separates knowledge memorization from reasoning. Our model enables effective generalization to new classes by simply replacing the memory contents, without the need for model retraining. Unlike traditional models that encode both world knowledge and task-specific skills into their weights during training, our model stores knowledge in an external memory of web-crawled image and text data. At inference time, the model dynamically selects relevant content from the memory based on the input image, allowing it to adapt to arbitrary classes by simply replacing the memory contents. The key differentiator is that our learner meta-learns to perform classification with noisy web data from unseen classes, resulting in robust performance across various classification scenarios. Experimental results demonstrate the promising performance and versatility of our approach in handling diverse classification tasks, including zero-shot/few-shot classification of unseen classes, fine-grained classification, and class-incremental classification.

URL: https://openreview.net/forum?id=DcIW0idrg8

---

Title: Efficient Training of Multi-task Neural Solver for Combinatorial Optimization

Authors: Chenguang Wang, Zhang-Hua Fu, Pinyan Lu, Tianshu Yu

Abstract: Efficiently training a multi-task neural solver for various combinatorial optimization problems (COPs) has been less studied so far. Naive application of conventional multi-task learning approaches often falls short in delivering a high-quality, unified neural solver. This deficiency primarily stems from the significant computational demands and a lack of adequate consideration for the complexities inherent in COPs. In this paper, we propose a general and efficient training paradigm to deliver a unified combinatorial multi-task neural solver. To this end, we resort to the theoretical loss decomposition for multiple tasks under an encoder-decoder framework, which enables more efficient training via proper bandit task-sampling algorithms guided by an intra-task influence matrix.
By employing theoretically grounded approximations, our method significantly enhances overall performance, regardless of whether it is within constrained training budgets, across equivalent training epochs, or in terms of generalization capabilities, when compared to conventional training schedules.
On the real-world TSPLib and CVRPLib datasets, our method also achieves the best results compared to single-task learning and multi-task learning approaches.
Additionally, the influence matrix provides empirical evidence supporting common practices in the field of learning to optimize, further substantiating the effectiveness of our approach.
Our code is open-sourced and available at \url{https://github.com/LOGO-CUHKSZ/MTL-COP}.

URL: https://openreview.net/forum?id=HJbcwRbMQQ

---

Title: Uncovering Strong Lottery Tickets in Graph Transformers: A Path to Memory Efficient and Robust Graph Learning

Authors: Hiroaki Ito, Jiale Yan, Hikari Otsuka, Kazushi Kawamura, Masato Motomura, Thiem Van Chu, Daichi Fujiki

Abstract: Graph Transformers (GTs) have recently demonstrated strong capabilities for capturing complex relationships in graph-structured data using global self-attention mechanisms. On the other hand, their high memory requirements during inference remain a challenge for practical deployment. In this study, we investigate the existence of strong lottery tickets (SLTs) — subnetworks within randomly initialized neural networks that can attain competitive accuracy without weight training — in GTs. Previous studies have explored SLTs in message-passing neural networks (MPNNs), showing that SLTs not only exist in MPNNs but also help mitigate over-smoothing problems and improve robustness. However, the potential of SLTs in GTs remains unexplored. With GTs having 4.5$\times$ more parameters than MPNNs, SLTs hold even greater application value in this context. We find that fixed random weights with a traditional SLT search method cannot adapt to imbalances of features in GTs, leading to highly biased attention that destabilizes model performance. To overcome this issue and efficiently search for SLTs, we introduce a novel approach called Adaptive Scaling. We empirically confirm the existence of SLTs within GTs and demonstrate their versatility through extensive experiments across different GT architectures, including NodeFormer, GRIT, and GraphGPS. Our findings demonstrate that SLTs achieve comparable accuracy while reducing memory usage by 2--32$\times$, effectively generalize to out-of-distribution data, and enhance robustness against adversarial perturbations. This work highlights that SLTs offer a resource-efficient approach to improving the scalability, efficiency, and robustness of GTs, with broad implications for applications involving graph data.

URL: https://openreview.net/forum?id=B1q9po4LPl

---

Title: FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting

Authors: Liyao Jiang, Negar Hassanpour, Mohammad Salameh, Mohan Sai Singamsetti, Fengyu Sun, Wei Lu, Di Niu

Abstract: Text-to-image (T2I) diffusion models have demonstrated impressive capabilities in generating high-quality images given a text prompt. However, ensuring the prompt-image alignment remains a considerable challenge, i.e., generating images that faithfully align with the prompt's semantics. Recent works attempt to improve the faithfulness by optimizing the latent code, which potentially could cause the latent code to go out-of-distribution and thus produce unrealistic images. In this paper, we propose FRAP, a simple, yet effective approach based on adaptively adjusting the per-token prompt weights to improve prompt-image alignment and authenticity of the generated images. We design an online algorithm to adaptively update each token's weight coefficient, which is achieved by minimizing a unified objective function that encourages object presence and the binding of object-modifier pairs. Through extensive evaluations, we show FRAP generates images with significantly higher prompt-image alignment to prompts from complex datasets, while having a lower average latency compared to recent latent code optimization methods, e.g., 4 seconds faster than D&B on the COCO-Subject dataset. Furthermore, through visual comparisons and evaluation of the CLIP-IQA-Real metric, we show that FRAP not only improves prompt-image alignment but also generates more authentic images with realistic appearances. We also explore combining FRAP with a prompt-rewriting LLM to recover the degraded prompt-image alignment of rewritten prompts, where we observe improvements in both prompt-image alignment and image quality. We release the code at the following link: https://github.com/LiyaoJiang1998/FRAP/.

URL: https://openreview.net/forum?id=MKCwO34oIq

---

Title: Theoretical Insights into Overparameterized Models in Multi-Task and Replay-Based Continual Learning

Authors: Mohammadamin Banayeeanzade, Mahdi Soltanolkotabi, Mohammad Rostami

Abstract: Multi-task learning (MTL) is a machine learning paradigm that aims to improve the generalization performance of a model on multiple related tasks by training it simultaneously on those tasks. Unlike MTL, where the model has instant access to the training data of all tasks, continual learning (CL) involves adapting to new sequentially arriving tasks over time without forgetting the previously acquired knowledge. Despite the wide practical adoption of CL and MTL and extensive literature on both areas, there remains a gap in the theoretical understanding of these methods when used with overparameterized models such as deep neural networks. This paper studies overparameterized linear models as a proxy for more complex models. We develop theoretical results describing the effect of various system parameters on the model's performance in an MTL setup. Specifically, we study the impact of model size, dataset size, and task similarity on the generalization error and knowledge transfer. Additionally, we present theoretical results to characterize the performance of replay-based CL models. Our results reveal the impact of buffer size and model capacity on the forgetting rate in a CL setup and help shed light on some of the state-of-the-art CL methods. Finally, through extensive empirical evaluations, we demonstrate that our theoretical findings are also applicable to deep neural networks, offering valuable guidance for designing MTL and CL models in practice.

URL: https://openreview.net/forum?id=4zGPT0ZwnU

---


New submissions
===============


Title: Understanding and Reducing the Class-Dependent Effects of Data Augmentation with A Two-Player Game Approach

Abstract: Data augmentation is widely applied and has shown its benefits in different machine learning tasks. However, as recently observed, it may have an unfair effect in multi-class classification. While data augmentation generally improves the overall performance (and therefore is beneficial for many classes), it can actually be detrimental for other classes, which can be problematic in some application domains. In this paper, to counteract this phenomenon, we propose CLAM, a CLAss-dependent Multiplicative-weights method. To derive it, we first formulate the training of a classifier as a non-linear optimization problem that aims at simultaneously maximizing the individual class performances and balancing them. By rewriting this optimization problem as an adversarial two-player game, we propose a novel multiplicative-weights algorithm, for which we prove convergence. Interestingly, our formulation also reveals that the class-dependent effects of data augmentation are not due to data augmentation alone, but are in fact a general phenomenon. Our empirical results over five datasets demonstrate that the performance of learned classifiers is indeed more fairly distributed over classes, with only limited impact on the average accuracy.
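The multiplicative-weights update at the heart of such two-player formulations can be sketched in a few lines. This is a generic illustration (the per-class weight vector, fixed loss vector, and step size `eta` are hypothetical), not the authors' CLAM algorithm:

```python
import numpy as np

def multiplicative_weights_step(class_weights, class_losses, eta=0.1):
    """One generic multiplicative-weights update: classes with higher
    loss receive exponentially larger weight, steering training toward
    the worst-off classes. Illustrative only; `eta` is a made-up step size."""
    w = class_weights * np.exp(eta * class_losses)
    return w / w.sum()  # renormalize to a probability distribution

# Toy usage: three classes, class 2 currently has the worst loss.
w = np.ones(3) / 3
losses = np.array([0.2, 0.3, 0.9])
for _ in range(20):
    w = multiplicative_weights_step(w, losses)
```

After repeated updates the weight mass concentrates on the class with the highest loss, which is the mechanism a class-dependent reweighting scheme exploits.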

URL: https://openreview.net/forum?id=zNsfgCns7x

---

Title: DiffNat: Exploiting the Kurtosis Concentration Property for Image Quality Improvement

Abstract: Diffusion models have significantly advanced generative AI in terms of creating and editing naturalistic images. However, improving the image quality of generated images is still of paramount interest. In this context, we propose a generic kurtosis concentration (KC) loss that can be readily applied to any standard diffusion model pipeline to improve image quality. Our motivation stems from the projected kurtosis concentration property of natural images, which states that natural images have nearly constant kurtosis values across different band-pass filtered versions of the image. To improve the image quality of generated images, we reduce the gap between the highest and lowest kurtosis values across the band-pass filtered versions (e.g., Discrete Wavelet Transform (DWT)) of images. In addition, we also propose a novel condition-agnostic perceptual guidance strategy during inference to further improve the quality. We validate the proposed approach for four diverse tasks, viz., (1) personalized few-shot finetuning using text guidance, (2) unconditional image generation, (3) image super-resolution, and (4) blind face-restoration. Integrating the proposed KC loss and perceptual guidance has improved the perceptual quality in all these tasks in terms of FID, MUSIQ score, and user evaluation. Code is provided in the supplementary.
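The kurtosis-concentration idea can be illustrated numerically: compute the kurtosis of several band-pass responses of an image and penalize the spread between the largest and smallest values. In this hedged sketch, simple finite-difference filters stand in for a DWT, so it is not the paper's exact loss:

```python
import numpy as np

def kurtosis(x):
    """Standard (non-excess) kurtosis: E[(x - mu)^4] / var^2."""
    x = x.ravel() - x.mean()
    return (x ** 4).mean() / (x.var() ** 2 + 1e-12)

def kc_gap(img):
    """Gap between the largest and smallest kurtosis over a few simple
    band-pass responses. Horizontal/vertical/diagonal differences stand
    in for a wavelet decomposition here; illustrative, not DiffNat's filters."""
    bands = [
        img[:, 1:] - img[:, :-1],      # horizontal detail
        img[1:, :] - img[:-1, :],      # vertical detail
        img[1:, 1:] - img[:-1, :-1],   # diagonal detail
    ]
    ks = [kurtosis(b) for b in bands]
    return max(ks) - min(ks)

rng = np.random.default_rng(0)
loss = kc_gap(rng.normal(size=(64, 64)))
```

Minimizing this gap pushes the band-pass kurtosis values toward a common constant, mimicking the near-constant projected kurtosis of natural images.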

URL: https://openreview.net/forum?id=HdZQ7pMPRd

---

Title: A Survey on Self-play Methods in Reinforcement Learning

Abstract: Self-play, characterized by agents' interactions with copies or past versions of themselves, has recently gained prominence in reinforcement learning (RL). This paper first clarifies the preliminaries of self-play, including the multi-agent reinforcement learning framework and basic game theory concepts. Then, it provides a unified framework and classifies existing self-play algorithms within this framework. Moreover, the paper bridges the gap between the algorithms and their practical implications by illustrating the role of self-play in different scenarios. Finally, the survey highlights open challenges and future research directions in self-play. This paper is an essential guide map for understanding the multifaceted landscape of self-play in RL.

URL: https://openreview.net/forum?id=1MxsAg7KGk

---

Title: SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis

Abstract: Persistent dynamic scene modeling for tracking and novel-view synthesis remains challenging, particularly due to the complexity of capturing accurate deformations while maintaining computational efficiency. In this paper, we present SCas4D, a novel cascaded optimization framework that leverages inherent structural patterns in 3D Gaussian Splatting (3DGS) for dynamic scenes. Our key insight is that real-world deformations often exhibit hierarchical patterns, where groups of Gaussians undergo similar transformations. By employing a structural cascaded optimization approach that progressively refines deformations from coarse part-level to fine point-level adjustments, SCas4D achieves convergence within 100 iterations per time frame while maintaining quality competitive with the state-of-the-art method using only 1/20th of the training iterations. We further demonstrate our method's effectiveness in self-supervised articulated object segmentation, a capability that emerges naturally from our representation. Extensive experiments demonstrate our method's effectiveness in novel view synthesis and dense point tracking tasks.

URL: https://openreview.net/forum?id=YkycjbKjYP

---

Title: How does overparametrization affect performance on minority groups?

Abstract: The benefits of overparameterization for the overall performance of modern machine learning (ML) models are well known. However, the effect of overparameterization at a more granular level of data subgroups is less understood. Recent empirical studies demonstrate encouraging results: (i) when groups are not known, overparameterized models trained with empirical risk minimization (ERM) perform better on minority groups; (ii) when groups are known, ERM on data subsampled to equalize group sizes yields state-of-the-art worst-group accuracy in the overparameterized regime. In this paper, we complement these empirical studies with a theoretical investigation of the risk of overparameterized random feature models on minority groups. In a setting in which the regression functions for the majority and minority groups are different, we show that overparameterization always improves minority group performance.
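The random-feature setting the abstract describes can be simulated directly. This toy experiment (dimensions, noise level, and group sizes are arbitrary choices, not the paper's) fits a minimum-norm ERM solution over random ReLU features and measures per-group test risk:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_maj, n_min, p = 20, 200, 20, 500   # p >> n: overparameterized regime

# Different regression functions for majority and minority groups
# (an illustrative instance of the setting, not the paper's exact model).
w_maj, w_min = rng.normal(size=d), rng.normal(size=d)

def sample(n, w):
    X = rng.normal(size=(n, d))
    return X, X @ w + 0.1 * rng.normal(size=n)

Xa, ya = sample(n_maj, w_maj)
Xb, yb = sample(n_min, w_min)
X, y = np.vstack([Xa, Xb]), np.concatenate([ya, yb])

# Random ReLU features, then min-norm least squares (plain ERM, no reweighting).
W = rng.normal(size=(d, p)) / np.sqrt(d)
phi = lambda Z: np.maximum(Z @ W, 0.0)
beta = np.linalg.lstsq(phi(X), y, rcond=None)[0]

# Per-group test risk.
Xt_a, yt_a = sample(1000, w_maj)
Xt_b, yt_b = sample(1000, w_min)
risk_maj = np.mean((phi(Xt_a) @ beta - yt_a) ** 2)
risk_min = np.mean((phi(Xt_b) @ beta - yt_b) ** 2)
```

Sweeping `p` in such a simulation is one way to probe how the minority-group risk behaves as the model becomes more overparameterized.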

URL: https://openreview.net/forum?id=POunezXgvF

---

Title: GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation

Abstract: Creating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians and pixel velocities between consecutive frames. The Gaussian flow can be obtained efficiently by splatting Gaussian dynamics into the image space. This differentiable process enables direct dynamic supervision from optical flow. Our method significantly benefits 4D dynamic content generation and 4D novel view synthesis with Gaussian Splatting, especially for contents with rich motions that are hard to handle by existing methods. The common color drifting issue that occurs in 4D generation is also resolved with improved Gaussian dynamics. Superior visual quality in extensive experiments demonstrates the effectiveness of our method. As shown in our evaluation, GaussianFlow can drastically improve both quantitative and qualitative results for 4D generation and 4D novel view synthesis.

URL: https://openreview.net/forum?id=XBL7xi5rt0

---

Title: Unlocking Visual Secrets: Inverting Features with Diffusion Priors for Image Reconstruction

Abstract: Inverting visual representations within deep neural networks (DNNs) presents a challenging and important problem in the field of security and privacy for deep learning.
The main goal is to invert the features of an unidentified target image generated by a pre-trained DNN, aiming to reconstruct the original image. Feature inversion holds particular significance in understanding the privacy leakage inherent in contemporary split DNN execution techniques, as well as in various applications based on the extracted DNN features.

In this paper, we explore the use of diffusion models, a promising technique for image synthesis, to enhance feature inversion quality. We also investigate the potential of incorporating alternative forms of prior knowledge, such as textual prompts and cross-frame temporal correlations, to further improve the quality of inverted features. Our findings reveal that diffusion models can effectively leverage hidden information from the DNN features, resulting in superior reconstruction performance compared to previous methods.
This research offers valuable insights into how diffusion models can enhance privacy and security within applications that are reliant on DNN features.

URL: https://openreview.net/forum?id=j6MgbuBiGV

---

Title: A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems

Abstract: Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making. With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems from conventional models that empower chatbots. In this survey, we categorize existing methods along two orthogonal dimensions: (1) Regimes, which define the stage at which reasoning is achieved (either at inference time or through dedicated training); and (2) Architectures, which determine the components involved in the reasoning process, distinguishing between standalone LLMs, agentic compound systems that incorporate external tools, and multi-agent collaborations. Within each dimension, we analyze two key perspectives: (1) Input level, which focuses on techniques that construct high-quality prompts for the LLM to condition on; and (2) Output level, which covers methods that refine multiple sampled candidates to enhance reasoning quality. This categorization provides a systematic understanding of the evolving landscape of LLM reasoning, highlighting emerging trends such as the shift from inference-scaling to learning-to-reason (e.g., DeepSeek-R1), and the transition to agentic workflows (e.g., OpenAI Deep Research, Manus Agent). Additionally, we cover a broad spectrum of learning algorithms, from supervised fine-tuning to reinforcement learning such as PPO and GRPO, and the training of reasoners and verifiers. We also examine key designs of agentic workflows, from established patterns like generator-evaluator and LLM debate to recent innovations. Finally, we identify emerging trends, such as domain-specific reasoning systems, and open challenges, such as evaluation and data quality. This survey aims to provide AI researchers and practitioners with a comprehensive foundation for advancing reasoning in LLMs, paving the way for more sophisticated and reliable AI systems.

URL: https://openreview.net/forum?id=SlsZZ25InC

---

Title: The broader spectrum of in-context learning

Abstract: The ability of language models to learn a task from a few examples in context has generated substantial interest. Here, we provide a perspective that situates this type of supervised few-shot learning within a much broader spectrum of meta-learned in-context learning. Indeed, we suggest that *any* distribution of sequences in which context non-trivially decreases loss on subsequent predictions can be interpreted as eliciting a kind of in-context learning. We suggest that this perspective helps to unify the broad set of in-context abilities that language models exhibit---such as adapting to tasks from instructions or role play, or extrapolating time series. This perspective also sheds light on potential roots of in-context learning in lower-level processing of linguistic dependencies (e.g. coreference or parallel structures). Finally, taking this perspective highlights the importance of generalization, which we suggest can be studied along several dimensions: not only the ability to learn something novel, but also flexibility in learning from different presentations, and in applying what is learned. We discuss broader connections to past literature in meta-learning and goal-conditioned agents, and other perspectives on learning and adaptation. We close by suggesting that research on in-context learning should consider this broader spectrum of in-context capabilities and types of generalization.

URL: https://openreview.net/forum?id=RHo3VVi0i5

---

Title: Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations

Abstract: Vision-language contrastive learning frameworks like CLIP enable learning representations from natural language supervision, and provide strong zero-shot classification capabilities. However, due to the nature of the supervisory signal in these paradigms, they lack the ability to learn localized features, leading to degraded performance on dense prediction tasks like segmentation and detection. On the other hand, self-supervised learning methods have shown the ability to learn granular representations, complementing the high-level features in vision-language training. In this work, we present Harmony, a framework that combines vision-language training with discriminative and generative self-supervision to learn visual features that can be generalized across different vision downstream tasks. Our framework is specifically designed to work on web-scraped data by not relying on negative examples and addressing the one-to-one correspondence issue using soft CLIP targets generated by an EMA model. We comprehensively evaluate Harmony across various vision downstream tasks and find that it significantly outperforms the baseline CLIP and the previously leading joint self and weakly-supervised methods, MaskCLIP and SLIP. Specifically, when comparing against these methods, Harmony shows superior performance in fine-tuning and zero-shot classification on ImageNet-1k, semantic segmentation on ADE20K, and both object detection and instance segmentation on MS-COCO, when pre-training a ViT-B on CC3M. We also show that Harmony outperforms other self-supervised learning methods like iBOT and MAE across all tasks evaluated.

URL: https://openreview.net/forum?id=IcOBCufqFO

---

Title: Variance Reduction of Stochastic Hypergradient Estimation by Mixed Fixed-Point Iteration

Abstract: Hypergradient represents how the hyperparameter of an optimization problem (or inner-problem) changes an outer-cost through the optimized inner-parameter, and it plays a crucial role in hyperparameter optimization, meta-learning, and data influence estimation.
This paper studies hypergradient computation involving a stochastic inner-problem, a typical machine learning setting where the empirical loss is estimated by minibatches.
Stochastic hypergradient estimation requires estimating products of Jacobian matrices of the inner iteration.
Current methods struggle with large estimation variance because they depend on a specific sequence of Jacobian samples to estimate this product.
This paper overcomes this problem by \emph{mixing} two different stochastic hypergradient estimation methods that use distinct sequences of Jacobian samples.
Furthermore, we show that the proposed method enables almost sure convergence to the true hypergradient through the stochastic Krasnosel'ski\u{\i}-Mann iteration.
Theoretical analysis demonstrates that, compared to existing approaches, our method achieves lower asymptotic variance bounds while maintaining comparable computational complexity.
Empirical evaluations on synthetic and real-world tasks verify our theoretical results and superior variance reduction over existing methods.

URL: https://openreview.net/forum?id=mkmX2ICi5c

---

Title: FEC-Real: A KAN based Model for Improving Investment Strategies

Abstract: The application of mathematical and computational techniques in financial investment has emerged as a prominent area of research, leading to the development of various tasks including factor mining, stock prediction, and analysis of financial statements. In this work, we focus on the task of predicting future trends for stocks. In existing fintech research, various transformer-based models have been explored for predicting future stock trends. This study is motivated by the need for a more efficient network architecture that can enhance the interpretation of real-time data, since transformer-based models are not always efficient for real-world high-speed trading data. To address this, we explore the effectiveness of the Kolmogorov-Arnold Network (KAN) for financial time-series modeling. We propose a KAN-based encoder (FTS2K) which utilizes both KAN and transformer architectures to predict future stock price movements. Empirical results show that our proposed encoder yields an average accuracy improvement of $2.62\%$ across state-of-the-art (SOTA) time series models. Our approach consistently outperforms across four datasets (i.e., China A Daily, China A Min, China Futures Min, Dow 12 Daily), achieving superior results in both ACC and Top-100 ACC metrics.

URL: https://openreview.net/forum?id=KgqI1PlmxR

---

Title: Deep Autoregressive Models as Causal Inference Engines

Abstract: Existing causal inference (CI) models are often restricted to handling low-dimensional confounders and singleton actions. We propose an autoregressive (AR) CI framework capable of handling complex confounders and sequential actions commonly found in modern applications. Our approach accomplishes this using sequencification, which transforms data from an underlying causal diagram into a sequence of tokens. Sequencification not only accommodates training with data generated from a large class of DAGs, but also extends existing CI capabilities to estimate multiple causal quantities using a single model. We can directly compute probabilities from interventional distributions, simplifying inference and enhancing outcome prediction accuracy. We demonstrate that an AR model adapted for CI is efficient and effective in various complex applications such as navigating mazes, playing chess endgames, and evaluating the impact of certain keywords on paper acceptance rates.
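The "sequencification" idea (flattening samples from a causal diagram into token sequences an autoregressive model can be trained on) can be illustrated on a toy three-variable SCM. The token format, variable names, and ordering here are assumptions for illustration, not the paper's exact scheme:

```python
import random

def sample_scm(do_x=None):
    """Draw one sample from a toy causal chain Z -> X -> Y.
    Passing `do_x` simulates an intervention on X."""
    z = random.random() < 0.5
    x = (random.random() < (0.8 if z else 0.2)) if do_x is None else do_x
    y = random.random() < (0.9 if x else 0.1)
    return z, x, y

def sequencify(z, x, y):
    """Flatten one sample into tokens following the DAG's topological
    order, so an AR model factorizes the joint causally."""
    return [f"Z={int(z)}", f"X={int(x)}", f"Y={int(y)}"]

random.seed(0)
corpus = [sequencify(*sample_scm()) for _ in range(5)]
```

Training an autoregressive model on such sequences lets interventional queries be answered by fixing the intervened token and reading off the model's conditional probabilities for downstream tokens.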

URL: https://openreview.net/forum?id=uuREHPf2ll

---

Title: Document Classification using Reference Information

Abstract: Document classification is a common problem when organizing unstructured text data, but supervised classification algorithms often require many labeled examples and tedious manual annotation by human labelers. This work proposes an innovative methodology called Document Classification using Reference Information (DCRI), which classifies documents with little human intervention by leveraging the existence of reference information: documents from external sources related to the label classes of interest. For example, when classifying news articles into topics, Wikipedia articles can serve as such an external source. DCRI uses reference information to generate weak initial labels for an unlabeled corpus, then iteratively augments them into stronger labels using both supervised machine learning algorithms and limited human labeling capacity, if available. DCRI is evaluated on one dataset from a major pharmaceutical manufacturing company and two public datasets for news topic classification. When no human labeling capacity is available, DCRI achieves an accuracy between 84% and 96% on these three datasets. When some manual labeling capacity is available, DCRI helps prioritize the labeling of documents with high uncertainty. To shed light on the value of reference information, this paper also develops a generative mathematical model in which reference information provides a noisy estimate of the latent distribution that generates documents. An extensive numerical study is performed using synthetic data to analyze when and why reference information is most valuable. Finally, for a special case of the model with two classes, a theoretical result is established to show the value of the iterative nature of the DCRI approach.

URL: https://openreview.net/forum?id=Ju2Frrog7n

---

Title: TabText: A Flexible and Contextual Approach to Tabular Data Representation

Abstract: Tabular data is an essential data format for machine learning tasks across various industries. However, traditional data processing methods do not fully utilize all the information available in the tables, ignoring important contextual information such as column header descriptions. In addition, pre-processing data into a tabular format can remain a labor-intensive bottleneck in model development. This work introduces TabText, a processing and feature extraction framework that extracts contextual information from tabular data structures. TabText addresses processing difficulties by converting the content into language and utilizing pre-trained large language models (LLMs). We evaluate our framework on ten healthcare prediction tasks—including patient discharge, ICU admission, and mortality—and validate its generalizability on an additional task from a different domain. We show that 1) applying our TabText framework enables the generation of high-performing and simple machine learning baseline models with minimal data pre-processing, and 2) augmenting pre-processed tabular data with TabText representations improves the average and worst-case AUC performance of standard machine learning models by as much as 5 percentage points. All the code to reproduce the results can be found at https://anonymous.4open.science/r/TabText-18F0.
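The core serialization step, turning a row plus its column headers into natural language that an LLM can embed, might look roughly like this. The column names and phrasing are hypothetical, not TabText's exact template:

```python
# Minimal sketch of the row-to-language idea: serialize a tabular row,
# including its column headers, into a sentence an LLM encoder can embed.
def row_to_text(row: dict) -> str:
    parts = [f"{col.replace('_', ' ')} is {val}"
             for col, val in row.items()
             if val not in (None, "")]       # skip missing cells
    return "; ".join(parts) + "."

row = {"age": 67, "admission_type": "emergency", "heart_rate": 91}
text = row_to_text(row)
# -> "age is 67; admission type is emergency; heart rate is 91."
```

The resulting sentence can then be fed to any pre-trained text encoder, and the embedding concatenated with conventional tabular features.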

URL: https://openreview.net/forum?id=PuQH7vdMCB

---

Title: A Survey on Future Frame Synthesis: Bridging Deterministic and Generative Approaches

Abstract: Future Frame Synthesis (FFS) focuses on generating future frame sequences conditioned on existing content. This survey provides a comprehensive review of existing research on FFS, covering commonly used datasets and representative algorithms. We discuss key challenges and trace the evolution of FFS in computer vision, particularly the shift from deterministic to generative approaches. Our taxonomy outlines major advances and methodological shifts, emphasizing the rising significance of generative models in producing realistic and diverse predictions.

URL: https://openreview.net/forum?id=ZN4rzrHlNz

---

Title: Leveraging AutoML for Sustainable Deep Learning: A Multi-Objective HPO Approach on Deep Shift Neural Networks

Abstract: Deep Learning (DL) has advanced various fields by extracting complex patterns from large datasets. However, the computational demands of DL models pose environmental and resource challenges. Deep Shift Neural Networks (DSNNs) present a solution by leveraging shift operations to reduce computational complexity at inference. Compared to common DNNs, DSNNs are still less well understood and less well optimized. By leveraging AutoML techniques, we provide valuable insights into the potential of DSNNs and how to design them in a better way. Since we consider complementary objectives such as accuracy and energy consumption, we combine state-of-the-art multi-fidelity (MF) HPO with multi-objective optimization to find a set of Pareto optimal trade-offs on how to design DSNNs. Our approach led to significantly better configurations of DSNNs regarding loss and emissions compared to default DSNNs. This includes simultaneously increasing performance by about 20% and reducing emissions, in some cases by more than 60%. Investigating the behavior of quantized networks in terms of both emissions and accuracy, our experiments reveal surprising model-specific trade-offs, yielding the greatest energy savings. For example, in contrast to common expectations, selectively quantizing smaller portions of the network with low precision is optimal while retaining or improving performance. We corroborated these findings across multiple backbone architectures, highlighting important nuances in quantization strategies and offering an automated approach to balancing energy efficiency and model performance.
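The shift operations DSNNs rely on amount to restricting weights to signed powers of two, so each multiplication becomes a bit shift. A minimal quantization sketch follows; the exponent range and pruning threshold are assumptions, not the paper's exact scheme:

```python
import numpy as np

def to_shift_weights(w, min_exp=-7):
    """Round weights to signed powers of two, the DSNN weight form:
    w ~= sign * 2**exponent, so each multiply becomes a bit shift.
    `min_exp` bounds the smallest representable magnitude (assumed value);
    weights far below it are pruned to zero."""
    sign = np.sign(w)
    exp = np.round(np.log2(np.abs(w) + 1e-12)).clip(min_exp, 0)
    q = sign * 2.0 ** exp
    q[np.abs(w) < 2.0 ** (min_exp - 1)] = 0.0   # prune near-zero weights
    return q

w = np.array([0.3, -0.6, 0.05, 1.0e-4])
q = to_shift_weights(w)
# 0.3 -> 0.25 (2**-2), -0.6 -> -0.5 (2**-1), 0.05 -> 0.0625 (2**-4), 1e-4 -> pruned
```

On hardware, multiplying an activation by such a weight reduces to a shift (and a sign flip), which is where the inference savings come from.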

URL: https://openreview.net/forum?id=vk7b11DHcW

---

Title: Evaluating explainability techniques on discrete-time graph neural networks

Abstract: Discrete-time temporal Graph Neural Networks (GNNs) are powerful tools for modeling evolving graph-structured data and are widely used in decision-making processes across domains such as social network analysis, financial systems, and collaboration networks. Explaining the predictions of these models is an important research area due to the critical role their decisions play in building trust in social or financial systems. However, the explainability of Temporal Graph Neural Networks remains a challenging and relatively unexplored field. Hence, in this work, we propose a novel framework to evaluate explainability techniques tailored for discrete-time temporal GNNs. Our framework introduces new training and evaluation settings that capture the evolving nature of temporal data, defines metrics to assess the temporal aspects of explanations, and establishes baselines and models specific to discrete-time temporal networks. Through extensive experiments, we outline the best explainability techniques for discrete-time GNNs in terms of fidelity, efficiency, and human-readability trade-offs. By addressing the unique challenges of temporal graph data, our framework sets the stage for future advancements in explaining discrete-time GNNs.

URL: https://openreview.net/forum?id=JzmXo0rfry

---

Title: Disentangled Embedding through Style and Mutual Information for Domain Generalization

Abstract: Deep neural networks often experience performance degradation when faced with distributional shifts between training and testing data, a challenge referred to as domain shift. Domain Generalization (DG) addresses this issue by training models on multiple source domains, enabling the development of invariant representations that generalize to unseen distributions. While existing DG methods have achieved success by minimizing variations across source domains within a shared feature space, recent advances inspired by representation disentanglement have demonstrated improved performance by separating latent features into domain-specific and domain-invariant components. We propose two novel frameworks: Disentangled Embedding through Mutual Information (DETMI) and Disentangled Embedding through Style Information (DETSI). DETMI enforces disentanglement by employing a mutual information estimator, minimizing the mutual dependence between domain-agnostic and domain-specific embeddings. DETSI, on the other hand, achieves disentanglement through style extraction and perturbation, facilitating the learning of domain-invariant and domain-specific representations. Extensive experiments on the PACS, Office-Home, and VLCS datasets show that both frameworks outperform several state-of-the-art DG techniques.
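The style-perturbation component of DETSI can be illustrated with an AdaIN-like operation on feature maps: treat per-channel statistics as "style", jitter them, and re-apply. This is a generic sketch (the noise scale `sigma` is a hypothetical knob), not the authors' exact formulation:

```python
import numpy as np

def perturb_style(feat, rng, sigma=0.1):
    """Style perturbation in the spirit of style-based DG methods:
    per-channel mean/std of a feature map act as 'style'; jittering
    them while keeping the normalized 'content' simulates a domain shift.
    feat: array of shape (C, H, W)."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    std = feat.std(axis=(1, 2), keepdims=True) + 1e-6
    content = (feat - mu) / std                 # style removed
    new_mu = mu * (1 + sigma * rng.normal(size=mu.shape))
    new_std = std * (1 + sigma * rng.normal(size=std.shape))
    return content * new_std + new_mu           # perturbed style re-applied

rng = np.random.default_rng(0)
f = rng.normal(size=(8, 4, 4))
g = perturb_style(f, rng)
```

Features that remain predictive under such perturbations are, by construction, less dependent on domain-specific style, which is the intuition behind learning domain-invariant representations this way.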

URL: https://openreview.net/forum?id=552tedTByb

---

Title: Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference

Abstract: Mixture of Experts (MoE) LLMs enhance performance by selectively activating specialized subnetworks ("experts") per input. While MoEs offer efficiency benefits through distributed inference in typical high-throughput settings, deploying them on memory-constrained devices remains challenging, particularly for sequential token generation with batch size one. In this work, we optimize MoE for such constrained environments, where only a subset of expert weights fit into DRAM.
Through empirical analysis, we show MoEs can tolerate careful deviations in expert selection with minimal predictive performance loss. Inspired by this observation, we propose a novel cache-aware routing strategy that leverages expert reuse during token generation to significantly improve cache locality.
Evaluating on language modeling, MMLU, and GSM8K benchmarks, our method reduces cache miss rates by over 50%, with negligible impact on perplexity (0.1%–3%) and downstream task accuracy (<0.1%). Unlike prior methods limited by the optimal oracle cache bound, our approach surpasses this theoretical limit by allowing slight flexibility in expert selection. Finally, we present on-device results demonstrating 2$\times$ speedups on mobile hardware, offering a flexible and training-free solution to extend MoE's applicability across real-world applications.
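A cache-aware routing rule of the kind described can be sketched as a top-k selection that prefers experts already resident in the cache whenever their router scores are close to the top. The `slack` tolerance and the toy LRU cache below are illustrative assumptions, not the paper's exact policy:

```python
from collections import OrderedDict
import numpy as np

def cache_aware_topk(scores, cache, k=2, slack=0.05):
    """Pick k experts, preferring ones already cached in DRAM when their
    router score is within `slack` of the k-th best score. A sketch of
    the cache-aware idea; `slack` and the capacity are made-up knobs."""
    order = np.argsort(scores)[::-1]                 # experts by score, desc
    threshold = scores[order[k - 1]] - slack
    cached_ok = [e for e in order if e in cache and scores[e] >= threshold]
    # Cached near-top experts first, then fall back to the score order.
    chosen = list(dict.fromkeys(cached_ok + list(order)))[:k]
    for e in chosen:                                 # LRU bookkeeping
        cache.pop(e, None)
        cache[e] = True
        if len(cache) > 4:                           # toy cache capacity
            cache.popitem(last=False)                # evict least recent
    return chosen

cache = OrderedDict.fromkeys([1, 3], True)           # experts 1 and 3 resident
picked = cache_aware_topk(np.array([0.30, 0.29, 0.10, 0.28, 0.05]), cache)
# experts 1 and 3 are within slack of the top-2 scores, so no cache miss
```

Here the strict top-2 would be experts 0 and 1 (one miss), but the slack lets the router keep both selections inside the cache.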

URL: https://openreview.net/forum?id=ul4W26KEKz

---

Title: \emph{Let's Roll a BiFTA}: Bi-refinement for Fine-grained Text-visual Alignment in Vision-Language Models

Abstract: Recent research has shown that aligning fine-grained text descriptions with localized image patches can significantly improve the zero-shot performance of pre-trained vision-language models (e.g., CLIP).
However, we find that both fine-grained text descriptions and localized image patches often contain redundant information, making text-visual alignment less effective.
In this paper, we tackle this issue from two perspectives: \emph{view refinement} and \emph{description refinement}, termed as \textit{\textbf{Bi}-refinement for \textbf{F}ine-grained \textbf{T}ext-visual \textbf{A}lignment} (BiFTA).
\emph{View refinement} removes redundant image patches with high \emph{Intersection over Union} (IoU) ratios, resulting in more distinctive visual samples.
\emph{Description refinement} removes redundant text descriptions with high pairwise cosine similarity, ensuring greater diversity in the remaining descriptions.
BiFTA achieves superior zero-shot performance on 6 benchmark datasets for both ViT-based and ResNet-based CLIP, justifying the necessity to remove redundant information in visual-text alignment.
Our code is available at: \url{https://anonymous.4open.science/r/BiFTA-A707}.
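The description-refinement step can be sketched as a greedy filter over embedded descriptions: keep a description only if it is not too similar to any already-kept one. The similarity threshold below is an assumed value, not the paper's:

```python
import numpy as np

def refine_descriptions(embs, max_sim=0.9):
    """Greedy redundancy filter: keep a description only if its cosine
    similarity to every already-kept description stays below `max_sim`.
    Sketch of the description-refinement idea; the threshold is assumed."""
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    kept = []
    for i, e in enumerate(embs):
        if all(e @ embs[j] < max_sim for j in kept):
            kept.append(i)
    return kept

# Toy embeddings: descriptions 0 and 1 are near-duplicates, 2 is distinct.
E = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
kept = refine_descriptions(E)
```

An analogous filter over patch IoU ratios (instead of cosine similarity) would illustrate the view-refinement side.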

URL: https://openreview.net/forum?id=qG8vstoyyr

---

Title: Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation

Abstract: Large Language Models (LLMs) are highly resource-intensive to fine-tune due to their enormous size. While low-rank adaptation is a prominent parameter-efficient fine-tuning approach, it suffers from sensitivity to hyperparameter choices, leading to instability in model performance on fine-tuning downstream tasks. This paper highlights the importance of effective parameterization in low-rank fine-tuning to reduce estimator variance and enhance the stability of final model outputs. We propose MonteCLoRA, an efficient fine-tuning technique that employs Monte Carlo estimation to learn an unbiased posterior estimation of low-rank parameters with low expected variance, stabilizing fine-tuned LLMs with only $\mathcal{O}(r)$ additional parameters, for a given rank $r$. MonteCLoRA shows significant improvements in accuracy and robustness, achieving up to $3.8$% higher accuracy and $8.6$% greater robustness than existing efficient fine-tuning methods on natural language understanding tasks with pre-trained RoBERTa-base. Furthermore, in generative tasks with pre-trained LLaMA-1-7B, MonteCLoRA demonstrates robust zero-shot performance with $50$% lower variance than the contemporary efficient fine-tuning methods. The theoretical and empirical results presented in the paper underscore how parameterization and hyperpriors balance exploration-exploitation in the low-rank parametric space, leading to more accurate and robust parameter estimation during efficient fine-tuning.

URL: https://openreview.net/forum?id=2HFmicB8kh

---

Title: Scaling and Distilling Transformer Models for sEMG

Abstract: Surface electromyography (sEMG) signals offer a promising avenue for developing innovative human-computer interfaces by providing insights into muscular activity. However, limited available training data and computational constraints during deployment have restricted the use of state-of-the-art machine learning models, such as transformers, in challenging sEMG tasks. In this paper, we demonstrate that transformer models can learn effective and generalizable representations from sEMG datasets that are small by modern deep learning standards (approximately 100 users), surpassing the performance of classical machine learning methods and older neural network architectures. Additionally, by leveraging model distillation techniques, we reduce parameter counts by up to 50x with minimal loss of performance. This results in efficient and expressive models suitable for complex real-time sEMG tasks in dynamic real-world environments.

URL: https://openreview.net/forum?id=hFPWThwUiZ

---

Title: FairLoRA: Unpacking Bias Mitigation in Vision Models with Fairness-Driven Low-Rank Adaptation

Abstract: Recent advances in parameter-efficient fine-tuning methods, such as Low-Rank Adaptation (LoRA), have gained significant attention for their ability to efficiently adapt large foundational models to various downstream tasks. These methods are appreciated for achieving performance comparable to full fine-tuning on aggregate-level metrics while significantly reducing computational costs. To systematically address fairness in LLMs, previous studies fine-tune on fairness-specific data using a larger LoRA rank than typically used. In this paper, we introduce FairLoRA, a novel fairness-specific regularizer for LoRA aimed at reducing performance disparities across data subgroups by minimizing per-class variance in loss. To the best of our knowledge, we are the first to introduce fairness-based fine-tuning through LoRA. Our results demonstrate that the need for higher ranks to mitigate bias is not universal; it depends on factors such as the pre-trained model, dataset, and task. More importantly, we systematically evaluate FairLoRA across various vision models, including ViT, DiNO, and CLIP, in scenarios involving distribution shifts. We further emphasize the necessity of using multiple fairness metrics to obtain a holistic assessment of fairness, rather than relying solely on the metric optimized during training.

URL: https://openreview.net/forum?id=D09oy2o8ay

---

Title: Activate and Adapt: A Two-Stage Framework for Open-Set Model Adaptation

Abstract: The ability to generalize to new environments is critical for deep neural networks. Most existing works presume that the training and test data share an identical label set, overlooking the potential presence of new classes in test data. In this paper, we tackle a practical and challenging problem: Open-Set Model Adaptation (OSMA). OSMA aims to train a model on the source domain, which contains only known class data, and then adapt the trained model to the distribution-shifted target domain to classify known class data while identifying new class data. In this context, we face two challenges: (1) enabling the model to recognize new classes using only the known class data from the source domain during training, and (2) adapting the source-trained model to the target domain that contains new class data. To address these challenges, we propose a novel and universal two-stage framework named Activate and Adapt (ADA). In the training stage, we extract potential new class information hidden within the rich semantics of the source domain data to enable the model to identify new class data. Additionally, to retain source domain information while preserving data privacy, we condense the source domain data into a small dataset, facilitating the subsequent adaptation phase. In the test stage, we adaptively adjust the source-trained model to the target domain with new classes by infusing the style of target data into the condensed dataset, and decoupling domain alignment for known and new classes. Experiments across three standard benchmarks demonstrate that ADA surpasses previous methods in both online and offline settings.

URL: https://openreview.net/forum?id=2AWbwSpET9

---

Title: Unbiased Loss Functions for Multilabel Classification with Missing Labels

Abstract: This paper considers binary and multilabel classification problems in a setting where labels are missing independently and with a known rate. Missing labels are a ubiquitous phenomenon in extreme multi-label classification (XMC) tasks, such as matching Wikipedia articles to a small subset out of the hundreds of thousands of possible tags, where no human annotator can possibly check the validity of all the negative samples. For this reason, propensity-scored precision---an unbiased estimate of precision-at-k under a known noise model---has become one of the standard metrics in XMC. Few methods take this problem into account during the training phase, and all of these are limited to loss functions that can be decomposed into a sum of contributions from each individual label. A typical approach to training is to reduce the multilabel problem into a series of binary or multiclass problems, and it has been shown that if the surrogate task is to be consistent for optimizing recall, the resulting loss function is not decomposable over labels. Therefore, this paper develops unbiased estimators for generic, potentially non-decomposable loss functions. These estimators suffer from increased variance and may lead to ill-posed optimization problems, which we address by switching to convex upper bounds. The theoretical considerations are further supplemented by an experimental study showing that the switch to unbiased estimators significantly alters the bias-variance trade-off and may thus require stronger regularization.
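The propensity-based correction this line of work builds on can be illustrated in the simple label-decomposable case that the paper generalizes. This is a hypothetical sketch with a hinge surrogate: each observed positive is reweighted by the inverse of its known observation rate, which makes the loss estimate unbiased under the missing-at-known-rate model. It is not the paper's non-decomposable estimator.

```python
import numpy as np

def ipw_loss(scores, observed, propensities, loss_pos, loss_neg):
    """Unbiased loss estimate when positive labels go missing
    independently with known rates (the propensity-scored noise model).

    observed[l] = 1 only if label l is truly relevant AND was kept,
    which happens with probability propensities[l]; weighting each
    observed positive by 1/p_l makes the decomposable loss unbiased.
    """
    w = observed / propensities
    return np.sum(w * loss_pos(scores) + (1.0 - w) * loss_neg(scores))

# Monte Carlo check of unbiasedness on a toy 5-label problem.
rng = np.random.default_rng(1)
y = np.array([1, 0, 1, 1, 0], float)           # true relevance
p = np.array([0.5, 0.9, 0.8, 0.3, 0.9])        # known observation rates
scores = np.array([0.2, -0.5, 1.0, 0.1, -1.2])
loss_pos = lambda s: np.maximum(0.0, 1.0 - s)  # hinge on positives
loss_neg = lambda s: np.maximum(0.0, 1.0 + s)  # hinge on negatives

true_loss = np.sum(y * loss_pos(scores) + (1 - y) * loss_neg(scores))
obs = (rng.random((50_000, 5)) < p) * y        # positives dropped at rate 1-p
w = obs / p
estimates = (w * loss_pos(scores) + (1 - w) * loss_neg(scores)).sum(axis=1)
assert abs(estimates.mean() - true_loss) < 0.05
```

Note that the individual estimates vary much more than the plain loss would; this is exactly the increased variance the abstract warns about, and why stronger regularization may be needed.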

URL: https://openreview.net/forum?id=hMq1hUhLqp

---

Title: Explaining Confident Black-Box Predictions

Abstract: Interpretability is crucial for leveraging predictive machine learning for decision-making, but the strongest-performing models are often black boxes in that they are difficult to understand. For binary classification models, a growing body of literature seeks to find \textit{model-agnostic} explanations by treating a model as a list of 0/1 predictions and identifying patterns for when a model predicts $1$ over $0$ (or vice versa). While such explanations are useful for understanding when a model predicts 1 over 0, they do not consider the confidence (i.e., the probability) behind predictions, a critical piece of information provided by most classification models. Since the 0/1 predictions of a model depend on the choice of a subjective threshold for discretizing predicted probabilities, the resulting explanations may change as the threshold changes, despite the underlying model staying the same. In contrast, this work proposes model-agnostic explanations that treat a black-box model as a \textit{ranking} of a dataset from lowest predicted probability of $1$ to highest, rather than a list of 0/1 predictions. Under this ranking, a useful explanation should capture broadly when a model \textit{confidently} predicts $1$ (i.e., highly ranked data points). Since highly confident predictions are often correlated with predictions that are more accurate and actionable, understanding when a model predicts confidently is often quite valuable to a practitioner.

This work builds explanations based on rule lists (i.e., a collection of if-then rules) as well as a novel special case called checklists. A strong rule list or checklist is satisfied by a large number of data points that are ranked highly by the model. This criterion is measured by the traditional metric of support (i.e., the number of data points an explanation applies to), the \textit{average} ranking of those data points, which we call the Average Black-Box Ranking (ABBR), as well as the sparsity of the explanation (e.g., the number of rules in the rule list, among others). Given these metrics, this work develops a local-search-based optimization methodology for finding explanations based on rule lists and checklists that maximize ABBR under user-specified support and sparsity constraints. The methodology leverages a local search approach where an initial rule list is chosen greedily from a pool of candidate rules, then slowly perturbed by swapping rules from the rule list with those in the candidate pool. This approach is evaluated on six real-world datasets in application areas ranging from healthcare to criminal justice and finance. Empirical results suggest that this methodology finds rule lists of length at most 5 with ABBR within 7.4\% of the optimal ABBR of any explanation, while checklists provide greater interpretability at a small cost in performance.

URL: https://openreview.net/forum?id=SAwZpgKJcc

---

Title: On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments

Abstract: Large-scale multi-agent systems are often deployed across wide geographic areas, where agents interact with heterogeneous environments. There is an emerging interest in understanding the role of heterogeneity in the performance of the federated versions of classic reinforcement learning algorithms. In this paper, we study synchronous federated Q-learning, which aims to learn an optimal Q-function by having $K$ agents average their local Q-estimates every $E$ iterations. We provide a fine-grained characterization of the error evolution, which decays to zero as the number of iterations $T$ increases. When $K(E-1)$ is below a certain threshold, similar to the homogeneous environment settings, there is a linear speed-up with respect to $K$. The slow convergence of having $E>1$ turns out to be fundamental rather than an artifact of our analysis. We prove that, for a wide range of stepsizes, the $\ell_{\infty}$ norm of the error cannot decay faster than $\Theta_R (\frac{E}{(1-\gamma)T})$, where $\Theta_R$ only hides numerical constants and the specific choice of reward values. In addition, our experiments demonstrate that the convergence exhibits an interesting two-phase phenomenon. For any given stepsize, there is a sharp phase transition of the convergence: the error decays rapidly in the beginning yet later bounces up and stabilizes.

URL: https://openreview.net/forum?id=EkLAG3gt3g

---

Title: Graph Fourier Neural ODEs: Modeling Spatial-temporal Multi-scales in Molecular Dynamics

Abstract: Accurately predicting long-horizon molecular dynamics (MD) trajectories remains a significant challenge, as existing deep learning methods often struggle to retain fidelity over extended simulations. We hypothesize that one key factor limiting accuracy is the difficulty of capturing interactions that span distinct spatial and temporal scales, ranging from high-frequency local vibrations to low-frequency global conformational changes. To address these limitations, we propose Graph Fourier Neural ODEs (GF-NODE), integrating a graph Fourier transform for spatial frequency decomposition with a Neural ODE framework for continuous-time evolution. Specifically, GF-NODE first decomposes molecular configurations into multiple spatial frequency modes using the graph Laplacian, then evolves the frequency components in time via a learnable Neural ODE module that captures both local and global dynamics, and finally reconstructs the updated molecular geometry through an inverse graph Fourier transform. By explicitly modeling high- and low-frequency phenomena in this unified pipeline, GF-NODE more effectively captures long-range correlations and local fluctuations alike. Experimental results on challenging MD benchmarks, including MD17 and alanine dipeptide, demonstrate that GF-NODE achieves state-of-the-art accuracy while preserving essential geometrical features over extended simulations. These findings highlight the promise of bridging spectral decomposition with continuous-time modeling to improve the robustness and predictive power of MD simulations. Our implementation is publicly available at https://anonymous.4open.science/r/GF-NODE-code-B289/
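The spatial decomposition at the core of this pipeline is the standard graph Fourier transform: projecting a node signal onto the eigenvectors of the graph Laplacian, where small eigenvalues correspond to smooth global modes and large eigenvalues to oscillatory local ones. The sketch below illustrates only that transform and its inverse on a toy graph; the learnable Neural ODE evolution between the two transforms is omitted.

```python
import numpy as np

def graph_fourier_basis(adj):
    """Eigenvectors of the graph Laplacian L = D - A, sorted by eigenvalue.

    Low eigenvalues <-> smooth (global) modes; high eigenvalues <->
    oscillatory (local) modes, mirroring the multi-scale split in GF-NODE.
    """
    lap = np.diag(adj.sum(axis=1)) - adj
    eigvals, eigvecs = np.linalg.eigh(lap)  # symmetric -> real, orthonormal
    return eigvals, eigvecs

def gft(signal, basis):       # forward graph Fourier transform
    return basis.T @ signal

def igft(coeffs, basis):      # inverse transform reconstructs the signal
    return basis @ coeffs

# Path graph on 4 nodes; a node signal round-trips exactly.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
_, U = graph_fourier_basis(A)
x = np.array([1.0, -2.0, 0.5, 3.0])
assert np.allclose(igft(gft(x, U), U), x)
```

Because the basis is orthonormal, any per-mode evolution applied between `gft` and `igft` acts independently on each spatial frequency, which is what lets a model treat local and global dynamics separately.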

URL: https://openreview.net/forum?id=XK7cIdj6Fz

---

Title: Augmented Invertible Koopman Autoencoder for long-term time series forecasting

Abstract: Following the introduction of Dynamic Mode Decomposition and its numerous extensions, many neural autoencoder-based implementations of the Koopman operator have recently been proposed. This class of methods appears to be of interest for modeling dynamical systems, either through direct long-term prediction of the evolution of the state or as a powerful embedding for downstream methods. In particular, a recent line of work has developed invertible Koopman autoencoders (IKAEs), which provide an exact reconstruction of the input state thanks to their analytically invertible encoder, based on coupling layer normalizing flow models. We identify that the conservation of the dimension imposed by the normalizing flows is a limitation for the IKAE models, and thus we propose to augment the latent state with a second, non-invertible encoder network. This results in our new model: the Augmented Invertible Koopman AutoEncoder (AIKAE). We demonstrate the relevance of the AIKAE through a series of long-term time series forecasting experiments, on satellite image time series as well as on a benchmark involving predictions based on a large lookback window of observations.

URL: https://openreview.net/forum?id=o6ukhJLzMQ

---

Title: Towards Generalizing Neural Topical Representations

Abstract: Topic models have evolved from conventional Bayesian probabilistic models to recent Neural Topic Models (NTMs). Although NTMs have shown promising performance when trained and tested on a specific corpus, their generalization ability across corpora has yet to be studied. In practice, we often expect that an NTM trained on a source corpus can still produce quality topical representations (i.e., latent distributions over topics) for documents from different target corpora to a certain degree. In this work, we aim to improve NTMs further so that their representation power for documents generalizes reliably across corpora and tasks. To do so, we propose to enhance NTMs by narrowing the semantic distance between similar documents, with the underlying assumption that documents from different corpora may share similar semantics. Specifically, we obtain a similar document for each training document by text data augmentation. Then, we optimize NTMs further by minimizing the semantic distance between each pair, measured by the Topical Optimal Transport (TopicalOT) distance, which computes the optimal transport distance between their topical representations. Our framework can be readily applied to most NTMs as a plug-and-play module. Extensive experiments show that our framework significantly improves the generalization ability regarding neural topical representation across corpora.

URL: https://openreview.net/forum?id=o33gtLLyMP

---

Title: Understanding In-Context Learning from a Kernel Regression Perspective

Abstract: Large language models (LLMs) have initiated a paradigm shift in transfer learning. In contrast to the classic pretraining-then-finetuning procedure, in order to use LLMs for downstream prediction tasks, one only needs to provide a few demonstrations, known as in-context examples, without adding more or updating existing model parameters. This in-context learning (ICL) capability of LLMs is intriguing, and it is not yet fully understood how pretrained LLMs acquire such capabilities. In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training on a general language corpus by proposing a kernel-regression perspective of understanding LLMs' ICL behaviors when faced with in-context examples. More concretely, we first prove that Bayesian inference on in-context prompts can be asymptotically understood as kernel regression $\hat y = \sum_i y_i K(x, x_i)/\sum_i K(x, x_i)$ as the number of in-context demonstrations grows. Then, we empirically investigate the in-context behaviors of language models. We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression. Finally, our theory provides insights into multiple phenomena observed in the ICL field: why retrieving demonstrative samples similar to test samples can help, why ICL performance is sensitive to the output formats, and why ICL accuracy benefits from selecting in-distribution and representative samples.
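The kernel-regression form quoted in the abstract is the classic Nadaraya-Watson estimator, which is easy to state concretely. This sketch uses a Gaussian kernel as an illustrative choice (the paper's statement is kernel-generic), with toy one-dimensional demonstrations standing in for in-context examples.

```python
import numpy as np

def kernel_regression(x, demos_x, demos_y, bandwidth=1.0):
    """Nadaraya-Watson form from the abstract:
    hat{y} = sum_i y_i K(x, x_i) / sum_i K(x, x_i),
    here with a Gaussian kernel (an illustrative assumption)."""
    k = np.exp(-np.sum((demos_x - x) ** 2, axis=1) / (2.0 * bandwidth ** 2))
    return np.sum(demos_y * k) / np.sum(k)

# Demonstrations near the query dominate the prediction, mirroring why
# retrieving in-context examples similar to the test sample helps.
demos_x = np.array([[0.0], [10.0]])
demos_y = np.array([0.0, 1.0])
pred = kernel_regression(np.array([0.5]), demos_x, demos_y)
assert pred < 0.1  # the nearby demonstration (label 0) dominates
```

The weight each demonstration receives decays with its distance to the query, which gives an intuitive reading of the paper's observation that similar and in-distribution demonstrations improve ICL accuracy.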

URL: https://openreview.net/forum?id=6rD50Q6yYz

---

Title: FlowKac: An Efficient Neural Fokker-Planck Solver Using Temporal Normalizing Flows and the Feynman-Kac Formula

Abstract: Solving the Fokker-Planck equation for high-dimensional complex dynamical systems remains a pivotal yet challenging task due to the intractability of analytical solutions and the limitations of traditional numerical methods. In this work, we present FlowKac, a novel approach that reformulates the Fokker-Planck equation using the Feynman-Kac formula, allowing the solution at a given point to be queried via expected values over stochastic paths. A key innovation of FlowKac lies in its adaptive stochastic sampling scheme, which significantly reduces the computational complexity while maintaining high accuracy. This sampling technique, coupled with a time-indexed normalizing flow designed for capturing time-evolving probability densities, enables robust sampling of collocation points, resulting in a flexible and mesh-free solver. This formulation mitigates the curse of dimensionality and enhances computational efficiency and accuracy, which is particularly crucial for applications that inherently require dimensions beyond the conventional three. We validate the robustness and scalability of our method through various experiments on a range of stochastic differential equations, demonstrating significant improvements over existing techniques.

URL: https://openreview.net/forum?id=paeyQFa5or

---

Title: EGAIN: Enhanced Generative Adversarial Networks for Imputing Missing Values

Abstract: Missing values pose a challenge in predictive analysis, especially in big data, because most models depend on complete datasets to estimate functional relationships between variables. Generative Adversarial Imputation Networks (GAIN) are among the most reliable methods for imputing missing values with plausible values drawn from the dataset. This research introduces Enhanced Generative Adversarial Networks (EGAIN), which address the GAIN convergence issue, introduce new functionality to the GAIN process, and significantly improve its performance.

URL: https://openreview.net/forum?id=9lCHLhMOiZ

---

Title: Efficient Credit Assignment in Cooperative Multi-Agent Reinforcement Learning

Abstract: Cooperative multi-agent reinforcement learning (MARL) algorithms are crucial in addressing real-world challenges wherein multiple agents collaborate to achieve common objectives. The effectiveness of these algorithms hinges on the accurate estimation of agent action values, typically attained through learning joint and individual action values. However, challenges arise from the credit assignment problem: it is difficult to accurately attribute the global reward to the actions of individual agents, which limits sample efficiency. This paper introduces ECA, an episodic control-based method, to mitigate this limitation by directly evaluating and assigning individual agent credits. ECA leverages episodic memory to store and cluster past interaction experiences between agents and the environment. Building upon these experiences, we introduce an intrinsic reward signal quantifying the individual agent credits toward the joint goal. This proposed reward signal serves as a corrective measure to revise individual action values, thereby improving the accuracy of individual and joint value estimations. We evaluate our methodology on StarCraft multi-agent challenge (SMAC) and Google Research Football (GRF) tasks, demonstrating that our method significantly improves the sample efficiency of state-of-the-art cooperative MARL algorithms.

URL: https://openreview.net/forum?id=ZOTpFhwQQ9

---

Title: Scalable Multi-Output Gaussian Processes with Stochastic Variational Inference

Abstract: The Multi-Output Gaussian Process (MOGP) is a popular tool for modelling data from multiple sources. A typical choice to build a covariance function for a MOGP is the Linear Model of Coregionalisation (LMC), which parametrically models the covariance between outputs. The Latent Variable MOGP (LV-MOGP) generalises this idea by modelling the covariance between outputs using a kernel applied to latent variables, one per output, leading to a flexible MOGP model that allows efficient generalisation to new outputs with few data points. The computational complexity of the LV-MOGP grows linearly with the number of outputs, which makes it unsuitable for problems with a large number of outputs. In this paper, we propose a stochastic variational inference approach for the LV-MOGP that allows mini-batches for both inputs and outputs, making computational complexity per training iteration independent of the number of outputs. We demonstrate the performance of the model by benchmarking against other MOGP models on several real-world datasets, including spatial-temporal climate modelling and spatial transcriptomics.

URL: https://openreview.net/forum?id=kK0WrBZAli

---

Title: Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization

Abstract: Generative adversarial networks (GANs) learn a latent space whose samples can be mapped to real-world images. Such latent spaces are difficult to interpret. Some earlier supervised methods aim to create an interpretable latent space or discover interpretable directions that require exploiting data labels or annotated synthesized samples for training. However, we propose using a modification of vector quantization called space-filling vector quantization (SFVQ), which quantizes the data on a piece-wise linear curve. SFVQ can capture the underlying morphological structure of the latent space and thus make it interpretable. We apply this technique to model the latent space of pretrained StyleGAN2 and BigGAN networks on various datasets. Our experiments show that the SFVQ curve yields a general interpretable model of the latent space that determines which part of the latent space corresponds to what specific generative factors. Furthermore, we demonstrate that each line of SFVQ's curve can potentially refer to an interpretable direction for applying intelligible image transformations. We also show that the points located on an SFVQ line can be used for controllable data augmentation.

URL: https://openreview.net/forum?id=SEJatSGZX8

---

Title: Provably Extending PageRank-based Local Clustering to Weighted Directed Graphs with Self-Loops and to Hypergraphs

Abstract: Local clustering aims to find a compact cluster near the given starting instances, which has broad applications beyond graphs because of the internal connectivities within various modalities. While most existing studies on local graph clustering adopt the discrete graph setting (i.e., unweighted graphs without self-loops), real-world graphs can be more complex. In this paper, we extend the non-approximating Andersen-Chung-Lang (``ACL") algorithm beyond discrete graphs and generalize its quadratic optimality to a wider range of graphs, including weighted, directed, and self-looped graphs and hypergraphs. Specifically, leveraging PageRank, we propose two algorithms: GeneralACL for graphs and HyperACL for hypergraphs. We theoretically prove that, under two mild conditions, both algorithms can identify a quadratically optimal local cluster in terms of conductance with at least $\frac{1}{2}$ probability. For hypergraphs, we address a fundamental gap in the literature by defining conductance from the perspective of hypergraph random walks. Additionally, we provide experiments to validate our theoretical findings.
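The PageRank vector these algorithms build on can be computed with plain power iteration, as in the illustrative sketch below. This is generic personalized PageRank on a small weighted graph, not the ACL push procedure or the hypergraph extension; the graph, damping factor, and iteration count are arbitrary assumptions.

```python
import numpy as np

def personalized_pagerank(adj, seed, alpha=0.15, iters=200):
    """Power iteration for personalized PageRank: the walk restarts at
    the seed with probability alpha. Local clustering methods in the ACL
    family compute vectors like this and sweep over them to extract a
    low-conductance cluster around the seed; this dense version is
    only illustrative.
    """
    P = adj / adj.sum(axis=1, keepdims=True)   # row-stochastic transitions
    s = np.zeros(adj.shape[0]); s[seed] = 1.0
    pr = s.copy()
    for _ in range(iters):
        pr = alpha * s + (1 - alpha) * (P.T @ pr)
    return pr

# Two triangles joined by one weak edge; mass stays on the seed's side.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
pr = personalized_pagerank(A, seed=0)
assert np.isclose(pr.sum(), 1.0)
assert pr[:3].sum() > pr[3:].sum()   # seed's community holds more mass
```

The concentration of probability mass around the seed is what a conductance sweep exploits: sorting vertices by (degree-normalized) PageRank and cutting at the best prefix recovers a local cluster.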

URL: https://openreview.net/forum?id=nOzB2JSu2s

---

Title: Hypergraphs as Weighted Directed Self-Looped Graphs: Spectral Properties, Clustering, Cheeger Inequality

Abstract: Hypergraphs naturally arise when studying group relations and have been widely used in the field of machine learning.
To the best of our knowledge, the recently proposed edge-dependent vertex weights (EDVW) modeling is one of the most generalized modeling methods of hypergraphs, i.e., most existing hypergraph conceptual modeling methods can be generalized as EDVW hypergraphs without information loss.
However, the relevant algorithmic developments on EDVW hypergraphs remain nascent: compared to the spectral theories for graphs, the spectral formulations for EDVW hypergraphs are incomplete, the spectral clustering algorithms are not well-developed, and the hypergraph Cheeger Inequality is not well-defined.
To this end, deriving a unified random walk-based formulation, we propose our definitions of hypergraph Rayleigh Quotient, NCut, boundary/cut, volume, and conductance, which are consistent with the corresponding definitions on graphs.
Then, we prove that the normalized hypergraph Laplacian is associated with the NCut value, which inspires our proposed HyperClus-G algorithm for spectral clustering on EDVW hypergraphs.
Finally, we prove that HyperClus-G can always find an approximately linearly optimal partitioning in terms of both NCut and conductance.
Additionally, we provide extensive experiments to validate our theoretical findings from an empirical perspective.

URL: https://openreview.net/forum?id=xLWhuCXWiM

---

Title: Decoding Human Preferences in Alignment: An Improved Approach to Inverse Constitutional AI

Abstract: Traditional methods for aligning Large Language Models (LLMs), such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), rely on implicit principles, limiting interpretability. Constitutional AI (CAI) offers an explicit, rule-based framework for guiding LLM alignment. Building on this, we refine the Inverse Constitutional AI (ICAI) algorithm, which extracts constitutions from preference datasets. By improving principle generation, clustering, and embedding processes, our approach enhances the accuracy and generalizability of extracted principles across synthetic and real-world datasets. Our results highlight the potential of these principles to foster more transparent and adaptable alignment methods, offering a promising direction for future advancements beyond traditional fine-tuning.

URL: https://openreview.net/forum?id=jgj4BIlnE5

---

Title: Unifying Self-Supervised Clustering and Energy-Based Models

Abstract: Self-supervised learning excels at learning representations from large amounts of data. At the same time, generative models offer the complementary property of learning information about the underlying data generation process. In this study, we aim at establishing a principled connection between these two paradigms and highlight the benefits of their complementarity. In particular, we perform an analysis of self-supervised learning objectives, elucidating the underlying probabilistic graphical models and presenting a standardized methodology for their derivation from first principles. The analysis suggests a natural means of integrating self-supervised learning with likelihood-based generative models. We instantiate this concept within the realm of cluster-based self-supervised learning and energy models, introducing a lower bound proven to reliably penalize the most important failure modes. Our theoretical findings are substantiated through experiments on synthetic and real-world data, including SVHN, CIFAR10, and CIFAR100, demonstrating that our objective function allows us to jointly train a backbone network in a discriminative and generative fashion, consequently outperforming existing self-supervised learning strategies in terms of clustering, generation, and out-of-distribution detection performance by a wide margin. We also demonstrate that the solution can be integrated into a neuro-symbolic framework to tackle a simple yet non-trivial instantiation of the symbol grounding problem.

URL: https://openreview.net/forum?id=NW0uKe6IZa

---

Title: Does confidence calibration improve conformal prediction?

Abstract: Conformal prediction is an emerging technique for uncertainty quantification that constructs prediction sets guaranteed to contain the true label with a predefined probability. Previous works often employ temperature scaling to calibrate classifiers, assuming that confidence calibration benefits conformal prediction. However, the specific impact of confidence calibration on conformal prediction remains underexplored. In this work, we make two key discoveries about the impact of confidence calibration methods on adaptive conformal prediction. Firstly, we empirically show that current confidence calibration methods (e.g., temperature scaling) typically lead to larger prediction sets in adaptive conformal prediction. Secondly, by investigating the role of temperature value, we observe that high-confidence predictions can enhance the efficiency of adaptive conformal prediction. Theoretically, we prove that predictions with higher confidence result in smaller prediction sets on expectation. This finding implies that the rescaling parameters in these calibration methods, when optimized with cross-entropy loss, might counteract the goal of generating efficient prediction sets. To address this issue, we propose \textbf{Conformal Temperature Scaling} (ConfTS), a variant of temperature scaling with a novel loss function designed to enhance the efficiency of prediction sets. This approach can be extended to optimize the parameters of other post-hoc methods of confidence calibration. Extensive experiments demonstrate that our method improves existing adaptive conformal prediction methods in both image and text classification tasks.
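For readers unfamiliar with the construction being calibrated, the basic split conformal recipe can be sketched in a few lines. This is the plain homogeneous-score version with a finite-sample-corrected quantile, shown as a hypothetical illustration; ConfTS itself modifies the temperature-scaling objective, which is not reproduced here.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal prediction: score each calibration example by
    1 - p(true class), then take the finite-sample-corrected (1-alpha)
    quantile of the scores as the threshold (assumes the corrected
    level does not exceed 1)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, level, method="higher")

def prediction_set(probs, q):
    """Every class whose score 1 - p(y) falls within the threshold."""
    return np.flatnonzero(1.0 - probs <= q)

# Toy model: the true class always receives probability 0.9 on the
# calibration set, so all conformity scores equal 0.1.
cal_probs = np.tile([0.9, 0.1], (20, 1))
cal_labels = np.zeros(20, dtype=int)
q = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
assert list(prediction_set(np.array([0.95, 0.05]), q)) == [0]
```

Sharper (higher-confidence) probabilities shrink the scores of likely classes relative to unlikely ones, which is the mechanism behind the abstract's observation that higher confidence can yield smaller prediction sets.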

URL: https://openreview.net/forum?id=6DDaTwTvdE

---

Title: Rethinking Robustness in Machine Learning: A Posterior Agreement Approach

Abstract: The robustness of algorithms against covariate shifts is a fundamental problem with critical implications for the deployment of machine learning algorithms in the real world. Current evaluation methods predominantly equate robustness with standard generalization, relying on standard metrics like accuracy-based scores which, while designed for performance assessment, lack a theoretical foundation for estimating robustness to distribution shifts.

In this work, we set the desiderata for a robustness metric, and we propose a novel principled framework for the robustness assessment problem that directly follows the Posterior Agreement (PA) theory of model validation. Specifically, we extend the PA framework to the covariate shift setting by proposing a PA metric for robustness evaluation in supervised classification tasks. We assess the soundness of our metric in controlled environments and through an empirical robustness analysis in two different covariate shift scenarios: adversarial learning and domain generalization. We illustrate the suitability of PA by evaluating several models under shifts of different nature and magnitude, and with different proportions of affected observations. The results show that the PA metric provides a sensible and consistent analysis of the vulnerabilities in learning algorithms, even in the presence of few perturbed observations.

URL: https://openreview.net/forum?id=Bpc9uZ6kcg

---

Title: Leopard: A Vision Language Model for Text-Rich Multi-Image Tasks

Abstract: Text-rich images, where text serves as the central visual element guiding the overall understanding, are prevalent in real-world applications, such as presentation slides, scanned documents, and webpage snapshots. Tasks involving multiple text-rich images are especially challenging, as they require not only understanding the content of individual images but also reasoning about inter-relationships and logical flows across multiple visual inputs. Despite the importance of these scenarios, current multimodal large language models (MLLMs) struggle to handle such tasks due to two key challenges: (1) the scarcity of high-quality instruction tuning datasets for text-rich multi-image scenarios, and (2) the difficulty in balancing image resolution with visual feature sequence length. To address these challenges, we propose Leopard, an MLLM designed specifically for handling vision-language tasks involving multiple text-rich images. First, we curated about one million high-quality multimodal instruction-tuning examples, tailored to text-rich, multi-image scenarios. Second, we proposed an adaptive high-resolution multi-image encoding module to dynamically optimize the allocation of visual sequence length based on the original aspect ratios and resolutions of images. Experiments on a diverse set of benchmarks reveal that our model consistently outperforms state-of-the-art systems, such as Llama-3.2 and Qwen2-VL, in challenging text-rich, multi-image evaluations. Remarkably, our approach achieves outstanding performance using only 1.2M fully open-sourced training instances, outperforming models that rely on large-scale in-house data, highlighting its efficiency and effectiveness.
Our code and data are available at https://anonymous.4open.science/r/Leopard-908F.

URL: https://openreview.net/forum?id=R2rasAEPVi

---

Title: UnSTAR: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs

Abstract: Machine learning relies on data samples, learning techniques, and loss functions to optimize models. Similarly, machine unlearning can be envisioned using anti-data samples, unlearning techniques, and reversed loss functions. While unlearning techniques and reversed loss functions have been explored, the concept of unlearning with anti-data samples remains largely under-explored. In this paper, we present the first attempt at unlearning in large language models (LLMs) using anti-data samples. We propose a novel method for generating anti-samples for a given forget set, which effectively removes the influence of the forget set on the model while minimally impacting the retain set. Our results demonstrate that anti-data samples offer a promising pathway to achieve efficient and targeted unlearning in LLMs, providing a new direction for privacy-preserving machine learning and model modification.

URL: https://openreview.net/forum?id=mNXCViKZbI

---

Title: RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design

Abstract: We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions that account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score ≥ 0.45, the threshold at which two RNAs are considered to share the same global fold. Open-source code: anonymous.4open.science/r/rna-backbone-design-53B5/

URL: https://openreview.net/forum?id=wOc1Yx5s09

---

Title: Generate to Discriminate: Expert Routing for Domain Incremental Learning

Abstract: In many real-world settings, regulations and economic incentives permit the sharing of models but not data across institutional boundaries. In such scenarios, practitioners might hope to adapt models to new domains, without losing performance on previous domains (so-called catastrophic forgetting). While any single model may struggle to achieve this goal, learning an ensemble of domain-specific experts offers the potential to adapt more closely to each individual institution. However, a core challenge in this context is determining which expert to deploy at test time. In this paper, we propose Generate to Discriminate (G2D), a domain-incremental learning method that leverages synthetic data to train a domain-discriminator that routes samples at inference time to the appropriate expert. Surprisingly, we find that leveraging synthetic data in this capacity is more effective than using the samples to \textit{directly} train the downstream classifier (the more common approach to leveraging synthetic data in the lifelong learning literature). We observe that G2D outperforms competitive domain-incremental learning methods on tasks in both vision and language modalities, providing a new perspective on the use of synthetic data in the lifelong learning literature.

URL: https://openreview.net/forum?id=QdQVfdXnsG

---

Title: On the Reproducibility of "Vision Transformers Need Registers"

Abstract: Training Vision Transformers (ViTs) presents significant challenges, one of which is the emergence of artifacts in attention maps, hindering their interpretability. Darcet et al. (2024) investigated this phenomenon and attributed it to the need for ViTs to store global information beyond the [CLS] token. They proposed a novel solution involving the addition of empty input tokens, named registers, which successfully eliminate artifacts and improve the clarity of attention maps. In this work, we reproduce the findings of Darcet et al. (2024) and evaluate the generalizability of their claims across multiple models, including DINO, DINOv2, OpenCLIP, and DeiT3. While we confirm the validity of several of their key claims, our results reveal that some claims do not extend universally to other models. Additionally, we explore the impact of model size, extending their findings to smaller models. Finally, we resolve terminology inconsistencies found in the original paper and explain their impact when generalizing to a wider range of models.

URL: https://openreview.net/forum?id=4VonR2EPrf

---

Title: Learning Equivalence Classes of Bayesian Network Structures with GFlowNet

Abstract: Understanding the causal graph underlying a system is essential for enabling causal inference, particularly in fields such as medicine and genetics. Identifying a causal Directed Acyclic Graph (DAG) from observational data alone is challenging because multiple DAGs can encode the same set of conditional independencies, collectively represented by a Completed Partially Directed Acyclic Graph (CPDAG). Effectively approximating the CPDAG is crucial because it facilitates narrowing down the set of possible causal graphs underlying the data. We introduce CPDAG-GFN, a novel approach that uses a Generative Flow Network (GFlowNet) to learn a posterior distribution over CPDAGs. From this distribution, we can sample to create a set of plausible candidates that approximate the ground truth. This method focuses on sampling high-reward CPDAGs, with rewards determined by a score function that quantifies how well each graph fits the data. Additionally, it incorporates a sparsity-preferring filtering mechanism to enhance the produced set of CPDAGs. Experimental results on both simulated and real-world datasets demonstrate that CPDAG-GFN performs competitively with state-of-the-art methods for learning CPDAG candidates from observational data.

URL: https://openreview.net/forum?id=FAcc7oAdaa

---

Title: On the Reproducibility of: "Discovering and Mitigating Visual Biases through Keyword Explanations"

Abstract: This work aims to reproduce and extend the findings of "Discovering and Mitigating Visual Biases through Keyword Explanation" by Kim et al. (2024). The paper proposes the B2T framework, which detects and mitigates visual biases by extracting keywords from generated captions. By identifying biases in datasets, B2T contributes to the prevention of discriminatory behavior in vision-language models. We aim to investigate the five key claims from the original paper, namely that B2T (i) is able to identify whether a word represents a bias, (ii) can extract these keywords from captions of mispredicted images, (iii) outperforms other bias discovery models, (iv) can improve CLIP zero-shot prompting with the discovered keywords, and (v) identifies labeling errors in a dataset. To reproduce their results, we use the publicly available codebase and our re-implementations. Our findings confirm the first three claims and partially validate the fourth. We reject the fifth claim due to the failure to identify pertinent labeling errors. Finally, we enhance the original work by optimizing the efficiency of the implementation, performing a bias analysis without a classifier, and assessing the generalizability of B2T on a new dataset.

URL: https://openreview.net/forum?id=5GS1q65pv6

---

Title: [Re] GNNBoundary: Towards explaining Graph Neural Networks through the lens of decision boundaries

Abstract: Graph Neural Networks (GNNs) have been successfully applied to machine-learning tasks on graph-structured data. However, their decision-making process remains difficult to interpret. The GNNBoundary method proposed by Wang & Shen (2024) is a model-level explanation method designed to analyze GNN decision boundaries. This study aims to reproduce and verify the claims made in the original paper: (1) GNNBoundary can identify adjacent classes, (2) it can generate faithful near-boundary graphs, and (3) these graphs can be used to analyze the decision boundary. Experiments were conducted on four datasets, including the Proteins dataset, which extends the original work. To reproduce the results, we followed the authors' open-sourced implementation. Our findings only partially support Claim 1, due to variations found in adjacent classes. Generally, we were able to generate faithful near-boundary graphs, mostly supporting Claim 2. The boundary analysis differed from the original results, but it was in line with results for adjacent classes and confusion matrices, partially verifying Claim 3. Further support for this claim was found on Proteins through a PCA visualization of the data.

URL: https://openreview.net/forum?id=4mWTd8q5qM

---

Title: Spectral Clustering and Labeling for Crowdsourcing with Inherently Distinct Task Types

Abstract: The Dawid-Skene model is the most widely assumed model in the analysis of crowdsourcing algorithms that estimate ground-truth labels from noisy worker responses. In this work, we are motivated by crowdsourcing applications where workers have distinct skill sets and their accuracy additionally depends on a task's type. Focusing on the case where there are two types of tasks, we propose a spectral method to partition tasks into two groups such that a worker has the same reliability for all tasks within a group. Our analysis reveals a separability condition such that task types can be perfectly recovered if the number of workers $n$ scales logarithmically with the number of tasks $d$. Numerical experiments show how clustering tasks by type before estimating ground-truth labels enhances the performance of crowdsourcing algorithms in practical applications.
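
As a rough illustration of spectral task partitioning, here is a NumPy sketch. It is a simplified stand-in, not the paper's exact algorithm: same-type task columns of the ±1 response matrix are strongly correlated in magnitude regardless of the tasks' true labels, so we cluster tasks by the sign of the second eigenvector of the elementwise-squared task Gram matrix. The function name and the squared-Gram construction are our own:

```python
import numpy as np

def partition_tasks(responses):
    """Split tasks into two type groups from a (workers x tasks) matrix
    of +/-1 responses. Illustrative sketch under a two-type
    Dawid-Skene-style model with a separability condition."""
    gram = responses.T @ responses          # task-by-task inner products
    M = gram.astype(float) ** 2             # squaring removes label signs
    # The diagonal of M is constant (num_workers ** 2), so it only shifts
    # the spectrum and leaves the eigenvectors unchanged.
    eigvals, eigvecs = np.linalg.eigh(M)    # eigenvalues in ascending order
    # Top eigenvector is roughly constant; the second one splits the
    # two task-type blocks by sign.
    return (eigvecs[:, -2] >= 0).astype(int)  # 0/1 type label per task
```

The recovered grouping is only defined up to a swap of the two type labels, so any downstream use (e.g., per-group label aggregation) should be invariant to that relabeling.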

URL: https://openreview.net/forum?id=jVQjtzcvAc

---
