Weekly TMLR digest for Mar 02, 2025

TMLR

Mar 2, 2025, 12:00:11 AM

to tmlr-annou...@googlegroups.com

New certifications
==================

Reproducibility Certification: Nomic Embed: Training a Reproducible Long Context Text Embedder

Zach Nussbaum, John Xavier Morris, Andriy Mulyar, Brandon Duderstadt

https://openreview.net/forum?id=IPmzyQSiQE

---

Accepted papers
===============

Title: Cycle Conditioning for Robust Representation Learning from Categorical Data

Authors: Mohsen Tabejamaat, Farzaneh Etminani, Mattias Ohlsson

Abstract: This paper introduces a novel diffusion-based method for learning representations from categorical data. Conditional diffusion models have demonstrated their potential to extract meaningful representations from input samples. However, they often struggle to yield versatile, general-purpose information, limiting their adaptability to unforeseen tasks. To address this, we propose a cycle conditioning approach for diffusion models, designed to capture expressive information from conditioning samples. However, cycle conditioning alone can be insufficient. Diffusion models may ignore conditioning samples that vary across training iterations, an issue that occurs within cycle conditioning. To counter this limitation, we introduce additional "spelling" information to guide the conditioning process, ensuring that the conditioning sample remains influential during denoising. While this supervision enhances the generalizability of extracted representations, it is constrained by the sparse nature of spelling information in categorical data, leading to sparse latent conditions. This sparsity reduces the robustness of the extracted representations for downstream tasks or as effective guidance in the diffusion process. To overcome this challenge, we propose a linear navigation strategy within the latent space of conditioning samples, allowing dense representations to be extracted even with sparse supervision. Our experiments demonstrate that our method achieves at least a 1.42% improvement in AUROC and a 4.12% improvement in AUCPR over the best results from existing state-of-the-art methods.

URL: https://openreview.net/forum?id=GkYOcbNLaW

---

Title: A Lean Dataset for International Math Olympiad: Small Steps towards Writing Math Proofs for Hard Problems

Authors: Roozbeh Yousefzadeh, Xuenan Cao

Abstract: Using AI to write formal proofs for mathematical problems is a challenging task that has seen some advancements in recent years. Automated systems such as Lean can verify the correctness of proofs written in formal language, yet writing the proofs in formal language can be challenging for humans and machines. The miniF2F benchmark has 20 IMO problems in its test set, yet formal proofs are available only for 6 of these problems (3 of which are only written by mathematicians). The model with the best accuracy can prove only 2 of these 20 IMO problems, from the 1950s and 60s, while its training set is kept secret. In this work, we write complete, original formal proofs for the remaining IMO problems in Lean along with 3 extra problems from IMO 2022 and 2023. This effort expands the availability of proofs currently in the public domain by creating 5,880 lines of Lean proof. The goal of the paper is to pave the way for developing AI models that can automatically write the formal proofs for all the IMO problems in miniF2F and beyond by providing an evaluation benchmark. In this pursuit, we devise a method to decompose the proofs of these problems into their building blocks, constructing a dataset of 1,329 lemmas with more than 40k lines of Lean code. These lemmas are not trivial, yet they are approachable, providing the opportunity to evaluate and diagnose the failures and successes of AI models. We evaluate the ability of the SOTA LLMs on our dataset and analyze their success and failure modes from different perspectives. Our dataset and code are available at: https://github.com/roozbeh-yz/IMO-Steps.

URL: https://openreview.net/forum?id=CrKMqRAhBo

---

Title: Deep Active Learning in the Open World

Authors: Tian Xie, Jifan Zhang, Haoyue Bai, Robert D Nowak

Abstract: Machine learning models deployed in open-world scenarios often encounter unfamiliar conditions and perform poorly in unanticipated situations. As AI systems advance and find application in safety-critical domains, effectively handling out-of-distribution (OOD) data is crucial to building open-world learning systems. In this work, we introduce ALOE, a novel active learning algorithm for open-world environments designed to enhance model adaptation by incorporating new OOD classes via a two-stage approach. First, diversity sampling selects a representative set of examples, followed by energy-based OOD detection to prioritize likely unknown classes for annotation. This strategy accelerates class discovery and learning, even under constrained annotation budgets. Evaluations on three long-tailed image classification benchmarks demonstrate that ALOE outperforms traditional active learning baselines, effectively expanding known categories while balancing annotation cost. Our findings reveal a crucial tradeoff between enhancing known-class performance and discovering new classes, setting the stage for future advancements in open-world machine learning.

URL: https://openreview.net/forum?id=HkmymFPODz
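
The second stage described in the abstract above, energy-based OOD detection, can be illustrated with a small sketch. This is not the ALOE implementation: it assumes the commonly used free-energy score, E(x) = -T * logsumexp(logits / T), where a higher energy suggests an input is less like the known classes. The function names, the temperature default, and the toy logits are hypothetical, and the diversity-sampling stage is omitted.

```python
import math

def energy_score(logits, temperature=1.0):
    """Free energy of one logit vector: -T * logsumexp(logits / T)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * lse

def rank_for_annotation(candidates, budget):
    """Return the indices of the `budget` highest-energy candidates."""
    scores = [energy_score(logits) for logits in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order[:budget]

# Hypothetical classifier logits for three unlabeled examples.
candidates = [
    [9.0, 0.5, 0.2],  # one dominant logit: low energy, likely a known class
    [0.3, 0.2, 0.1],  # small, flat logits: high energy, possibly unknown
    [4.0, 3.5, 0.0],
]
picked = rank_for_annotation(candidates, budget=2)
print(picked)
```

In this toy setting the flat-logit example is ranked first for annotation, which is the behavior one would want from a score meant to prioritize likely unknown classes.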

---

Title: A Fused Gromov-Wasserstein Approach to Subgraph Contrastive Learning

Authors: Amadou Siaka SANGARE, Nicolas Dunou, Jhony H. Giraldo, Fragkiskos D. Malliaros

Abstract: Self-supervised learning has become a key method for training deep learning models when labeled data is scarce or unavailable. While graph machine learning holds great promise across various domains, the design of effective pretext tasks for self-supervised graph representation learning remains challenging. Contrastive learning, a popular approach in graph self-supervised learning, leverages positive and negative pairs to compute a contrastive loss function. However, current graph contrastive learning methods often struggle to fully use structural patterns and node similarities. To address these issues, we present a new method called Fused Gromov-Wasserstein Subgraph Contrastive Learning (FOSSIL). Our method integrates node-level and subgraph-level contrastive learning, seamlessly combining a standard node-level contrastive loss with the Fused Gromov-Wasserstein distance. This combination helps our method capture both node features and graph structure together. Importantly, our approach works well with both homophilic and heterophilic graphs and can dynamically create views for generating positive and negative pairs. Through extensive experiments on benchmark graph datasets, we show that FOSSIL outperforms or achieves competitive performance compared to current state-of-the-art methods.

URL: https://openreview.net/forum?id=J7cY9Jr9WM

---

Title: QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning

Authors: Yilun Kong, Hangyu Mao, Zhao Qi, Bin Zhang, Jingqing Ruan, Li Shen, Yongzhe Chang, Xueqian Wang, Rui Zhao, Dacheng Tao

Abstract: Prompt engineering has demonstrated remarkable success in enhancing the performance of large language models (LLMs) across diverse tasks. However, most existing prompt optimization methods only focus on the task-level performance, overlooking the importance of query-preferred prompts, which leads to suboptimal performance. Additionally, these methods rely heavily on frequent interactions with LLMs to obtain feedback for guiding the optimization process, incurring substantial redundant interaction costs. In this paper, we introduce Query-dependent Prompt Optimization (QPO), which leverages multi-loop offline reinforcement learning to iteratively fine-tune a small pretrained language model to generate optimal prompts tailored to the input queries, thus significantly improving the prompting effect on the large target LLM. We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks, thereby circumventing the expenses of online interactions. Furthermore, we continuously augment the offline dataset with the generated prompts in each loop, as the prompts from the fine-tuned model are supposed to outperform the source prompts in the original dataset. These iterative loops bootstrap the model towards generating optimal prompts. Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.

URL: https://openreview.net/forum?id=bqMJToTkvT

---

Title: Combating Inter-Task Confusion and Catastrophic Forgetting by Metric Learning and Re-Using a Past Trained Model

Authors: Sayedmoslem Shokrolahi, IL MIN KIM

Abstract: Despite the vast research on class-incremental learning (class-IL), its critical issues have not yet been fully addressed. In this paper, utilizing metric learning, we tackle two fundamental issues of class-IL: inter-task confusion and catastrophic forgetting. To mitigate inter-task confusion, we propose an innovative loss that uses the centroids of previously learned classes as negatives and current data samples as positives in the embedding space, which reduces overlaps between the classes of the current and past tasks. To combat catastrophic forgetting, we also propose that the past trained model is stored and re-used for generating past data samples for only one previous task. Based on this, we further propose a novel knowledge distillation approach utilizing inter-class embedding clusters, intra-class embedding clusters, and mean square embedding distances. Extensive experiments performed on MNIST, CIFAR-10, CIFAR-100, Mini-ImageNet, and TinyImageNet show that our proposed exemplar-free metric class-IL method achieves state-of-the-art performance, beating all baseline methods by notable margins. We release our code as supplementary material.

URL: https://openreview.net/forum?id=jRbKsQ3sYO

---

Title: AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation

Authors: Zijun Wang, Haoqin Tu, Jieru Mei, Bingchen Zhao, Yisen Wang, Cihang Xie

Abstract: This paper studies the vulnerabilities of transformer-based Large Language Models (LLMs) to jailbreaking attacks, focusing specifically on the optimization-based Greedy Coordinate Gradient (GCG) strategy. We first observe a positive correlation between the effectiveness of attacks and the internal behaviors of the models. For instance, attacks tend to be less effective when models pay more attention to system prompts designed to ensure LLM safety alignment. Building on this discovery, we introduce an enhanced method that manipulates models' attention scores to facilitate LLM jailbreaking, which we term AttnGCG. Empirically, AttnGCG shows consistent improvements in attack efficacy across diverse LLMs, achieving an average increase of ~7% in the Llama-2 series and ~10% in the Gemma series. Our strategy also demonstrates robust attack transferability against both unseen harmful goals and black-box LLMs like GPT-3.5 and GPT-4. Moreover, we note our attention-score visualization is more interpretable, allowing us to gain better insights into how our targeted attention manipulation facilitates more effective jailbreaking. We release the code at https://github.com/UCSC-VLAA/AttnGCG-attack.

URL: https://openreview.net/forum?id=prVLANCshF

---

Title: Metalearning Continual Learning Algorithms

Authors: Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

Abstract: General-purpose learning systems should improve themselves in open-ended fashion in ever-changing environments. Conventional learning algorithms for neural networks, however, suffer from catastrophic forgetting (CF), i.e., previously acquired skills are forgotten when a new task is learned. Instead of hand-crafting new algorithms for avoiding CF, we propose Automated Continual Learning (ACL) to train self-referential neural networks to metalearn their own in-context continual (meta)learning algorithms. ACL encodes continual learning (CL) desiderata, namely good performance on both old and new tasks, into its metalearning objectives. Our experiments demonstrate that ACL effectively resolves "in-context catastrophic forgetting," a problem that naive in-context learning algorithms suffer from; ACL-learned algorithms outperform both hand-crafted learning algorithms and popular meta-continual learning methods on the Split-MNIST benchmark in the replay-free setting, and enable continual learning of diverse tasks consisting of multiple standard image classification datasets. We also discuss the current limitations of in-context CL by comparing ACL with state-of-the-art CL methods that leverage pre-trained models. Overall, we bring several novel perspectives into the long-standing problem of CL.

URL: https://openreview.net/forum?id=IaUh7CSD3k

---

Title: Meta-Learning for Graphs with Heterogeneous Node Attribute Spaces for Few-Shot Edge Predictions

Authors: Zhong Chuang, Yusuke Tanaka, Tomoharu Iwata

Abstract: Prediction of edges between nodes in graph data is useful for many applications, such as social network analysis and knowledge graph completion. Existing graph neural network-based approaches have achieved notable advancements, but encounter significant difficulty in building an effective model when there is an insufficient number of known edges in graphs. Although some meta-learning approaches were introduced to solve this problem, they assume that the nodes of training graphs and test graphs are in homogeneous attribute spaces, which limits the flexibility of applications. In this paper, we propose a meta-learning method for edge prediction that can learn from graphs with nodes in heterogeneous attribute spaces. The proposed model consists of attribute-wise message-passing networks that transform information between connected nodes for each attribute, resulting in attribute-specific node embeddings. The node embeddings are obtained by calculating the mean of the attribute-specific node embeddings. The encoding operation can be repeated multiple times to capture complex patterns. The attribute-wise message-passing networks are shared across all graphs, allowing knowledge transfer between different graphs. The probabilities of edges are estimated by the Euclidean distance between node embeddings. Experimental results on 14 real-world data sets demonstrate that the proposed method outperforms existing methods in edge prediction problems with sparse edge information.

URL: https://openreview.net/forum?id=CAkt3DsAZs
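
Two steps in the abstract above lend themselves to a short illustration: averaging attribute-specific embeddings into one node embedding, and turning the Euclidean distance between two node embeddings into an edge probability. The abstract does not say how distance is mapped to a probability, so the sigmoid link and the `bias` parameter below are assumptions made only for this sketch, and all names and toy values are hypothetical.

```python
import math

def mean_embedding(attribute_embeddings):
    """Average a node's attribute-specific embeddings component-wise."""
    n = len(attribute_embeddings)
    dim = len(attribute_embeddings[0])
    return [sum(e[k] for e in attribute_embeddings) / n for k in range(dim)]

def edge_probability(u, v, bias=1.0):
    """ASSUMED link function: sigmoid(bias - Euclidean distance)."""
    dist = math.dist(u, v)
    return 1.0 / (1.0 + math.exp(-(bias - dist)))

# Hypothetical nodes; each may have a different number of attributes,
# but every attribute-specific embedding has the same dimension.
node_a = mean_embedding([[1.0, 0.0], [3.0, 2.0]])
node_b = mean_embedding([[2.0, 1.0], [2.0, 1.0]])
node_c = mean_embedding([[5.0, 5.0]])

p_ab = edge_probability(node_a, node_b)  # identical embeddings, distance 0
p_ac = edge_probability(node_a, node_c)  # distance 5
print(p_ab, p_ac)
```

Because the mean is taken over however many attributes a node has, nodes with different attribute sets still end up in one shared embedding space, which is what makes the distance comparison meaningful.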

---

Title: Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics

Authors: Minttu Alakuijala, Reginald McLean, Isaac Woungang, Nariman Farsad, Samuel Kaski, Pekka Marttinen, Kai Yuan

Abstract: Natural language is often the easiest and most convenient modality for humans to specify tasks for robots. However, learning to ground language to behavior typically requires impractical amounts of diverse, language-annotated demonstrations collected on each target robot. In this work, we aim to separate the problem of what to accomplish from how to accomplish it, as the former can benefit from substantial amounts of external observation-only data, and only the latter depends on a specific robot embodiment. To this end, we propose Video-Language Critic, a reward model that can be trained on readily available cross-embodiment data using contrastive learning and a temporal ranking objective, and use it to score behavior traces from a separate actor. When trained on Open X-Embodiment data, our reward model enables 2x more sample-efficient policy training on Meta-World tasks than a sparse reward only, despite a significant domain gap. Using in-domain data but in a challenging task generalization setting on Meta-World, we further demonstrate more sample-efficient training than is possible with prior language-conditioned reward models that are either trained with binary classification, use static images, or do not leverage the temporal information present in video data.

URL: https://openreview.net/forum?id=jJOVpnNrEp

---

Title: DivIL: Unveiling and Addressing Over-Invariance for Out-of-Distribution Generalization

Authors: Jiaqi WANG, Yuhang Zhou, Zhixiong Zhang, Qiguang Chen, Yongqiang Chen, James Cheng

Abstract: Out-of-distribution generalization is a common problem in which a model is expected to perform well on distributions that differ from, and may be far from, the training data. A popular approach to addressing this issue is invariant learning (IL), in which the model is compelled to focus on invariant features instead of spurious features by adding strong constraints during training. However, strong invariant constraints have potential pitfalls. Due to the limited number of diverse environments and over-regularization in the feature space, they may cause a loss of important details in the invariant features while alleviating the spurious correlations, a phenomenon we call over-invariance, which can also degrade generalization performance. We theoretically define over-invariance and observe that this issue occurs in various classic IL methods. To alleviate this issue, we propose a simple approach, Diverse Invariant Learning (DivIL), which adds unsupervised contrastive learning and a random masking mechanism to compensate for the invariant constraints and can be applied to various IL methods. Furthermore, we conduct experiments across multiple modalities, 12 datasets, and 6 classic models, verifying our over-invariance insight and the effectiveness of our DivIL framework. Our code is available at https://github.com/kokolerk/DivIL.

URL: https://openreview.net/forum?id=2Zan4ATYsh

---

Title: FaAlGrad: Fairness through Alignment of Gradients across Different Subpopulations

Authors: Nikita Malik, Konda Reddy Mopuri

Abstract: The growing deployment of Machine Learning systems has increased interest in systems optimized for other important criteria along with the expected task performance. For instance, machine learning models often exhibit biases that lead to unfair outcomes for certain protected subpopulations. This work aims to handle the bias in machine learning models and enhance their fairness by aligning the loss gradients. Specifically, leveraging the meta-learning technique, we propose a novel training framework that aligns the gradients computed across different subpopulations for learning fair classifiers. Aligning the gradients enables our framework to regularize the training process, thereby prioritizing fairness over predictive accuracy. Our experiments on multiple benchmark datasets demonstrate significant improvements in fairness metrics without having any exclusive regularizers for fairness. Thus our work contributes to developing fairer machine learning models with broader societal benefits.

URL: https://openreview.net/forum?id=k4AxEwTaHq

---

Title: Faster Diffusion Through Temporal Attention Decomposition

Authors: Haozhe Liu, Wentian Zhang, Jinheng Xie, Francesco Faccio, Mengmeng Xu, Tao Xiang, Mike Zheng Shou, Juan-Manuel Perez-Rua, Jürgen Schmidhuber

Abstract: We explore the role of the attention mechanism during inference in text-conditional diffusion models. Empirical observations suggest that cross-attention outputs converge to a fixed point after several inference steps. The convergence time naturally divides the entire inference process into two phases: an initial phase for planning text-oriented visual semantics, which are then translated into images in a subsequent fidelity-improving phase. Cross-attention is essential in the initial phase but almost irrelevant thereafter. Self-attention, however, initially plays a minor role but becomes increasingly important in the second phase. These findings yield a simple and training-free method called TGATE which efficiently generates images by caching and reusing attention outputs at scheduled time steps. Experiments show TGATE’s broad applicability to various existing text-conditional diffusion models which it speeds up by 10-50%. The code of TGATE is available at https://github.com/HaozheLiu-ST/T-GATE.

URL: https://openreview.net/forum?id=xXs2GKXPnH
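
The caching idea in the abstract above can be sketched in a few lines. This is a toy illustration, not the TGATE code from the linked repository: it assumes a single gate step after which the cross-attention output is frozen and reused, and the class name, `gate_step`, and the stand-in attention function are all hypothetical.

```python
class GatedCrossAttention:
    """Wraps a cross-attention callable; from `gate_step` on, reuse the cached output."""

    def __init__(self, attention_fn, gate_step):
        self.attention_fn = attention_fn
        self.gate_step = gate_step
        self.cache = None
        self.calls = 0  # how many times the real attention was computed

    def __call__(self, step, hidden, text):
        # Before the gate (or if nothing is cached yet), compute and store.
        if step < self.gate_step or self.cache is None:
            self.cache = self.attention_fn(hidden, text)
            self.calls += 1
        # From the gate step onward, the stored output is returned unchanged.
        return self.cache

# Stand-in for an expensive cross-attention computation (hypothetical).
def toy_attention(hidden, text):
    return hidden + text

attn = GatedCrossAttention(toy_attention, gate_step=4)
outputs = [attn(step, float(step), 0.5) for step in range(10)]
print(attn.calls, outputs)
```

With ten denoising steps and a gate at step 4, the wrapped function runs only four times, and the remaining six steps reuse the output from step 3. Any real speedup depends on how much of the total cost the skipped attention accounts for.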

---

Title: TACO Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression

Authors: Denis Kuznedelev, Soroush Tabesh, Kimia Noorbakhsh, Elias Frantar, Sara Beery, Eldar Kurtic, Dan Alistarh

Abstract: Recent vision architectures and self-supervised training methods have enabled training computer vision models that are extremely accurate, but come with massive computational costs. In settings such as identifying species in camera traps in the field, users have limited resources, and may fine-tune a pretrained model on (often limited) data from a small set of specific categories of interest. Such users may still wish to make use of highly-accurate large models, but are often constrained by the computational cost. To address this, we ask: can we quickly compress generalist models into accurate and efficient specialists given a small amount of data?

Towards this goal, we propose a simple and versatile technique, which we call Few-Shot Task-Aware COmpression (TACO). Given a general-purpose model pretrained on a broad task, such as classification on ImageNet or iNaturalist datasets with thousands of categories, TACO produces a much smaller model that is accurate on specialized tasks, such as classifying across vehicle types or animal species, based only on a few examples from each target class. The method is based on two key insights - 1) a powerful specialization effect for data-aware compression, which we showcase for the first time; 2) a dedicated finetuning procedure with knowledge distillation, which prevents overfitting even in scenarios where data is very scarce. Specifically, TACO is applied in few-shot fashion, i.e. only a few task-specific samples are used for compression, and the procedure has low computational overhead. We validate this approach experimentally using highly-accurate ResNet, ViT/DeiT, and ConvNeXt models, originally trained on ImageNet and iNaturalist datasets, which we specialize and compress to a diverse set of "downstream" subtasks, with notable computational speedups on both CPU and GPU.

URL: https://openreview.net/forum?id=Za9Tm07fig

---

Title: Global Convergence Rate of Deep Equilibrium Models with General Activations

Authors: Lan V. Truong

Abstract: In a recent paper, Ling et al. investigated the over-parametrized Deep Equilibrium Model (DEQ) with ReLU activation. They proved that gradient descent converges to a globally optimal solution at a linear convergence rate for the quadratic loss function. This paper shows that this result still holds for DEQs with any general activation that has bounded first and second derivatives. Since the new activation function is generally non-homogeneous, bounding the least eigenvalue of the Gram matrix of the equilibrium point is particularly challenging. To accomplish this task, we create a novel population Gram matrix and develop a new form of dual activation with Hermite polynomial expansion.

URL: https://openreview.net/forum?id=XPREcQlAM0

---

Title: Nomic Embed: Training a Reproducible Long Context Text Embedder

Authors: Zach Nussbaum, John Xavier Morris, Andriy Mulyar, Brandon Duderstadt

Abstract: This technical report describes the training of nomic-embed-text-v1, the first fully reproducible, open-source, open-weights, open-data, 8192 context length English text embedding model that outperforms both OpenAI Ada-002 and OpenAI text-embedding-3-small on the short-context MTEB benchmark and the long context LoCo benchmark.
We release the training code and model weights under an Apache 2.0 license.
In contrast with other open-source models, we release the full curated training data and code that allows for full replication of nomic-embed-text-v1. You can find code and data to replicate the model at https://github.com/nomic-ai/contrastors.

URL: https://openreview.net/forum?id=IPmzyQSiQE

---

Title: Stability-Aware Training of Machine Learning Force Fields with Differentiable Boltzmann Estimators

Authors: Sanjeev Raja, Ishan Amin, Fabian Pedregosa, Aditi S. Krishnapriyan

Abstract: Machine learning force fields (MLFFs) are an attractive alternative to ab-initio methods for molecular dynamics (MD) simulations. However, they can produce unstable simulations, limiting their ability to model phenomena occurring over longer timescales and compromising the quality of estimated observables. To address these challenges, we present Stability-Aware Boltzmann Estimator (StABlE) Training, a multi-modal training procedure which leverages joint supervision from reference quantum-mechanical calculations and system observables. StABlE Training iteratively runs many MD simulations in parallel to seek out unstable regions, and corrects the instabilities via supervision with a reference observable. We achieve efficient end-to-end automatic differentiation through MD simulations using our Boltzmann Estimator, a generalization of implicit differentiation techniques to a broader class of stochastic algorithms. Unlike existing techniques based on active learning, our approach requires no additional ab-initio energy and forces calculations to correct instabilities. We demonstrate our methodology across organic molecules, tetrapeptides, and condensed phase systems, using three modern MLFF architectures. StABlE-trained models achieve significant improvements in simulation stability, data efficiency, and agreement with reference observables. Crucially, the stability improvements cannot be matched by simply reducing the simulation timestep, meaning that StABlE Training effectively allows for larger timesteps in MD simulations. By incorporating observables into the training process alongside first-principles calculations, StABlE Training can be viewed as a general semi-empirical framework applicable across MLFF architectures and systems. This makes it a powerful tool for training stable and accurate MLFFs, particularly in the absence of large reference datasets. Our code is publicly available at https://github.com/ASK-Berkeley/StABlE-Training.

URL: https://openreview.net/forum?id=ZckLMG00sO

---

Title: Fair principal component analysis (PCA): minorization-maximization algorithms for Fair PCA, Fair Robust PCA and Fair Sparse PCA

Authors: Prabhu babu, Petre Stoica, Astha Saini

Abstract: In this paper we propose a new iterative algorithm to solve the fair PCA (FPCA) problem. We start with the max-min fair PCA formulation originally proposed by Samadi et al. and derive a simple and efficient iterative algorithm which is based on the minorization-maximization (MM) approach. The proposed algorithm relies on the relaxation of a semi-orthogonality constraint which is proved to be tight at every iteration of the algorithm. The vanilla version of the proposed algorithm requires solving a semi-definite program (SDP) at every iteration, which can be further simplified to a quadratic program by formulating the dual of the surrogate maximization problem. We also propose two important reformulations of the fair PCA problem: a) fair robust PCA - which can handle outliers in the data, and b) fair sparse PCA - which can enforce sparsity on the estimated fair principal components.
The proposed algorithms are computationally efficient and monotonically increase their respective design objectives at every iteration. An added feature of the proposed algorithms is that they do not require the selection of any hyperparameter (except for the fair sparse PCA case where a penalty parameter that controls the sparsity has to be chosen by the user). We numerically compare the performance of the proposed methods with two of the state-of-the-art approaches on synthetic data sets and real-life data sets.

URL: https://openreview.net/forum?id=6jTQrr3APY

---

Title: Producers Equilibria and Dynamics in Engagement-Driven Recommender Systems

Authors: Krishna Acharya, Juba Ziani, Jingyan Wang, Varun Vangala

Abstract: Online platforms such as YouTube and Instagram rely heavily on recommender systems to decide what content to present to users. Producers, in turn, often create content that is likely to be recommended to users and have users engage with it. To do so, producers try to align their content with the preferences of their targeted user base. In this work, we explore the equilibrium behavior of producers who are interested in maximizing user engagement. We study two variants of the content-serving rule for the platform's recommender system, and provide a structural characterization of producer behavior at equilibrium: namely, each producer chooses to focus on a single embedded feature. We further show that specialization, defined as different producers optimizing for distinct types of content, naturally emerges from the competition among producers trying to maximize user engagement. We provide a heuristic for computing equilibria of our engagement game, and evaluate it experimentally. We highlight i) the performance and convergence of our heuristic, ii) the degree of producer specialization, and iii) the impact of the content-serving rule on producer and user utilities at equilibrium, and provide guidance on how to set the content-serving rule.

URL: https://openreview.net/forum?id=EWT4GxjGDS

---

Title: Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models

Authors: Zeyu Yang, Han Yu, Peikun Guo, Khadija Zanna, Xiaoxue Yang, Akane Sano

Abstract: Diffusion models have emerged as a robust framework for various generative tasks, including tabular data synthesis. However, current tabular diffusion models tend to inherit bias in the training dataset and generate biased synthetic data, which may influence discriminatory actions. In this research, we introduce a novel tabular diffusion model that incorporates sensitive guidance to generate fair synthetic data with balanced joint distributions of the target label and sensitive attributes, such as sex and race.
The empirical results demonstrate that our method effectively mitigates bias in training data while maintaining the quality of the generated samples. Furthermore, we provide evidence that our approach outperforms existing methods for synthesizing tabular data on fairness metrics such as demographic parity ratio and equalized odds ratio, achieving improvements of over 10%. Our implementation is available at https://github.com/comp-well-org/fair-tab-diffusion.

URL: https://openreview.net/forum?id=dvRysCqmYQ

---

Title: A Neural Material Point Method for Particle-based Emulation

Authors: Omer Rochman-Sharabi, Sacha Lewin, Gilles Louppe

Abstract: Mesh-free Lagrangian methods are widely used for simulating fluids, solids, and their complex interactions due to their ability to handle large deformations and topological changes. These physics simulators, however, require substantial computational resources for accurate simulations. To address these issues, deep learning emulators promise faster and scalable simulations, yet they often remain expensive and difficult to train, limiting their practical use. Inspired by the Material Point Method (MPM), we present NeuralMPM, a neural framework for particle-based emulation. NeuralMPM interpolates Lagrangian particles onto a fixed-size grid, computes updates on grid nodes using image-to-image neural networks, and interpolates back to the particles. Similarly to MPM, NeuralMPM benefits from the regular voxelized representation to simplify the computation of the state dynamics, while avoiding the drawbacks of mesh-based Eulerian methods. We demonstrate the advantages of NeuralMPM on 6 datasets, including fluid dynamics and fluid-solid interactions simulated with MPM and Smoothed Particle Hydrodynamics (SPH). Compared to GNS and DMCF, NeuralMPM reduces training time from 10 days to 15 hours, memory consumption by 10x-100x, and increases inference speed by 5x-10x, while achieving comparable or superior long-term accuracy, making it a promising approach for practical forward and inverse problems. A project page is available at https://neuralmpm.isach.be/.

URL: https://openreview.net/forum?id=zSK81A2hxQ
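
The particle-to-grid, grid update, grid-to-particle loop described in the abstract above can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear (tent) interpolation weights, the function names `p2g`, `g2p`, and `emulator_step`, and the identity `grid_update` placeholder standing in for the image-to-image network are all hypothetical.

```python
def p2g(positions, values, n_nodes, dx):
    """Scatter particle values onto a 1D grid using linear (tent) weights."""
    num = [0.0] * n_nodes
    den = [0.0] * n_nodes
    for x, v in zip(positions, values):
        i = min(int(x / dx), n_nodes - 2)  # index of the node to the left
        w_right = x / dx - i               # weight given to the right node
        for node, w in ((i, 1.0 - w_right), (i + 1, w_right)):
            num[node] += w * v
            den[node] += w
    # Weighted average per node; nodes with no nearby particle stay at 0.
    return [n / d if d > 0 else 0.0 for n, d in zip(num, den)]


def g2p(positions, grid, dx):
    """Gather grid values back to particle positions with the same weights."""
    out = []
    for x in positions:
        i = min(int(x / dx), len(grid) - 2)
        w_right = x / dx - i
        out.append((1.0 - w_right) * grid[i] + w_right * grid[i + 1])
    return out


def emulator_step(positions, values, n_nodes, dx, grid_update):
    """One particle -> grid -> update -> particle pass."""
    grid = p2g(positions, values, n_nodes, dx)
    return g2p(positions, grid_update(grid), dx)


# Placeholder for the learned grid-to-grid network (assumption: identity map).
identity_update = lambda grid: list(grid)
positions = [0.0, 1.0, 2.0]
values = [1.0, 2.0, 3.0]
new_values = emulator_step(positions, values, n_nodes=3, dx=1.0,
                           grid_update=identity_update)
```

Because the grid has a fixed size, the update in the middle operates on a regular array regardless of how many particles there are, which is the property the abstract attributes to the voxelized representation.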

---

Title: Lognormal Mutations and their Use in Detecting Surreptitious Fake Images

Authors: Olivier Teytaud, Mariia Zameshina, Tom Sander, Pierre Fernandez, Furong Ye, Laurent Najman, Thomas Bäck, Ismail Labiad

Abstract: In many cases, adversarial attacks against fake detectors employ algorithms specifically crafted for automatic image classifiers.
These algorithms perform well, thanks to an excellent ad hoc distribution of initial attacks.
However, these attacks are easily detected due to their specific initial distribution. Consequently, we explore alternative black-box attacks inspired by generic black-box optimization tools, particularly focusing on the lognormal algorithm that we successfully extend to attack fake detectors.
Moreover, we demonstrate that this attack evades detection by neural networks trained to flag classical adversarial examples.
Therefore, we train more general models capable of identifying a broader spectrum of attacks, including classical black-box attacks designed for images, black-box attacks driven by classical optimization, and no-box attacks.
By integrating these attack detection capabilities with fake detectors, we develop more robust and effective fake detection systems.

URL: https://openreview.net/forum?id=0RJvZY0h6O

---

Title: Verbalized Machine Learning: Revisiting Machine Learning with Language Models

Authors: Tim Z. Xiao, Robert Bamler, Bernhard Schölkopf, Weiyang Liu

Abstract: Motivated by the progress of large language models (LLMs), we introduce the framework of verbalized machine learning (VML). In contrast to conventional machine learning (ML) models that are typically optimized over a continuous parameter space, VML constrains the parameter space to be human-interpretable natural language. Such a constraint leads to a new perspective of function approximation, where an LLM with a text prompt can be viewed as a function parameterized by the text prompt. Guided by this perspective, we revisit classical ML problems, such as regression and classification, and find that these problems can be solved by an LLM-parameterized learner and optimizer. The major advantages of VML include (1) easy encoding of inductive bias: prior knowledge about the problem and hypothesis class can be encoded in natural language and fed into the LLM-parameterized learner; (2) automatic model class selection: the optimizer can automatically select a model class based on data and verbalized prior knowledge, and it can update the model class during training; and (3) interpretable learner updates: the LLM-parameterized optimizer can provide explanations for why an update is performed. We empirically verify the effectiveness of VML, and hope that VML can serve as a stepping stone to stronger interpretability.

URL: https://openreview.net/forum?id=k3Ab6RuJE9

---

Title: Counterfactual Learning of Stochastic Policies with Continuous Actions

Authors: Houssam Zenati, Alberto Bietti, Matthieu Martin, Eustache Diemert, Pierre Gaillard, Julien Mairal

Abstract: Counterfactual reasoning from logged data has become increasingly important for many applications such as web advertising or healthcare. In this paper, we address the problem of learning stochastic policies with continuous actions from the viewpoint of counterfactual risk minimization (CRM). While the CRM framework is appealing and well studied for discrete actions, the continuous action case raises new challenges about modelization, optimization, and offline model selection with real data which turns out to be particularly challenging. Our paper contributes to these three aspects of the CRM estimation pipeline.
First, we introduce a modelling strategy based on a joint kernel embedding of contexts and actions, which overcomes the shortcomings of previous discretization approaches. Second, we empirically show that the optimization aspect of counterfactual learning is important, and we demonstrate the benefits of proximal point algorithms and smooth estimators. Finally, we propose an evaluation protocol for offline policies in real-world logged systems, which is challenging since policies cannot be replayed on test data, and we release a new large-scale dataset along with multiple synthetic, yet realistic, evaluation setups.

URL: https://openreview.net/forum?id=fC4bh1PmZr
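
As background for the abstract above, counterfactual risk estimates from logged data are often built on importance weighting: each logged loss is reweighted by the ratio of the target policy's density to the logging policy's density at the logged action, with the weights clipped to limit variance. The sketch below shows that standard hard-clipped estimator for a Gaussian target policy. It is not the paper's estimator (the smooth estimators and kernel embedding it mentions are not reproduced here), and the function names, the log format, and the toy data are assumptions.

```python
import math


def gaussian_pdf(a, mean, std):
    """Density of a univariate Gaussian evaluated at action a."""
    z = (a - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def clipped_ips_risk(logs, target_mean, target_std, clip):
    """Clipped importance-weighted estimate of the target policy's risk.

    logs: iterable of (context, action, loss, logging_density) tuples, where
    logging_density is the logging policy's density at the logged action.
    """
    total = 0.0
    count = 0
    for context, action, loss, logging_density in logs:
        target_density = gaussian_pdf(action, target_mean(context), target_std)
        weight = target_density / logging_density
        total += loss * min(weight, clip)  # hard clipping bounds the variance
        count += 1
    return total / count


# Hypothetical logs collected under a standard normal logging policy.
actions_and_losses = [(-0.5, 1.0), (0.2, 0.4), (1.1, 2.0)]
logs = [(None, a, l, gaussian_pdf(a, 0.0, 1.0)) for a, l in actions_and_losses]
risk = clipped_ips_risk(logs, target_mean=lambda x: 0.0, target_std=1.0, clip=10.0)
```

When the target policy equals the logging policy, every weight is 1 and the estimate reduces to the average logged loss, which is a useful sanity check.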

---

Title: Why is constrained neural language generation particularly challenging?

Authors: Cristina Garbacea, Qiaozhu Mei

Abstract: Recent advances in deep neural language models combined with the capacity of large scale datasets have accelerated the development of natural language generation systems that produce fluent and coherent texts (to various degrees of success) in a multitude of tasks and application contexts. However, controlling the output of these models for desired user and task needs is still an open challenge. This is crucial not only to customizing the content and style of the generated language, but also to their safe and reliable deployment in the real world. We present an extensive survey on the emerging topic of constrained neural language generation in which we formally define and categorize the problems of natural language generation by distinguishing between conditions and constraints (the latter being testable conditions on the output text instead of the input), present constrained text generation tasks, and review existing methods and evaluation metrics for constrained text generation. Our aim is to highlight recent progress and trends in this emerging field, informing on the most promising directions and limitations towards advancing the state-of-the-art of constrained neural language generation research.

URL: https://openreview.net/forum?id=Vwgjk5ysWn

---

New submissions
===============

Title: Reproducibility Study of GNNBoundary: Towards Explaining Graph Neural Networks through the Lens of Decision Boundaries

Abstract: This study reproduces and extends GNNBoundary, a method for explaining Graph Neural Networks (GNNs) by analyzing decision boundaries between graph classes. GNNBoundary identifies adjacent class pairs and generates boundary graphs to provide insights into model behavior. We evaluate the reproducibility of key claims from the original work, including the identification of adjacent classes, the generation of accurate boundary graphs, and the effectiveness of an adaptive loss function in achieving faster convergence. Besides partly generating successful boundary graphs, our reproduction mostly highlights challenges with training variability and convergence, particularly with the Enzymes dataset. This suggests that GNNBoundary’s performance is sensitive to hyperparameter settings and random initialization. In addition, we extend GNNBoundary to handle three-class decision boundaries. While it demonstrated its feasibility, it also highlighted limitations in achieving balanced class separability and convergence. By assessing the abilities of GNNBoundary and the extension, this study contributes to improving the transparency and interpretability of GNN decision boundaries. Our findings emphasize the need for refined loss functions, additional baseline comparisons, and methodological extensions to more complex datasets for improved reliability.

URL: https://openreview.net/forum?id=9BbK3iURyB

---

Title: Communication-Efficient Heterogeneous Federated Learning with Generalized Heavy-Ball Momentum

Abstract: Federated Learning (FL) has emerged as the state-of-the-art approach for learning from decentralized data in privacy-constrained scenarios.
However, system and statistical challenges hinder its real-world applicability, requiring efficient learning from edge devices and robustness to data heterogeneity. Despite significant research efforts, existing approaches often degrade severely due to the joint effect of heterogeneity and partial client participation. In particular, while momentum appears as a promising approach for overcoming statistical heterogeneity, in current approaches its update is biased towards the most recently sampled clients. As we show in this work, this is the reason why it fails to outperform FedAvg, preventing its effective use in real-world large-scale scenarios.
In this work, we propose a novel _Generalized Heavy-Ball Momentum_ (GHBM) and theoretically prove it enables convergence under unbounded data heterogeneity in _cyclic partial participation_, thereby advancing the understanding of momentum's effectiveness in FL.
We then introduce adaptive and communication-efficient variants of GHBM that match the communication complexity of FedAvg in settings where clients can be _stateful_.
Extensive experiments on vision and language tasks confirm our theoretical findings, demonstrating that GHBM substantially improves state-of-the-art performance under random uniform client sampling, particularly in large-scale settings with high data heterogeneity and low client participation.

URL: https://openreview.net/forum?id=LNoFjcLywb

---

Title: Unmasking Trees for Tabular Data

Abstract: Despite much work on advanced deep learning and generative modeling techniques for tabular data generation and imputation, traditional methods have continued to win on imputation benchmarks. We herein present UnmaskingTrees, a simple method for tabular imputation (and generation) employing gradient-boosted decision trees which are used to incrementally unmask individual features. This approach offers state-of-the-art performance on imputation and on generation given training data with missingness; and it has competitive performance on vanilla generation. To solve the conditional generation subproblem, we propose a tabular probabilistic prediction method, BaltoBot, which fits a balanced tree of boosted tree classifiers. Unlike older methods, it requires no parametric assumption on the conditional distribution, accommodating features with multimodal distributions; unlike newer diffusion methods, it offers fast sampling, closed-form density estimation, and flexible handling of discrete variables. We finally consider our two approaches as meta-algorithms, demonstrating in-context learning-based generative modeling with TabPFN.

URL: https://openreview.net/forum?id=0AxbTF3Ouq

---

Title: Deep Augmentation: Dropout as Augmentation for Self-Supervised Learning

Abstract: Despite dropout’s ubiquity in machine learning, its effectiveness as a form of data augmentation remains under-explored. We address two key questions: (i) When is dropout effective as an augmentation strategy? (ii) Is dropout uniquely effective under these conditions? To explore these questions, we propose Deep Augmentation, a network- and modality-agnostic method that applies dropout or PCA transformations to targeted layers in neural networks. Through extensive experiments on contrastive learning tasks in NLP, computer vision, and graph learning, we find that uniformly applying dropout across layers does not consistently improve performance. Instead, dropout proves most beneficial in deeper layers and can be matched by alternative augmentations (e.g., PCA). We also show that a stop-gradient operation is critical for ensuring dropout functions effectively as an augmentation, and that performance trends invert when moving from contrastive tasks to supervised tasks. Our analysis suggests that Deep Augmentation helps mitigate inter-layer co-adaptation---a notable issue in self-supervised learning due to the absence of labeled data. Drawing on these insights, we outline a procedure for selecting the optimal augmentation layer and demonstrate that Deep Augmentation can outperform traditional input-level augmentations. This simple yet powerful approach can be seamlessly integrated into a wide range of architectures and modalities, yielding notable gains in both performance and generalization.

URL: https://openreview.net/forum?id=OjWB2671AR

---

Title: Reproducibility Study of “Vision Transformers Need Registers”

Abstract: Vision Transformers (ViTs) have achieved State-Of-The-Art (SOTA) performance in numerous tasks. However, the emergence of high-norm artifact tokens in supervised and self-supervised ViTs hinders interpretability of attention maps of such models. This study reproduces and validates previous work addressing this issue through the use of register tokens - learnable placeholders added to the input sequence - that mitigate artifacts and yield smoother feature maps. We evaluated the presence of artifacts in various ViT models, namely DeiT-III and DINOv2 architectures, and investigated the impact of fine-tuning pre-trained ViTs with register tokens and additional regularization introduced. By conducting experiments on pre-trained and fine-tuned models, we confirm that register tokens eliminate artifacts and improve attention map interpretability.

URL: https://openreview.net/forum?id=w9pgM58H05

---

Title: Entropy Voting Between Capsules

Abstract: Capsule networks offer a promising solution in computer vision by addressing the limitations of convolutional neural networks (CNNs), such as data dependency and viewpoint challenges. Unlike CNNs, capsules reduce the need for data augmentation by enhancing generalization from limited training data. We explore capsules from the perspective of information theory, viewing them as continuous random variables. We use marginal differential entropy to measure the information content of capsules, and relative entropy to model the agreement between lower-level and higher-level capsules. The proposed entropy voting method aims to maximize capsule marginal entropies and to minimize their relative entropy. We show through an ablation study that such a relationship exists between the capsules. We also show that our approach performs better or comparably against state-of-the-art capsule networks while significantly improving inference time. This research highlights the synergy between capsules and information theory, providing insights into their combined potential.

URL: https://openreview.net/forum?id=pFhPaXPsky

---

Title: MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training

Abstract: Mathematical formulas are a fundamental and widely used component in various scientific fields, serving as a universal language for expressing complex concepts and relationships.
While state-of-the-art transformer models excel in processing and understanding natural language, they encounter challenges with mathematical notation, which involves a complex structure and diverse representations.
This study focuses on the development of specialized training datasets to enhance the encoding of mathematical content.
We introduce Math Mutator (MAMUT), a framework capable of generating equivalent and falsified versions of a given mathematical formula in LaTeX notation, effectively capturing the mathematical variety in notation of the same concept.
Based on MAMUT, we have generated four large mathematical datasets containing diverse notation, which can be used to train language models with enhanced mathematical embeddings.

URL: https://openreview.net/forum?id=khODmRpQEx

---

Title: Multi-objective Bayesian optimization for Likelihood-Free inference in sequential sampling models of decision making

Abstract: Statistical models are often defined by a generative process for simulating synthetic data, but this can lead to intractable likelihoods. Likelihood free inference (LFI) methods enable Bayesian inference to be performed in this case. Extending a popular approach to simulation-efficient LFI for single-source data, we propose Multi-objective Bayesian Optimization for Likelihood Free Inference (MOBOLFI) to perform LFI using multi-source data. MOBOLFI models a multi-dimensional discrepancy between observed and simulated data, using a separate discrepancy for each data source. The use of a multivariate discrepancy allows for approximations to individual data source likelihoods in addition to the joint likelihood, enabling detection of conflicting information and deeper understanding of the importance of different data sources in estimating individual parameters. The adaptive choice of simulation parameters using multi-objective Bayesian optimization ensures simulation efficient approximation of likelihood components for all data sources. We illustrate our approach in sequential sampling models (SSMs), which are widely used in psychology and consumer-behavior modeling. SSMs are often fitted using multi-source data, such as choice and response time. The advantages of our approach are illustrated in comparison with a single discrepancy for an SSM fitted to data assessing preferences of ride-hailing drivers in Singapore to rent electric vehicles.

URL: https://openreview.net/forum?id=hQjwDqfSzj

---

Title: Shrek MCMC: A Multi-Fidelity Layered MCMC Approach

Abstract: Markov chain Monte Carlo (MCMC) requires only the ability to evaluate the likelihood, making it a common technique for inference in complex models. However, it can have a slow mixing rate, requiring the generation of many samples to obtain good estimates and an overall high computational cost. Shrek MCMC is a multi-fidelity layered MCMC method that exploits lower-fidelity approximations of the true likelihood calculation to improve mixing and leads to overall faster performance. Such lower-fidelity likelihoods are commonly available in scientific and engineering applications where the model involves a simulation whose resolution or accuracy can be tuned. Our technique uses recursive, layered chains with simple layer tuning; it does not require the likelihood to take any form or have any particular internal mathematical structure. We demonstrate experimentally that Shrek MCMC achieves larger effective sample sizes for the same computational time across different scientific domains including hydrology and cosmology.

URL: https://openreview.net/forum?id=QTxywlorZ1

---

Title: To Be Greedy, or Not to Be – That Is the Question for Population Based Training Variants

Abstract: Achieving excellent results with neural networks requires careful hyperparameter tuning, which can be automated via hyperparameter optimization algorithms such as Population Based Training (PBT). PBT stands out for its capability to efficiently optimize hyperparameter schedules in parallel and within the wall-clock time of training a single network. Several PBT variants have been proposed that improve performance in the experimental settings considered in the associated publications. However, the experimental settings and tasks vary across publications, while the best previous PBT variant is not always included in the comparisons, thus making the relative performance of PBT variants unclear. In this work, we empirically evaluate five single-objective PBT variants on a set of image classification and reinforcement learning tasks with different setups (such as increasingly large search spaces). We find that the Bayesian Optimization (BO) variants of PBT tend to behave greedier than the non-BO ones, which is beneficial when aggressively pursuing short-term gains improves long-term performance and harmful otherwise. This is a previously overlooked caveat to the reported improvements of the BO PBT variants. Examining their theoretical properties, we find that BO PBT variants are guaranteed to asymptotically approach the greedy hyperparameter schedule (rather than the optimal one, as claimed in prior work). Together with our empirical results, this leads us to conclude that there is currently no single best PBT variant capable of outperforming others both when pursuing short-term gains is helpful in the long term, and when it is harmful.

URL: https://openreview.net/forum?id=3qmnxysNbi

---

Title: Reproducibility Study of Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation

Abstract: This paper presents a reproducibility study and extension of "Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation." We validate the original findings using a range of open-weight models (1.5B-70B parameters) and GPT-4o Mini while introducing several novel contributions. We analyze the Pareto front of the games, propose a communication-free baseline to test whether successful negotiations are possible without agent interaction, evaluate recent small language models' performance, analyze structural information leakage in model responses, and implement an inequality metric to assess negotiation fairness. Our results demonstrate that smaller models (<10B parameters) struggle with format adherence and coherent responses, but larger open-weight models can approach proprietary model performance. Additionally, in many scenarios, single-agent approaches can achieve comparable results to multi-agent negotiations, challenging assumptions about the necessity of agent communication to perform well on the benchmark. This work also provides insights into accessibility, fairness, environmental impact, and privacy considerations of LLM-based negotiation systems.

URL: https://openreview.net/forum?id=MTrhFmkC45

---

Title: [RE] GNNBoundary: Finding Boundaries and Going Beyond Them

Abstract: Graph classification models are becoming increasingly popular, while explainability methods face challenges due to the discrete nature of graphs and other factors. However, investigating model decision-making, such as through decision-boundary regions, helps prevent misclassification and improve model robustness. This study aims to reproduce the findings of GNNBoundary: Towards Explaining Graph Neural Networks Through the Lens of Decision Boundaries (Wang & Shen, 2024). Their work supports 3 main claims: (1) their proposed algorithm can identify adjacent class pairs reliably, (2) their GNNBoundary can effectively and consistently generate near-boundary graphs outperforming the cross entropy baseline and (3) the generated near-boundary graphs can be used to accurately assess key properties of the decision boundary: margin, thickness, and complexity. We reproduce the experiments on the same datasets and extend them to two additional real-world datasets. Beyond that, we test different boundary probability ranges and their effect on decision boundary metrics, develop an additional baseline, and conduct hyperparameter tuning. We confirm the first claim regarding the adjacency discovery as well as the second claim that GNNBoundary outperforms the cross-entropy baseline under the limitation that it requires intensive hyperparameter tuning for convergence. The third claim is partially accepted as we observe a high variance between reported and obtained results, disproving the reliability and precision of the boundary statistics.

URL: https://openreview.net/forum?id=kEUvWFHEsn

---

Title: Efficient Exploration in Multi-Agent Reinforcement Learning via Farsighted Self-Direction

Abstract: Multi-agent reinforcement learning faces greater challenges with efficient exploration compared to single-agent counterparts, primarily due to the exponential growth in state and action spaces. Methods based on intrinsic rewards have been proven to enhance exploration efficiency in multi-agent scenarios effectively. However, these methods are plagued by instability during training and biases in exploration direction. To address these challenges, we propose Farsighted Self-Direction (FSD), a novel model-free method that utilizes a long-term exploration bonus to achieve coordinated exploration. Since prediction error against individual Q-values indicates a potential bonus for committed exploration, it is taken into account in action selection to directly guide the coordinated exploration. Further, we also use clipped double Q-learning to reduce noise in prediction error. We validate the method on didactic examples and demonstrate the outperformance of our method on challenging StarCraft II micromanagement tasks.

URL: https://openreview.net/forum?id=NUV8THrLZC
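
The clipped double Q-learning mentioned in the abstract above usually refers to bootstrapping from the smaller of two independent value estimates, which damps overestimation noise in the temporal-difference target. The sketch below shows that generic target computation only. How FSD combines it with the prediction-error bonus is not described in the abstract, so nothing here should be read as the authors' method; the function name and default discount are assumptions.

```python
def clipped_double_q_target(reward, done, next_q1, next_q2, gamma=0.99):
    """TD target that bootstraps from the smaller of two next-state estimates."""
    # No bootstrapping past a terminal transition.
    bootstrap = 0.0 if done else gamma * min(next_q1, next_q2)
    return reward + bootstrap


# Hypothetical transition: the two critics disagree, so the lower value is used.
target = clipped_double_q_target(reward=1.0, done=False,
                                 next_q1=2.0, next_q2=3.0, gamma=0.5)
```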

---

Title: Impact of Language Guidance: A Reproducibility Study

Abstract: Modern deep-learning architectures need large amounts of data to produce state-of-the-art results. Annotating such huge datasets is time-consuming, expensive, and prone to human error. Recent advances in self-supervised learning allow us to train huge models without explicit annotation. Contrastive learning is a popular paradigm in self-supervised learning. Recent works like SimCLR and CLIP rely on image augmentations or directly minimizing cross-modal loss between image and text. Banani et al. (2023) propose to use language guidance to sample view pairs. They claim that language enables better conceptual similarity, eliminating the effects of visual variability. We reproduce their experiments to verify their claims. We find that their dataset, RedCaps, contains low-quality captions. We use an off-the-shelf image captioning model, BLIP-2, to replace the captions and improve performance. We also devise a new metric to evaluate the semantic capabilities of self-supervised models based on interpretability methods.

URL: https://openreview.net/forum?id=qTDDGHvXiU

---

Title: Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models

Abstract: In this paper, we conduct a comprehensive reproducibility study of An Image is Worth 1000 Lies: Adversarial Transferability Across Prompts on Vision-Language Models. Beyond replicating the original Cross-Prompt Attack (CroPA) method, we identify key limitations and propose enhancements to improve its effectiveness. Our key contributions include: (1) two novel initialization strategies that significantly improve Attack Success Rate (ASR) and transferability, (2) a refined loss function that manipulates the vision encoder’s attention mechanisms to improve generalization, and (3) a broader evaluation by benchmarking CroPA against multiple robust attack baselines. We evaluate our approach across a range of prevalent VLMs, including Flamingo, BLIP-2, and InstructBLIP, validating the original results while demonstrating consistent improvements. Our work reinforces the importance of studying adversarial vulnerabilities in VLMs and provides a more robust and versatile framework for generating transferable adversarial examples, with significant implications for understanding and improving the security of VLMs in real-world applications.

URL: https://openreview.net/forum?id=5L90cl0xtf

---

Title: On the Generalizability of "Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals"

Abstract: We present a reproduction study of "Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals" (Ortu et al., 2024), which investigates competition of mechanisms in language models between factual recall and counterfactual in-context repetition. Our study successfully reproduces their primary findings regarding the localization of factual and counterfactual information, the dominance of attention blocks in mechanism competition, and the specialization of attention heads in handling competing information. We reproduce their results on both GPT-2 (Radford et al., 2019) and Pythia 6.9B (Biderman et al., 2023). We extend their work in three significant directions: First, we demonstrate that these findings generalize to even larger models by replicating the experiments on Llama 3.1 8B (Dubey et al., 2024). Second, we investigate the impact of prompt structure by introducing variations where we avoid repeating the counterfactual statement verbatim or we change the premise word, observing a marked decrease in the logit for the counterfactual token. Finally, we investigate the validity of the authors’ claims for prompts of specific domains, discovering that certain categories of prompts skew the results by providing the factual prediction token as part of the subject of the sentence. We find that the attention head ablation proposed in Ortu et al. (2024) is ineffective for domains that are underrepresented in their dataset, and that the effectiveness varies based on domain, prompt structure and task.

URL: https://openreview.net/forum?id=15keyzQj9h

---

Title: Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention

Abstract: This study is part of the MLRC Reproducibility Challenge 2025, aiming to reproduce and improve the results from a NeurIPS 2024 submission, "Smoothed Energy Guidance (SEG): Guiding Diffusion Models with Reduced Energy Curvature of Attention". The work proposed in the SEG paper faced key limitations, including the lack of an ablation study for optimal kernel size selection and unexplored alternative blurring strategies within diffusion models, which could offer valuable insights into enhancing image quality and model robustness. Furthermore, the approach employed unnecessary smoothing throughout all iterations of the denoising process, which not only diminished the clarity of the output but also resulted in increased computational costs. To address these issues, we conducted a detailed ablation study and explored more efficient alternatives, including Exponential Moving Average (EMA) and BoxBlur using integral images, to improve computational efficiency while maintaining image quality. Our findings provide insights into optimizing smooth energy guidance in diffusion models, reducing computational overhead while improving image quality.

URL: https://openreview.net/forum?id=RZ1QcOaTpk
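
The box blur via integral images mentioned in the abstract above relies on a summed-area table, which lets the sum over any rectangular window be read off with four lookups, so the cost per output value does not grow with the kernel size. The sketch below is a generic 2D illustration on a plain list-of-lists array. It is not the submission's code, which applies blurring inside a diffusion model's attention; the function names and the choice to shrink the window at the borders are assumptions.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column at the top-left."""
    h, w = len(img), len(img[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_blur(img, radius):
    """Mean filter over a (2 * radius + 1) square window, clipped at borders."""
    h, w = len(img), len(img[0])
    sat = integral_image(img)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Window sum from four corner lookups, independent of radius.
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out


# Hypothetical 3x3 input with a single bright value in the center.
image = [[0.0, 0.0, 0.0],
         [0.0, 9.0, 0.0],
         [0.0, 0.0, 0.0]]
blurred = box_blur(image, radius=1)
```

Here the center of `blurred` averages all nine inputs, while each corner averages only the four inputs inside the clipped window.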

---

Title: Bridging the Gap between Supervised and Self-supervised Learning

Abstract: Self-supervised representation learning has mainly advanced in an empirical rather than theoretical manner. Many successful algorithms combine multiple techniques that are supported by experiments. This approach makes it difficult for the community to understand self-supervised learning fundamentally. To help settle this situation, we take a principled approach. We theoretically formulate a self-supervised learning problem as an approximation of a supervised learning problem. From the formulated problem, we derive a loss that is closely related to existing contrastive losses, thereby providing a foundation for these losses. The concepts of prototype representation bias and balanced contrastive loss are naturally introduced in the derivation, which provide insights to help understand self-supervised learning. We discuss how components of our framework align with practices of self-supervised learning algorithms, focusing on SimCLR. We also investigate the impact of balancing the attracting force between positive pairs and the repelling force between negative pairs. The proofs of our theorems are provided in the appendix, and the code to reproduce experimental results is provided in the supplementary material.

URL: https://openreview.net/forum?id=OrzGlOmNJa

---

Title: LEGO-Learn: Label-Efficient Graph Open-Set Learning

Abstract: How can we train graph-based models to recognize unseen classes while keeping labeling costs low? Graph open-set learning (GOL) and out-of-distribution (OOD) detection aim to address this challenge by training models that can accurately classify known, in-distribution (ID) classes while identifying and handling previously unseen classes during inference. It is critical for high-stakes, real-world applications where models frequently encounter unexpected data, including finance, security, and healthcare. However, current GOL methods assume access to a large number of labeled ID samples, which is unrealistic for large-scale graphs due to high annotation costs.
In this paper, we propose LEGO-Learn (Label-Efficient Graph Open-set Learning), a novel framework that addresses open-set node classification on graphs within a given label budget by selecting the most informative ID nodes. LEGO-Learn employs a GNN-based filter to identify and exclude potential OOD nodes and then selects highly informative ID nodes for labeling using the K-Medoids algorithm. To prevent the filter from discarding valuable ID examples, we introduce a classifier that differentiates between the $C$ known ID classes and an additional class representing OOD nodes (hence, a $C+1$ classifier). This classifier utilizes a weighted cross-entropy loss to balance the removal of OOD nodes while retaining informative ID nodes. Experimental results on four real-world datasets demonstrate that LEGO-Learn significantly outperforms leading methods, achieving up to a $6.62\%$ improvement in ID classification accuracy and a $7.49\%$ increase in AUROC for OOD detection.

URL: https://openreview.net/forum?id=J6oxTJPOyN
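
The $C+1$ classifier with a weighted cross-entropy loss described in the LEGO-Learn abstract can be illustrated with a minimal sketch. The class count, the logits, and the weight values below are assumptions chosen for illustration (the abstract does not state them), and the function name is hypothetical rather than taken from the authors' code.

```python
import math

def weighted_cross_entropy(logits, target, class_weights):
    """Weighted cross-entropy for one sample over C + 1 classes."""
    # Numerically stable log-softmax: subtract the max logit first.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_prob = logits[target] - log_z
    # The per-class weight scales how much this sample's error counts.
    return -class_weights[target] * log_prob

# Hypothetical setup: C = 3 known ID classes plus one OOD class at index 3.
NUM_ID_CLASSES = 3
OOD_INDEX = NUM_ID_CLASSES
# Assumed weights: the OOD class is down-weighted relative to the ID classes.
weights = [1.0] * NUM_ID_CLASSES + [0.5]

logits = [2.0, 0.5, -1.0, 1.0]
loss_id = weighted_cross_entropy(logits, 0, weights)
loss_ood = weighted_cross_entropy(logits, OOD_INDEX, weights)
print(f"loss if labelled ID class 0: {loss_id:.4f}")
print(f"loss if labelled OOD:        {loss_ood:.4f}")
```

Changing the weight on the OOD class shifts how strongly the filter is pushed toward flagging nodes as OOD versus keeping them as ID candidates, which is the balance the abstract refers to.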

---

Title: Simple Calibration via Geodesic Kernels

Abstract: Deep discriminative approaches, such as decision forests and deep neural networks, have recently found applications in many important real-world scenarios. However, deploying these learning algorithms in safety-critical applications raises concerns, particularly when it comes to ensuring calibration for both in-distribution and out-of-distribution regions. Many popular methods for in-distribution (ID) calibration, such as isotonic and Platt’s sigmoidal regression, exhibit adequate ID calibration performance. However, these methods are not calibrated for the entire feature space, leading to overconfidence in the out-of-distribution (OOD) region. Existing OOD calibration methods generally exhibit poor ID calibration. In this paper, we jointly address the ID and OOD problems. We leverage the fact that deep models learn to partition feature space into a union of polytopes, that is, flat-sided geometric objects. We introduce a geodesic distance to measure the distance between these polytopes and further distinguish samples within the same polytope using a Gaussian kernel. Our experiments on both tabular and vision benchmarks show that the proposed approaches, namely Kernel Density Forest (KDF) and Kernel Density Network (KDN), obtain well-calibrated posteriors for both ID and OOD samples, while mostly preserving the classification accuracy and extrapolating beyond the training data to handle OOD inputs appropriately.

URL: https://openreview.net/forum?id=dpcRp8ix5T

---

Title: Towards Efficient Contrastive PAC Learning

Abstract: We study contrastive learning under the PAC learning framework. While a series of recent works have shown statistical results for learning under contrastive loss, based either on the VC-dimension or Rademacher complexity, their algorithms are inherently inefficient or do not imply PAC guarantees. In this paper, we consider contrastive learning of the fundamental concept of linear representations. Surprisingly, even under such a basic setting, the existence of efficient PAC learners is largely open. We first show that the problem of contrastive PAC learning of linear representations is intractable to solve in general. We then show that it can be relaxed to a semi-definite program when the distance between contrastive samples is measured by the $\ell_2$-norm. We then establish generalization guarantees based on Rademacher complexity, and connect them to PAC guarantees under certain contrastive large-margin conditions. To the best of our knowledge, this is the first efficient PAC learning algorithm for contrastive learning.

URL: https://openreview.net/forum?id=dBJo9hyKVg

---

Title: On Memory and Generalization in the Era of Linear Recurrence

Abstract: Memory is crucial for the ability to store and retrieve prior knowledge when that information is gathered as a continuous stream that cannot be processed all at once. For decades, various types of artificial recurrent neural networks (RNNs) have been designed and improved to handle sequential data, incorporating memory in different ways. Transformers have become the most widely adopted architecture to deal with sequential data, while more recently structured state-space models (SSMs) and linear RNNs were put forward for their improved computational efficiency. While these families of models have been studied on various synthetic and real-world tasks, the generalization abilities of these newer models remain a topic of ongoing exploration. In particular, there is a gap in the current literature regarding the length generalization of models on sequence modeling tasks, both across models and across tasks. For models, while numerous studies have investigated the generalization of RNNs and Transformers to longer sequences, there is not much work devoted to such studies for SSMs or linear RNNs. Regarding tasks, one limitation of current works is their focus on formal language tasks for studying the generalization of sequence modeling. In contrast, the deep learning literature often introduces a variety of other tasks to assess the specific capabilities of deep learning models on sequential data. In this paper, we take a step toward addressing this gap by comparing the generalization abilities of all three families of algorithms across tasks that impose different memory requirements and are of special interest to the deep learning community, namely, copying tasks, state tracking tasks, and counting tasks. Our results show that despite their great efficiency, state-space models seem to be less able than the non-linear recurrent models to generalize to longer sequences.

URL: https://openreview.net/forum?id=tomk8KbsJH

---

Title: [RE] GNNBoundary: Towards Explaining Graph Neural Networks through the Lens of Decision Boundaries

Abstract: Graph Neural Networks (GNNs) can model complex relationships while posing significant interpretability challenges due to the unique and varying properties of graph structures, which hinder the adaptation of existing methods from other domains. To address interpretability challenges in GNNs, GNNBoundary was designed as a model-level explainability tool to provide insights into their overall behavior. This paper aims to thoroughly evaluate the reproducibility, robustness, and practical applicability of the findings presented in the original work by replicating and extending their experiments, highlighting both strengths and limitations while considering potential future improvements. Our results show that while the algorithm can reliably generate near-boundary graphs in certain settings, its performance is highly sensitive to hyperparameter choices and suffers from convergence issues. Furthermore, we find that the generated solutions lack diversity, often representing only a single region on the decision boundary, which limits their effectiveness in broader decision boundary analysis. All the code used throughout the research is publicly available on GitHub.

URL: https://openreview.net/forum?id=zLfLTHOdZW

---

Title: Remembering to Be Fair Again: Reproducing Non-Markovian Fairness in Sequential Decision Making

Abstract: Ensuring long-term fairness in sequential decision-making is a key challenge in machine learning. Alamdari et al. (2024) introduced FairQCM, a reinforcement learning algorithm that enforces fairness in non-Markovian settings via memory augmentations and counterfactual reasoning. We reproduce and extend their findings by validating their claims and introducing novel enhancements. We confirm that FairQCM outperforms standard baselines in fairness enforcement and sample efficiency across different environments. However, alternative fairness metrics (Egalitarian, Gini) yield mixed results, and counterfactual memories show limited impact on fairness improvement. Further, we introduce a realistic COVID-19 vaccine allocation environment based on SEIR, a popular compartmental model of epidemiology. To accommodate continuous action spaces, we develop FairSCM, which integrates counterfactual memories into a Soft Actor-Critic framework. Our results reinforce that counterfactual memories provide little fairness benefit and, in fact, hurt performance, especially in complex, dynamic settings. The original code, modified to be 70% more efficient, and our extensions will be available on GitHub.

URL: https://openreview.net/forum?id=H6DtMcZf5s

---

Title: RobustBlack: Challenging Black-Box Adversarial Attacks on State-of-the-Art Defenses

Abstract: Although adversarial robustness has been extensively studied in white-box settings, recent advances in black-box attacks (including transfer- and query-based approaches) are primarily benchmarked against weak defenses, leaving a significant gap in the evaluation of their effectiveness against more recent and moderately robust models (e.g., those featured in the RobustBench leaderboard). In this paper, we question this lack of attention from black-box attacks to robust models. We establish a framework to evaluate the effectiveness of recent black-box attacks against both top-performing and standard defense mechanisms, on the ImageNet dataset. Our empirical evaluation reveals the following key findings: (1) the most advanced black-box attacks struggle to succeed even against simple adversarially trained models; (2) robust models that are optimized to withstand strong white-box attacks, such as AutoAttack, also exhibit enhanced resilience against black-box attacks; and (3) robustness alignment between the surrogate models and the target model is a key factor in the success rate of transfer-based attacks.

URL: https://openreview.net/forum?id=1Ebwbr4Anh

---

Title: Beyond TF-IDF: Reproducibility and Generalizability of Two-sided Fairness Tradeoffs

Abstract: In this paper, we reproduce the experiments conducted by Greenwood et al. (2024) to validate their findings. We successfully recreate their main empirical results and conclude that the authors' two primary claims are valid: (a) item fairness tends to impose a higher cost on user fairness in more homogeneous populations, and (b) the cost of misestimation is high and item fairness constraints do not affect this cost. We extend their experiments by using SPECTER embeddings and the Amazon Book dataset to validate the generalizability of their results. We find similar results, for both the original dataset and the Amazon Book dataset, consistent with those of the authors. However, the results obtained with SPECTER embeddings do not fully support the two claims.

URL: https://openreview.net/forum?id=i7xcwZFTqx

---

Title: Reproducibility Study of "FairCLIP: Harnessing Fairness in Vision-Language Learning"

Abstract: Fairness is a crucial consideration in medical deep learning, as model bias can lead to disparities in diagnoses and treatment decisions. Luo et al. (2024a) conducted a comprehensive fairness analysis of two vision-language models, CLIP and BLIP2, revealing significant bias in their predictions. The authors introduced FairCLIP, a model that mitigates bias and achieves a better performance-fairness trade-off. In this work, we aim to (1) reproduce the key findings of Luo et al. (2024a) and (2) extend their analysis with additional evaluations. Our results confirm that most of the reported findings are reproducible, although we identify discrepancies in specific cases. Furthermore, we conduct a more extensive fairness analysis by incorporating two additional metrics: Precision Disparity and Mean Absolute Deviation. Following this analysis, we confirm the presence of bias in CLIP. However, despite being able to reproduce most of the results, we challenge the claim that FairCLIP improves fairness. Our results suggest that improvements of FairCLIP over CLIP are inconsistent and architecture- or attribute-dependent, rather than a generalizable improvement in fairness. Finally, we conduct a study to identify the source of bias. Our results indicate that the bias does not originate from the summarized clinical notes, medical pre-training, group imbalance, or attribute identification.

URL: https://openreview.net/forum?id=u22nSdHsFt

---

Title: Discover-then-Name Revisited: Enhancing Concept Bottleneck Models Interpretability

Abstract: This study aims to reproduce and extend the research on Discover-Then-Name Concept Bottleneck Models (DN-CBM) introduced by Rao et al. (2024). DN-CBM enhances traditional CBM models by incorporating sparse autoencoders (SAEs) to enable automatic concept discovery and improved concept generation and interpretability. We replicate the key experiments on CIFAR-10, CIFAR-100, Places365, and ImageNet, confirming the claims of automated concept discovery, task-agnostic applicability, and improved vocabulary leading to greater granularity. However, we find that the claim of superior interpretability over CLIP is inconclusive. Beyond replication, we introduce new experiments, including an analysis of color perturbations on concept robustness and the integration of Local Interpretable Model-Agnostic Explanations (LIME) to trace which features correspond to each concept. Our findings reveal the model’s limited robustness to color variations and demonstrate how adding LIME results in increased interpretability and the ability to detect (spurious) correlations. The complete implementation of the original authors' experiments as well as ours is available in our repository: https://github.com/EKarasevnl/Reproducibility-DN-CBM.

URL: https://openreview.net/forum?id=gYDtFr97X7

---

Title: Test-Time Fairness and Robustness in Large Language Models

Abstract: Frontier Large Language Models (LLMs) can be socially discriminatory or sensitive to spurious features of their inputs. Because only well-resourced corporations can train frontier LLMs, we need robust test-time strategies to control such biases. Existing solutions, which instruct the LLM to be fair or robust, rely on the model’s implicit understanding of bias. Causality provides a rich formalism through which we can be explicit about our debiasing requirements. Yet, as we show, a naive application of the standard causal debiasing strategy, counterfactual data augmentation, fails to fulfill individual-level debiasing requirements at test time. To address this, we develop stratified invariance, a flexible debiasing notion that can capture a range of debiasing requirements, from population level to individual level, through an additional measurement that stratifies the predictions. We develop a complete test for this new approach and introduce a data augmentation strategy that guarantees stratified invariance at test time under suitable assumptions, together with a prompting strategy that encourages stratified invariance in LLMs. We show that our prompting strategy, unlike implicit instructions, consistently reduces the bias of frontier LLMs across a suite of synthetic and real-world benchmarks without requiring additional data, finetuning or pre-training.

URL: https://openreview.net/forum?id=1fML4VF5FG

---

Title: RESTOR: Knowledge Recovery via Machine Unlearning

Abstract: Large language models trained on web-scale corpora can memorize undesirable datapoints containing incorrect facts, copyrighted content, or sensitive data. Recently, many machine unlearning algorithms have been proposed that aim to `erase' the effect of these datapoints from trained models, that is, revert model behavior to be \emph{similar to a model that had never been trained on these datapoints in the first place}. However, evaluating the success of unlearning algorithms remains an open challenge. While previous work has relied on heuristics, such as verifying that the model can no longer reproduce the specific information targeted for removal while maintaining accuracy on unrelated test data, these approaches fall short of capturing the full effect of erasing these datapoints. In this work, we propose the RESTOR framework for machine unlearning, which evaluates the ability of unlearning algorithms to perform targeted data erasure from models, by evaluating the ability of models to forget the knowledge introduced in these datapoints, while simultaneously recovering the model's knowledge state had it never encountered these datapoints. RESTOR helps uncover several novel insights about popular unlearning algorithms and the mechanisms through which they operate: for instance, some algorithms merely emphasize forgetting but not recovering knowledge, and localizing unlearning targets can enhance unlearning performance.

URL: https://openreview.net/forum?id=BbwlJpNXgW

---

Title: Online Decision Deferral under Budget Constraints

Abstract: Machine Learning (ML) models are increasingly used to support or substitute decision making. In applications where skilled experts are a limited resource, it is crucial to reduce their burden and automate decisions when the performance of an ML model is at least of equal quality.
However, models are often pre-trained and fixed, while tasks arrive sequentially and their distribution may shift. In that case, the respective performance of the decision makers may change, and the deferral algorithm must remain adaptive. We propose a contextual bandit model of this online decision making problem. Our framework includes budget constraints and different types of partial feedback models. Beyond the theoretical guarantees of our algorithm, we propose efficient extensions that achieve remarkable performance on real-world datasets.

URL: https://openreview.net/forum?id=xPeqRbcmFj

---

Title: Accelerated Training on Low-Power Edge Devices

Abstract: Training on edge devices poses several challenges as these devices are generally resource-constrained, especially in terms of power. State-of-the-art techniques at the device level reduce the GPU frequency to enforce power constraints, leading to a significant increase in training time. To accelerate training, we propose to jointly adjust the system and application parameters (in our case, the GPU frequency and the batch size of the training task) while adhering to the power constraints on devices. We introduce a novel cross-layer methodology that combines predictions of batch size efficiency and device profiling to achieve the desired optimization. Our evaluation on real hardware shows that our method outperforms the current baselines that depend on state-of-the-art techniques, reducing the training time by $2.4\times$ with results very close to optimal. Our measurements also indicate a substantial reduction in the overall energy used for the training process. These gains are achieved without reduction in the performance of the trained model.

URL: https://openreview.net/forum?id=usT9zmom4T

---

Title: Test-time Contrastive Concepts for Open-world Semantic Segmentation with Vision-Language Models

Abstract: Recent CLIP-like Vision-Language Models (VLMs), pre-trained on large amounts of image-text pairs to align both modalities with a simple contrastive objective, have paved the way to open-vocabulary semantic segmentation. Given an arbitrary set of textual queries, image pixels are assigned the closest query in feature space. However, this works well only when a user exhaustively lists all possible visual concepts in an image that contrast against each other for the assignment. This corresponds to the current evaluation setup in the literature, which relies on having access to a list of in-domain relevant concepts, typically classes of a benchmark dataset. Here, we consider the more challenging (and realistic) scenario of segmenting a single concept, given a textual prompt and nothing else. To achieve good results, besides contrasting with the generic “background” text, we propose two different approaches to automatically generate, at test time, query-specific textual contrastive concepts. We do so by leveraging the distribution of texts in the VLM’s training set or crafted LLM prompts. We also propose a metric designed to evaluate this scenario and show the relevance of our approach on commonly used datasets.

URL: https://openreview.net/forum?id=wyOv4kGkbU

---

Title: ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization

Abstract: Parameter-efficient fine-tuning (PEFT) enables creation of specialized language models for diverse tasks, resulting in numerous expert modules. In many practical use cases, these expert PEFT modules are integrated into a single model that answers arbitrary queries by routing queries to different experts. However, only a few experts can be kept in GPU memory due to memory constraints. Consequently, expert modules are frequently loaded and offloaded between CPU/GPU memory or disk storage. This frequent swapping dramatically increases communication overhead, leading to unacceptable latency and degrading user experience. The large size of modern PEFT modules further exacerbates this latency. For example, QLoRA experts for 65B LLaMA are 3.2GB, making swapping a major communication bottleneck, particularly in memory-constrained environments. To address these issues, we present ComPEFT (compressed PEFT), a novel method for compressing fine-tuning residuals (task vectors) of PEFT models. Reducing expert PEFT module size effectively addresses both memory and communication limitations, facilitating faster swapping and enabling a higher density of experts within a given memory footprint. ComPEFT employs sparsification and ternary quantization to reduce PEFT module size without any additional training while preserving or enhancing model performance. In extensive evaluation across T5, T0, and LLaMA-based models with 200M to 65B parameters, ComPEFT achieves compression ratios of 8x to 50x. Specifically, we show that ComPEFT improves with scale: stronger models exhibit higher compressibility and better performance. We show that ComPEFT applied to LLaMA-65B outperforms QLoRA by 4.16% on MMLU with a 26x storage size reduction. Additionally, compressed experts produced by ComPEFT maintain few-shot compositional generalization capabilities, facilitate efficient communication and computation, and exhibit enhanced performance when merged.
Lastly, we provide an analysis of different method components, compare ComPEFT with other PEFT methods, and test its efficacy for compressing full-finetuning residuals.

URL: https://openreview.net/forum?id=CovLQwu611
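
The two steps named in the ComPEFT abstract, sparsification followed by ternary quantization of a task vector, can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the magnitude-based top-k rule, the `density` and `alpha` parameters, and the choice of scale (a multiple of the task vector's standard deviation) are assumptions made here, and the function names are hypothetical.

```python
import statistics

def compress_task_vector(task_vector, density=0.2, alpha=1.0):
    """Keep the largest-magnitude entries, then store only signs plus one scale."""
    n = len(task_vector)
    k = max(1, round(density * n))
    # Indices of the k entries with the largest absolute value (assumed rule).
    order = sorted(range(n), key=lambda i: abs(task_vector[i]), reverse=True)
    keep = set(order[:k])
    # Ternary codes: +1 or -1 for kept entries, 0 for everything else.
    signs = []
    for i, value in enumerate(task_vector):
        if i in keep and value != 0:
            signs.append(1 if value > 0 else -1)
        else:
            signs.append(0)
    # Assumed scale: alpha times the standard deviation of the original vector.
    scale = alpha * statistics.pstdev(task_vector)
    return signs, scale

def decompress(signs, scale):
    """Rebuild an approximate task vector from ternary codes and the scale."""
    return [s * scale for s in signs]

# Toy task vector, i.e. fine-tuned weights minus base weights (made-up values).
task_vector = [0.9, -0.05, 0.02, -1.1, 0.0, 0.3, -0.01, 0.04, 0.6, -0.7]
signs, scale = compress_task_vector(task_vector, density=0.3)
print("ternary codes:", signs)
print("scale:", round(scale, 4))
print("reconstruction:", [round(x, 4) for x in decompress(signs, scale)])
```

Only the ternary codes and a single floating-point scale need to be stored or transferred, which is where the storage reduction comes from in this kind of scheme.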

---

Title: Reproducibility study of "Learning Decision Trees and Forests with Algorithmic Recourse"

Abstract: Decision trees and random forests are widely recognized machine learning models, particularly for their interpretability. However, ensuring algorithmic recourse (providing individuals with actionable steps to alter model predictions) remains a significant challenge. The authors of the paper Learning Decision Trees and Forests with Algorithmic Recourse (Kanamori et al., 2024) introduce a novel method for training tree-based models while guaranteeing the existence of recourse actions. In this study, we attempt to replicate the original findings and validate their data using the open-source implementation and datasets provided in the original paper. While we observe some differences in the performance of sensitivity forests, we confirm that our results closely align with those of the decision trees presented in the original study.

URL: https://openreview.net/forum?id=Pj1GrbLLu0

---

Title: A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior

Abstract: Image watermarks have been considered a promising technique to help detect AI-generated content, which can be used to protect copyright or prevent fake image abuse. In this work, we present a black-box method for removing invisible image watermarks, without the need for any dataset of watermarked images or any knowledge about the watermark system. Our approach is simple to implement: given a single watermarked image, we regress it by deep image prior (DIP). We show that from the intermediate steps of DIP one can reliably find an evasion image that can remove invisible watermarks while preserving high image quality. Due to its unique working mechanism and practical effectiveness, we advocate including DIP as a baseline evasion method for benchmarking the robustness of watermarking systems. Finally, by showing the limited ability of DIP and other existing black-box methods in evading training-based visible watermarks, we discuss the positive implications on the practical use of training-based visible watermarks to prevent misinformation abuse.

URL: https://openreview.net/forum?id=g85Vxlrq0O

---

Title: Emergent Symbol-like Number Variables in Artificial Neural Networks

Abstract: There is an open question of what types of numeric representations can emerge in neural systems. To what degree do neural networks induce abstract, mutable, slot-like numeric variables, and in what situations do these representations emerge? How do these representations change over the course of learning, and how can we understand the neural implementations in ways that are unified across different models' implementations? In this work, we approach these questions by first training sequence-based neural systems using Next Token Prediction (NTP) objectives on numeric tasks. We then seek to understand the neural solutions through the lens of causal abstractions or symbolic algorithms. We use a combination of causal interventions and visualization methods to find that artificial neural models do indeed develop analogs of interchangeable, mutable, latent number variables purely from the NTP objective. We then ask how variations on the tasks and model architectures affect the models' learned solutions to find that these symbol-like numeric representations do not form for every variant of the task, and transformers solve the problem in a notably different way than their recurrent counterparts. We then show how the symbol-like variables change over the course of training to find a strong correlation between the models' task performance and the alignment of their symbol-like representations. Lastly, we show that in all cases, some degree of gradience exists in these neural symbols, highlighting the difficulty of finding simple, interpretable symbolic stories of how neural networks perform numeric tasks. Taken together, our results are consistent with the view that neural networks can approximate interpretable symbolic programs of number cognition, but the particular program they approximate and the extent to which they approximate it can vary widely, depending on the network architecture, training data, extent of training, and network size.

URL: https://openreview.net/forum?id=YPnYpiru5W

---

Title: Community Correlations and Testing Independence Between Binary Graphs

Abstract: Graph data has a unique structure that deviates from standard data assumptions, often necessitating modifications to existing methods or the development of new ones to ensure valid statistical analysis. In this paper, we explore the notion of correlation and dependence between two binary graphs. Given vertex communities, we propose community correlations to measure the edge association, which equals zero if and only if the two graphs are conditionally independent within a specific pair of communities. The set of community correlations naturally leads to the maximum community correlation, indicating conditional independence on all possible pairs of communities, and to the overall graph correlation, which equals zero if and only if the two binary graphs are unconditionally independent. We then compute the sample community correlations via graph encoder embedding, proving they converge to their respective population versions, and derive the asymptotic null distribution to enable a fast, valid, and consistent test for conditional or unconditional independence between two binary graphs. The theoretical results are validated through comprehensive simulations, and we provide two real-data examples: one using Enron email networks and another using mouse connectome graphs, to demonstrate the utility of the proposed correlation measures.

URL: https://openreview.net/forum?id=K5iftV8jNi

---

Title: Cometh: A continuous-time discrete-state graph diffusion model

Abstract: Discrete-state denoising diffusion models have led to state-of-the-art performance in graph generation, especially in the molecular domain. Recently, they have been transposed to continuous time, allowing more flexibility in the reverse process and a better trade-off between sampling efficiency and quality. Here, to leverage the benefits of both approaches, we propose Cometh, a continuous-time discrete-state graph diffusion model, tailored to the specificities of graph data. In addition, we replace the set of structural encodings previously used in the discrete graph diffusion model with a single random-walk-based encoding, providing a simple and principled way to boost the model's expressive power. Empirically, we show that integrating continuous time leads to significant improvements across various metrics over state-of-the-art discrete-state diffusion models on a large set of molecular and non-molecular benchmark datasets. In terms of VUN samples, Cometh obtains a near-perfect performance of 99.5% on the planar graph dataset and outperforms DiGress by 12.6% on the large GuacaMol dataset.

URL: https://openreview.net/forum?id=nuN1mRrrjX

---

Title: Solving the Cold Start Problem on One's Own as an End User via Preference Transfer

Abstract: We propose a new approach that enables end users to directly solve the cold start problem by themselves. The cold start problem is a common issue in recommender systems, and many methods have been proposed to address the problem on the service provider's side. However, when the service provider does not take action, users are left with poor recommendations and no means to improve their experience. We propose an algorithm, Pretender, that allows end users to proactively solve the cold start problem on their own. Pretender does not require any special support from the service provider and can be deployed independently by users. We formulate the problem as minimizing the distance between the source and target distributions and optimize item selection from the target service accordingly. Furthermore, we establish theoretical guarantees for Pretender based on a discrete quadrature problem. We conduct experiments on real-world datasets to demonstrate the effectiveness of Pretender.

URL: https://openreview.net/forum?id=Sgj0ZdoVWH

---

Title: ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control

Abstract: We consider deep deterministic policy gradient (DDPG) in the context of reinforcement learning with sparse rewards. To enhance exploration, we introduce a search procedure, \emph{$\epsilon t$-greedy}, which generates exploratory options for exploring less-visited states. We prove that search using $\epsilon t$-greedy has polynomial sample complexity under mild MDP assumptions. To more efficiently use the information provided by rewarded transitions, we develop a new dual experience replay buffer framework, \emph{GDRB}, and implement \emph{longest $n$-step returns}. The resulting algorithm, \emph{ETGL-DDPG}, integrates all three techniques: $\bm{\epsilon t}$-greedy, \textbf{G}DRB, and \textbf{L}ongest $n$-step, into DDPG. We evaluate ETGL-DDPG on standard benchmarks and demonstrate that it outperforms DDPG, as well as other state-of-the-art methods, across all tested sparse-reward continuous environments. Ablation studies further highlight how each strategy individually enhances the performance of DDPG in this setting.

URL: https://openreview.net/forum?id=6g1WJ55N51

---

Title: Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients

Abstract: We propose Mirror Descent Optimal Transport (MDOT), a novel method for solving discrete optimal transport (OT) problems with high precision, by unifying temperature annealing in entropic-regularized OT (EOT) with mirror descent techniques. In this framework, temperature annealing produces a sequence of EOT dual problems, whose solutions gradually approach the solution of the original OT problem. We solve each problem efficiently using a GPU-parallel nonlinear conjugate gradients algorithm (PNCG) that outperforms traditional Sinkhorn iterations under weak regularization. Moreover, our investigation reveals that the theoretical convergence rate of Sinkhorn iterations can exceed existing non-asymptotic bounds when its stopping criterion is tuned in a manner analogous to MDOT.

Our comprehensive ablation studies of MDOT-PNCG affirm its robustness across a wide range of algorithmic parameters. Benchmarking on 24 problem sets of size $n=4096$ in a GPU environment demonstrates that our method attains high-precision, feasible solutions significantly faster than a representative set of existing OT solvers, including accelerated gradient methods and advanced Sinkhorn variants, in both wall-clock time and number of operations. Empirical convergence rates range between $O(n^2 \varepsilon^{-1/4})$ and $O(n^2 \varepsilon^{-1})$, where $\varepsilon$ is the optimality gap. For problem sizes up to $n=16\,384$, the empirical runtime scales as $\widetilde{O}(n^2)$ for moderate precision and as $\widetilde{O}(n^{5/2})$ at worst for high precision. These findings establish MDOT-PNCG as a compelling alternative to current OT solvers, particularly in challenging weak-regularization regimes.

URL: https://openreview.net/forum?id=FVFqrxeF8e

---

Title: Predictive Control and Regret Analysis of Non-Stationary MDP with Look-ahead Information

Abstract: Policy design in non-stationary Markov Decision Processes (MDPs) is inherently challenging due to the complexities introduced by time-varying system transition and reward, which make it difficult for learners to determine the optimal actions for maximizing cumulative future rewards. Fortunately, in many practical applications, such as energy systems, look-ahead predictions are available, including forecasts for renewable energy generation and demand. In this paper, we leverage these look-ahead predictions and propose an algorithm designed to achieve low regret in non-stationary MDPs by incorporating such predictions. Our theoretical analysis demonstrates that, under certain assumptions, the regret decreases exponentially as the look-ahead window expands. When the system prediction is subject to error, the regret does not explode even if the prediction error grows sub-exponentially as a function of the prediction horizon. We validate our approach through simulations and confirm its efficacy in non-stationary environments.

URL: https://openreview.net/forum?id=uObs1YwXjQ

---

Title: Finetuning CLIP to Reason about Pairwise Differences

Abstract: Vision-language models (VLMs) such as CLIP are trained via contrastive learning between text and image pairs, resulting in aligned image and text embeddings that are useful for many downstream tasks. A notable drawback of CLIP, however, is that the resulting embedding space seems to lack some of the structure of its purely text-based alternatives. For instance, while text embeddings have long been noted to satisfy \emph{analogies} in embedding space using vector arithmetic, CLIP has no such property. In this paper, we propose an approach to natively train CLIP in a contrastive manner to reason about differences in embedding space.
We finetune CLIP so that the differences in image embedding space correspond to \emph{text descriptions of the image differences}, which we synthetically generate with large language models on image-caption paired datasets. We first demonstrate that our approach yields significantly improved capabilities in ranking images by a certain attribute (e.g., elephants are larger than cats), which is useful in retrieval or constructing attribute-based classifiers, and improved zero-shot classification performance on many downstream image classification tasks. In addition, our approach enables a new mechanism for inference that we refer to as comparative prompting, where we leverage prior knowledge of text descriptions of differences between classes of interest, achieving even larger performance gains in classification. Finally, we illustrate that the resulting embeddings obey a larger degree of geometric properties in embedding space, such as in text-to-image generation.

URL: https://openreview.net/forum?id=USNJFZTWPn

---

Title: Few-shot Fine-grained Image Classification with Interpretable Prompt Learning through Distribution Alignment

Abstract: Explainable few-shot fine-grained image classification is an essential task to align AI with human preferences by enabling precise recognition of subtle differences and providing explanations for decisions. Existing supervised models often struggle in few-shot scenarios due to their reliance on extensive labeled data, which is intractable to collect for customized human preferences. Meanwhile, large vision-language models (VLMs), while robust in zero-shot tasks, fail to capture the subtle differences required for fine-grained classification. In this work, we introduce a novel approach that enhances AI alignment in both zero-shot and few-shot fine-grained image classification by leveraging explainable prompt learning and distribution alignment techniques. Specifically, we utilize a pre-trained LLM to expand the label space in a training-free manner, addressing the disparity between plain text and the image-text corpus distributions. This is further enhanced by a few-shot learning pipeline that incorporates prompt learning with a weighted distribution alignment mechanism between image and text representations for better alignment with human-like understanding. The proposed approach not only addresses the limitations of current prompting techniques but also enhances interpretability. Extensive experiments demonstrate the effectiveness of our method and illustrate the interpretability of our descriptions.

URL: https://openreview.net/forum?id=WimnbeGlu0

---

Title: Reconciling Privacy and Explainability in High-Stakes: A Systematic Inquiry

Abstract: The integration of deep learning into diverse high-stakes scientific applications demands a careful balance between Privacy and Explainability. This work explores the interplay between two essential requirements: Right-to-Privacy (RTP), enforced through differential privacy (DP), the gold standard for privacy-preserving machine learning due to its rigorous guarantees, and Right-to-Explanation (RTE), facilitated by post-hoc explainers, the go-to tools for model auditing. We systematically assess how DP influences the applicability of widely used explanation methods, uncovering fundamental intricacies between privacy-preserving models and explainability objectives. Furthermore, our work sheds light on how RTP and RTE can be reconciled in high-stakes applications. Our study, with the example of a widely used use case, concludes by outlining a novel software pipeline that upholds RTP and RTE requirements.

URL: https://openreview.net/forum?id=DQqdjPcE6g

---

Title: Dynamics of the accelerated t-SNE

Abstract: This paper investigates the dynamics of t-Stochastic Neighbor Embedding (t-SNE), a popular tool for visualizing complex datasets in exploratory data analysis, when optimized by Nesterov’s accelerated gradient method. Building on the foundational work that connects t-SNE with spectral clustering and dynamical systems, we extend the analysis to include accelerated dynamics, which were not addressed in previous work, revealing the emergence of Bessel and modified Bessel functions as a novel aspect of the algorithm’s behavior that characterizes the temporal evolution of the accelerated t-SNE. Because the ordinary differential equation corresponding to the optimization process under consideration has a closed-form solution, by performing eigenvalue decomposition of the data’s adjacency matrix as a pre-processing step, we can obtain low-dimensional embeddings at any point in time without performing sequential optimization. This advancement not only enhances the practical utility of t-SNE but also contributes to a deeper understanding of its underlying dynamics.

URL: https://openreview.net/forum?id=dfUebM9asV

---

Title: Gaussian Processes with Bayesian Inference of Covariate Couplings

Abstract: Gaussian processes are powerful probabilistic models that are often coupled with Automatic Relevance Determination (ARD) capable of uncovering the importance of individual covariates. We develop covariances characterized by affine transformations of the inputs, formalized via a precision matrix between covariates, which can uncover covariate couplings for enhanced interpretability. We study a range of coupling priors from Wishart to Horseshoe and present fully Bayesian inference of such precision matrices within sparse Gaussian processes. We demonstrate empirically the efficacy and interpretability of this approach.

URL: https://openreview.net/forum?id=fameEAljo3

---

Title: SuFP: Piecewise Bit Allocation Floating-Point for Robust Neural Network Quantization

Abstract: The rapid growth in model size and computational demand of Deep Neural Networks (DNNs) has led to significant challenges in memory and computational efficiency, necessitating the adoption of lower bit-width data types to enhance hardware performance. Floating-point 8 (FP8) has emerged as a promising solution, supported by the latest AI processors, due to its potential for reducing memory usage and computational load. However, each application often requires a different optimal FP8 configuration to achieve high performance, resulting in inconsistent performance and increased hardware complexity.
To address these limitations, we introduce Super Floating-Point (SuFP), an innovative data type that integrates various floating-point configurations into a single representation through a piecewise bit allocation. This approach enables SuFP to effectively capture both dense regions near zero and sparse regions with outliers, thereby minimizing quantization errors and ensuring full precision floating-point performance across different models. Furthermore, SuFP's processing element design is optimized to reduce the hardware overhead.
Our experimental results demonstrate the robustness and accuracy of SuFP over various neural networks in the vision and natural language processing domains. Remarkably, SuFP shows its superiority in large models such as the large language model (Llama 2) and the text-to-image generative model (Stable Diffusion v2). We also verify the feasibility of training with SuFP, confirming its broad applicability.

URL: https://openreview.net/forum?id=7M1adi1nfX

---

Title: Link Prediction with Relational Hypergraphs

Abstract: Link prediction with knowledge graphs has been thoroughly studied in graph machine learning, leading to a rich landscape of graph neural network architectures with successful applications. Nonetheless, it remains challenging to transfer the success of these architectures to inductive link prediction with relational hypergraphs, where the task is over $k$-ary relations, substantially harder than link prediction on knowledge graphs with binary relations only. In this paper, we propose a framework for link prediction with relational hypergraphs, empowering applications of graph neural networks on fully relational structures. Theoretically, we conduct a thorough analysis of the expressive power of the resulting model architectures via corresponding relational Weisfeiler-Leman algorithms and also via logical expressiveness. Empirically, we validate the power of the proposed model architectures on various relational hypergraph benchmarks. The resulting model architectures substantially outperform every baseline for inductive link prediction and also lead to competitive results for transductive link prediction.

URL: https://openreview.net/forum?id=S6fe4aH6YA

---

Title: SR-Reward: Taking The Path More Traveled

Abstract: In this paper, we propose a novel method for learning reward functions directly from offline demonstrations.
Unlike traditional inverse reinforcement learning (IRL), our approach decouples the reward function from the learner's policy, eliminating the adversarial interaction typically required between the two.
This results in a more stable and efficient training process.
Our reward module, \textit{SR-Reward}, leverages successor representation (SR) to encode a state based on expected future states' visitation under the demonstration policy and transition dynamics.
By utilizing the Bellman equation, SR-Reward can be learned concurrently with most reinforcement learning (RL) algorithms without altering the existing training pipeline.
We also introduce a negative sampling strategy to mitigate overestimation errors by reducing rewards for out-of-distribution data, thereby enhancing robustness.
This strategy introduces an inherent conservative bias into RL algorithms that employ the learned reward, encouraging them to stay close to the demonstrations where the consequences of the actions are better understood.
We evaluate our method on D4RL as well as ManiSkill Robot Manipulation environments, achieving competitive results compared to offline RL algorithms with access to true rewards and imitation learning (IL) techniques like behavioral cloning.

URL: https://openreview.net/forum?id=bzk1sV1svm

---

Title: Thoughts and Lessons on Using Visual Foundation Models for Manipulation

Abstract: Training vision-based robotic systems from scratch is both computationally expensive and memory intensive. To mitigate these challenges, recent approaches forgo end-to-end training in favor of adopting visual representations from visual foundation models -- large scale models designed for broad task transferability. Recent years have seen numerous vision foundation models emerge, including several designed specifically for manipulation tasks. However, we still lack clear principles for what makes these models effective for robotics applications. To address this gap, we systematically evaluate vision foundation models to understand what makes them effective for offline robotic learning. We find that across nine diverse vision encoders, a representation's ability to reconstruct edges and predict key points strongly correlates with its performance on manipulation tasks. Extensive correlation analysis across 21 manipulation tasks consistently shows that representations preserving edge and key point information achieve the highest environment success rates. These findings appear to challenge conventional wisdom about reconstruction-based pre-training and offer a new lens for understanding what makes vision representations effective for robotics.

URL: https://openreview.net/forum?id=o6mnkDzVuc

---

Title: OmniInput: An Evaluation Framework for Deep Learning Models on Internet-Scale Data

Abstract: Evaluating machine learning models is important yet challenging in many real-world scenarios. Traditional analysis is dataset-driven, that is, models are evaluated on predefined benchmark datasets. However, these datasets can only cover a limited scope, leaving unanticipated inputs untested and weaknesses of the model unrevealed. To overcome this problem, we propose OmniInput, a novel approach to evaluate models comprehensively using an input space (i.e. internet-scale data). Our method entails efficient sampling of the inputs from the model and estimation of its corresponding output distribution, and an innovative way to calculate the model’s precision and recall curve from the output distribution with only modest human annotation effort. In our experiments, we first validate the correctness of OmniInput within a small input space where brute-force enumeration is still possible. We then show that OmniInput can quantitatively evaluate more complex models such as language models (various versions of GPT2, OLMo, and DistilBERT) and computer vision models, and analyze interesting patterns in an input space.

URL: https://openreview.net/forum?id=SvOYlVa3VK

---
