Accepted papers
===============
Title: Mastering SAM Prompts: A Large-Scale Empirical Study in Segmentation Refinement for Scientific Imaging
Authors: Stephen Price, Elke Rundensteiner, Danielle L. Cote
Abstract: Segment Anything Model (SAM) has emerged as a prevalent tool empowering advances in vision tasks ranging from instance and panoptic segmentation to interactive segmentation. Leveraging powerful zero-shot capabilities enabled by visual prompts such as masks placed on the image, SAM has been shown to significantly improve performance on these tasks. Yet, a poor prompt can worsen SAM performance, risking consequences such as misdiagnoses, autonomous driving failures, or manufacturing defects. Recent studies on visual SAM prompting, however, remain limited: they cover only a small fraction of potential prompt configurations, adopt ad-hoc evaluation strategies, and offer limited or no rigorous analysis of the statistical significance of prompt configurations. To address this gap, we undertake the first large-scale empirical study comprehensively evaluating the impact of SAM prompt configurations on segmentation refinement. The study covers 2,688 prompt configurations, spanning points, boxes, and masks with diverse augmentations, on four initial segmentation models for a total of 10,752 evaluations. From these results, we draw statistically significant insights along with practical guidelines for prompt design on scientific images. In particular, we recommend including a bounding box, which raised AP@50-95 by 0.320, and advise against using a coarse mask, which lowered AP@50-95 by 0.133, across all four models on microscopy datasets. We showcase that our recommended prompt configuration enables SAM to outperform leading refinement methods on multiple scientific benchmark datasets.
URL: https://openreview.net/forum?id=cWcTQMpqv6
---
Title: Uncertainty-aware Reward Design Process
Authors: Yang Yang, Xiaolu Zhou, Bosong Ding, Miao Xin
Abstract: Designing effective reward functions is a cornerstone of reinforcement learning (RL), yet it remains a challenging process due to the inefficiencies and inconsistencies inherent in conventional reward engineering methodologies. Recent advances have explored leveraging large language models (LLMs) to automate the design of reward functions. However, LLMs' insufficient numerical optimization capabilities often result in suboptimal reward hyperparameter tuning, while non-selective validation of candidate reward functions leads to substantial computational overhead. To address these challenges, we propose the Uncertainty-aware Reward Design Process (URDP), a novel framework that integrates large language models to streamline reward function design and evaluation. URDP quantifies candidate reward function uncertainty based on self-consistency analysis, enabling simulation-free identification of ineffective reward components while discovering novel ones. Furthermore, we introduce uncertainty-aware Bayesian optimization (UABO), which incorporates uncertainty estimation to improve the hyperparameter configuration. Finally, we construct a bi-level optimization framework by decoupling the reward component optimization and the hyperparameter tuning. URDP promotes collaboration between the reward logic reasoning of the LLMs and the numerical optimization strengths of Bayesian optimization. We conduct a comprehensive evaluation of URDP across 35 diverse tasks spanning three benchmark environments: Isaac Gym, Bidexterous Manipulation, and ManiSkill2. Our experimental results demonstrate that URDP not only generates higher-quality reward functions but also achieves significant improvements in the efficiency of automated reward design compared to existing approaches. We open-source all code at https://github.com/Yy12136/URDP.
URL: https://openreview.net/forum?id=CId5tW1HxR
---
New submissions
===============
Title: Improving Generalization and Data Efficiency with Diffusion in Offline Multi-agent RL
Abstract: We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline Multi-Agent Reinforcement Learning (MARL). Different from existing algorithms that rely mainly on conservatism in policy design, DOM2 enhances policy expressiveness and diversity based on a diffusion model. Specifically, we incorporate a diffusion model into the policy network and propose a trajectory-based data-reweighting scheme in training. These key ingredients improve algorithm robustness against environment changes and yield significant gains in performance, generalization and data efficiency. Our extensive experimental results demonstrate that DOM2 outperforms existing state-of-the-art methods in all multi-agent particle and multi-agent MuJoCo environments, and generalizes significantly better to shifted environments (in $28$ out of $30$ settings evaluated) thanks to its high expressiveness and diversity. Moreover, DOM2 is highly data efficient and requires no more than $5\%$ of the data to achieve the same performance as existing algorithms (a $20\times$ improvement in data efficiency).
URL: https://openreview.net/forum?id=GKuCKSJKvl
---
Title: Coreset Selection via LLM-based Concept Bottlenecks
Abstract: Coreset Selection (CS) aims to identify a subset of the training dataset that achieves model performance comparable to using the entire dataset. Many state-of-the-art CS methods select coresets using scores whose computation requires training the downstream model on the entire dataset first and recording changes in the model's behavior on samples as it trains (training dynamics). These scores are inefficient to compute and hard to interpret, as they do not indicate whether a sample is difficult to learn in general or only for a specific downstream model. Our work addresses these challenges by proposing a score that computes a sample's difficulty using human-understandable textual attributes (concepts) independent of any downstream model. Specifically, we measure the alignment between a sample's visual features and concept bottlenecks, derived via large language models, by training a linear concept bottleneck layer and computing the sample's difficulty score using it. We then use stratified sampling based on this score to generate a coreset of the dataset. Crucially, our score is efficiently computable without training the downstream model on the full dataset even once, leads to high-performing coresets for various downstream models, and is computable even for an unlabeled dataset. Through experiments on five diverse datasets including ImageNet-1K, we show that our coresets outperform random subsets, even at high pruning rates, and lead to model performance comparable to or better than coresets found by training dynamics-based methods.
URL: https://openreview.net/forum?id=dGbBPXWFrL
---
Title: From Link Prediction to Forecasting: Addressing Challenges in Batch-based Temporal Graph Learning
Abstract: Dynamic link prediction is an important problem considered in many recent works that propose approaches for learning temporal edge patterns. To assess their efficacy, models are evaluated on continuous-time and discrete-time temporal graph datasets, typically using a traditional batch-oriented evaluation setup. However, as we show in this work, a batch-oriented evaluation is often unsuitable and can cause several issues. Grouping edges into fixed-sized batches regardless of their occurrence time leads to information loss or leakage, depending on the temporal granularity of the data. Furthermore, fixed-size batches create time windows with different durations, resulting in an inconsistent dynamic link prediction task. In this work, we empirically show how traditional batch-based evaluation leads to skewed model performance and hinders the fair comparison of methods. We mitigate this problem by reformulating dynamic link prediction as a link forecasting task that better accounts for temporal information present in the data.
URL: https://openreview.net/forum?id=iZPAykLE3l
---
Title: Improving Local Explainability By Learning Causal Graphs From Data
Abstract: Causal Shapley values take into account causal relations among dependent features to adjust the contributions of each feature to a prediction. A limitation of this approach is that it can only leverage known causal relations.
In this work, we combine the computation of causal Shapley values with causal discovery, i.e., learning causal graphs from data. In particular, we compute causal explanations across a set of candidate causal graphs learned from observational data, yielding a set of Shapley values that reflects the space of possible explanations consistent with the data. We propose two methods for estimating this set efficiently, drawing on the equivalences of the interventional distributions for a subset of the causal graphs. We evaluate our methods on synthetic and real-world data, showing that they provide explanations that are often closer to the true causal impacts compared to traditional Shapley value approaches that disregard causal relationships. Even when the discovered graph or Markov equivalence class (MEC) is imperfect, we observe improvements on average over marginal and conditional Shapley values.
URL: https://openreview.net/forum?id=A1bXT7RQLU
---
Title: Learning High-Order Motion Patterns from Event Stream for Continuous Space-Time Video Super-Resolution
Abstract: Current methods in the domain of continuous space-time video super-resolution achieve temporal alignment by predicting motion between frames. However, these frame-based approaches encounter challenges with inaccurate optical flow estimation. To overcome this, we incorporate event data, enhancing both temporal and spatial aspects of video super-resolution. Based on the motion details conveyed by event streams, our proposed method, EvTaylor-Net, performs a Taylor expansion approximation of the object motion function at specified timestamps to estimate more precise forward optical flow. Our method estimates masks from the event surface to alleviate the issue of multiple source pixels mapping to the same target position during the forward warping process. Furthermore, EvTaylor-Net adopts a local implicit neural representation to simultaneously enhance the resolution of videos in both the temporal and spatial domains, ensuring a comprehensive improvement of video quality. Extensive experimental results demonstrate that the proposed EvTaylor-Net, bolstered by event streams, outperforms state-of-the-art methods for spatio-temporal video super-resolution tasks.
URL: https://openreview.net/forum?id=yAi6lRT3Ai
---
Title: GENIE: Watermarking Graph Neural Networks for Link Prediction
Abstract: The rapid adoption, usefulness, and resource-intensive training of Graph Neural Network (GNN) models have made them an invaluable intellectual property in graph-based machine learning. However, their widespread adoption also makes them susceptible to stealing, necessitating robust Ownership Demonstration (OD) techniques. Watermarking is a promising OD framework for deep neural networks, but existing methods fail to generalize to GNNs due to the non-Euclidean nature of graph data. Existing works on GNN watermarking primarily focus on node and graph classification, overlooking Link Prediction (LP).
In this paper, we propose GENIE (watermarking Graph nEural Networks for lInk prEdiction), the first scheme to watermark GNNs for LP. GENIE creates a novel backdoor for both node-representation and subgraph-based LP methods, utilizing a unique trigger set and a secret watermark vector. Our OD scheme is equipped with Dynamic Watermark Thresholding (DWT), ensuring high verification probability while addressing practical issues in existing OD schemes. We extensively evaluate GENIE across 4 diverse model architectures (i.e., SEAL, GCN, GraphSAGE and NeoGNN), 7 real-world datasets and 21 watermark removal techniques and demonstrate its robustness to watermark removal and ownership piracy attacks. Finally, we discuss adaptive attacks against GENIE and a defense strategy to counter them.
URL: https://openreview.net/forum?id=EmDuoySsbe
---
Title: Architecture-Aware Generalization Bounds for Temporal Networks: Theory and Fair Comparison Methodology
Abstract: Deep temporal architectures such as Temporal Convolutional Networks (TCNs) achieve strong predictive performance on sequential data, yet theoretical understanding of their generalization remains limited. We address this gap through three contributions: introducing a principled evaluation methodology for temporal models, revealing surprising empirical phenomena about temporal dependence, and establishing the first architecture-aware theoretical framework for dependent sequences.
\textbf{Fair-Comparison Methodology.} We introduce evaluation protocols that fix effective sample size $N_{\text{eff}}$ to isolate temporal structure effects from information content. This addresses a fundamental challenge: temporal dependence affects both information content and learning dynamics, and standard evaluations conflate these effects. Our methodology enables principled comparison of models across dependency regimes.
\textbf{Empirical Findings.} Applying this methodology reveals that under controlled $N_{\text{eff}} = 2{,}000$, strongly dependent sequences ($\rho = 0.8$) exhibit approximately $76\%$ smaller generalization gaps than weakly dependent ones ($\rho = 0.2$), challenging the conventional view that dependence universally impedes learning. However, observed convergence rates ($N_{\text{eff}}^{-1.21}$ to $N_{\text{eff}}^{-0.89}$) significantly exceed theoretical worst-case predictions ($N^{-0.5}$), revealing that temporal architectures exploit problem structure in ways current theory does not capture.
\textbf{Theoretical Framework.} To provide the foundations for these empirical investigations, we develop the first architecture-aware generalization bounds for deep temporal models on exponentially $\beta$-mixing sequences. By embedding Golowich et al.'s i.i.d. class bound within a novel blocking scheme that partitions $N$ samples into approximately $B \approx N/\log N$ quasi-independent blocks, we establish polynomial sample complexity under convex Lipschitz losses. The framework achieves $\sqrt{D}$ depth scaling alongside the product of layer-wise norms $R = \prod_{\ell=1}^{D} M^{(\ell)}$, avoiding exponential dependence. While these bounds are conservative, as our empirical results demonstrate, they prove learnability and identify architectural scaling laws, providing worst-case baselines that highlight where future theory must improve to explain observed performance.
URL: https://openreview.net/forum?id=fM9LP57Kai
---
Title: Cost-Aware Routing for Efficient Text-To-Image Generation
Abstract: Diffusion models are well known for their ability to generate a high-fidelity image for an input prompt through an iterative denoising process. Unfortunately, the high fidelity also comes at a high computational cost due to the inherently sequential generative process. In this work, we seek to optimally balance quality and computational cost, and propose a framework to allow the amount of computation to vary for each prompt, depending on its complexity. Each prompt is automatically routed to the most appropriate text-to-image generation function, which may correspond to a distinct number of denoising steps of a diffusion model, or a disparate, independent text-to-image model. Unlike uniform cost reduction techniques (e.g., distillation, model quantization), our approach achieves the optimal trade-off by learning to reserve expensive choices (e.g., 100+ denoising steps) only for a few complex prompts, and employ more economical choices (e.g., small distilled model) for less sophisticated prompts. We empirically demonstrate on COCO and DiffusionDB that by learning to route to nine already-trained text-to-image models, our approach is able to deliver an average quality that is higher than that achievable by any of these models alone.
URL: https://openreview.net/forum?id=Jbe9AVsYS6
---
Title: MolMiner: Towards Controllable, 3D-Aware, Fragment-Based Molecular Design
Abstract: We introduce MolMiner, a fragment-based, geometry-aware, and order-agnostic autoregressive model for molecular design. MolMiner supports conditional generation of molecules over twelve properties, enabling flexible control across physicochemical and structural targets. Molecules are built via symmetry-aware fragment attachments, with 3D geometry dynamically updated during generation using forcefields. A probabilistic conditioning mechanism allows users to specify any subset of target properties while sampling the rest. MolMiner achieves calibrated conditional generation across most properties and offers competitive unconditional performance. We also propose improved benchmarking methods for both unconditional and conditional generation, including distributional comparisons via Wasserstein distance and calibration plots for property control. To our knowledge, this is the first model to unify dynamic geometry, symmetry handling, order-agnostic fragment-based generation, and high-dimensional multi-property conditioning.
URL: https://openreview.net/forum?id=saHRhzqibY
---
Title: Quantile $Q$-Learning: Revisiting Offline Extreme $Q$-Learning with Quantile Regression
Abstract: Offline reinforcement learning (RL) enables policy learning from fixed datasets without further environment interaction, making it particularly valuable in high-risk or costly domains. Extreme $Q$-Learning (XQL) is a recent offline RL method that models Bellman errors using the Extreme Value Theorem, yielding strong empirical performance. However, XQL and its stabilized variant MXQL suffer from notable limitations: both require extensive hyperparameter tuning specific to each dataset and domain, and both exhibit instability during training. To address these issues, we propose a principled method to estimate the temperature coefficient $\beta$ via quantile regression under mild assumptions. To further improve training stability, we introduce a value regularization technique with mild generalization, inspired by recent advances in constrained value learning. Experimental results demonstrate that the proposed algorithm achieves competitive or superior performance across a range of benchmark tasks, including D4RL and NeoRL2, while maintaining stable training dynamics and using a consistent set of hyperparameters across all datasets and domains.
URL: https://openreview.net/forum?id=tBKznsUimN
---
Title: Making Video Models Adhere to User Intent with Minor Adjustments
Abstract: With the recent drastic advancements in text-to-video diffusion models, controlling their generations has drawn interest. A popular means of control is through bounding boxes or layouts. However, enforcing adherence to these control inputs is still an open problem. In this work, we show that by slightly adjusting user-provided bounding boxes we can improve both the quality of generations and the adherence to the control inputs. This is achieved by simply optimizing the bounding boxes to better align with the internal attention maps of the video diffusion model while carefully balancing the focus on foreground and background. In a sense, we are moving the bounding boxes to places the model is familiar with. Surprisingly, we find that even with small modifications, the quality of generations can vary significantly. To make this adjustment, we propose a smooth mask that makes the bounding box position differentiable and an attention-maximization objective that we use to alter the bounding boxes. We conduct thorough experiments, including a user study, to validate the effectiveness of our method.
URL: https://openreview.net/forum?id=Opvq2wfBR5
---
Title: Discrete Interpolants: Unifying the Masked Generative and Discrete Diffusion Models
Abstract: In generative models, two paradigms have gained traction in various applications: next-set prediction-based Masked Generative Models and next-noise prediction-based Non-Autoregressive Models, e.g., Diffusion Models. In this work, we propose using discrete-state models to connect them and explore their scalability in the vision domain. First, we conduct an in-depth analysis in a unified design space across the two types of models, including timestep-independence, noise schedule, temperature, guidance strength, etc., in a scalable manner. Second, through the lens of generative models, we re-cast typical discriminative tasks, e.g., image segmentation, as an unmasking process from [MASK] tokens on a discrete-state model. This enables us to perform various sampling processes, including flexible conditional sampling, by training only once to model the joint distribution. All aforementioned explorations lead to our framework named Discrete Interpolants, which enables us to achieve state-of-the-art or competitive performance compared to previous discrete-state based methods in various benchmarks, including ImageNet256, MS COCO, CC12M, as well as the video datasets FaceForensics and DMLab. In summary, by leveraging [MASK] in discrete-state models, we can bridge Masked Generative and Non-autoregressive Diffusion models, as well as generative and discriminative tasks. Our code will be released.
URL: https://openreview.net/forum?id=CkAHiOUvOx
---
Title: Discovering Meaningful Units with Visually Grounded Semantics from Image Captions
Abstract: Fine-grained knowledge is crucial for vision-language models to obtain a better understanding of the real world. While there has been work trying to acquire this kind of knowledge in the space of vision and language, it has mostly focused on aligning the image patches with the tokens on the language side. However, image patches do not have any meaning to the human eye, and individual tokens do not necessarily carry groundable information in the image. It is groups of tokens which describe different aspects of the scene. In this work, we propose a model which groups the caption tokens as part of its architecture in order to capture a fine-grained representation of the language. We expect our representations to be at the level of objects present in the image, and therefore align our representations with the output of an image encoder trained to discover objects. We show that by learning to group the tokens, the vision-language model has a better fine-grained understanding of vision and language. In addition, the token groups that our model discovers are highly similar to groundable phrases in text, both qualitatively and quantitatively.
URL: https://openreview.net/forum?id=kndKGnE0tb
---
Title: Frictionless Hamiltonian Descent with Discretization and Parallel Optimization
Abstract: Frictionless Hamiltonian Descent is a recently proposed optimization method that leverages a fundamental principle from classical mechanics. The algorithm is based on energy conservation of the Hamiltonian Flow, with the kinetic energy reset at each iteration, and is shown to be a descent method. However, the idealized frictionless Hamiltonian Descent requires access to an oracle for the Hamiltonian Flow, and implementing the Hamiltonian Flow exactly becomes elusive when the underlying function is not quadratic. Motivated by the considerable popularity of Hamiltonian dynamics in sampling, where a geometric numerical integrator is used to simulate the idealized Hamiltonian Monte Carlo, we consider Hamiltonian Descent with two kinds of integrators, which results in some new optimization dynamics. Moreover, we extend the original framework by introducing various forms of kinetic energy. This expansion yields a broad class of optimization algorithms and provides a fresh perspective on algorithm design. We further propose a novel technique for parallelizing the inherently sequential updates of the proposed optimization algorithms, where gradients at different points are computed simultaneously. The parallelization technique improves the actual running time by 2-3x in practice for multinomial logistic regression across a range of datasets when 4 GPUs are used, compared to approximating the Hamiltonian Flow in the standard sequential fashion on a single GPU.
URL: https://openreview.net/forum?id=114IOQ3JWe
---
Title: COPA: Comparing the incomparable in multi-objective model evaluation
Abstract: In machine learning (ML), we often need to choose one among hundreds of trained ML models at hand, based on various objectives such as accuracy, robustness, fairness or scalability. However, it is often unclear how to compare, aggregate and, ultimately, trade off these objectives, making it a time-consuming task that requires expert knowledge, as objectives may be measured in different units and scales. In this work, we investigate how objectives can be automatically normalized and aggregated to systematically help the user navigate their Pareto front. To this end, we make incomparable objectives comparable using their cumulative functions, approximated by their relative rankings. As a result, our proposed approach, COPA, can aggregate them while matching user-specific preferences, allowing practitioners to meaningfully navigate and search for models in the Pareto front. We demonstrate the potential impact of COPA in both model selection and benchmarking tasks across diverse ML areas such as fair ML, domain generalization, AutoML and foundation models, where classical ways to normalize and aggregate objectives fall short.
URL: https://openreview.net/forum?id=NY931v5zc5
---
Title: Depth as Modulation in Weight-Sharing Transformers
Abstract: Weight-sharing architectures provide an efficient design for Transformers. However, their reliance on a single transformation can limit the model's capacity for iterative representation refinement, a process that requires functional specialization across layers. We address this limitation by representing depth through layer-wise perturbations, creating a path toward models that are both parameter-efficient and performant. Our approach iteratively applies a shared block, and we introduce two distinct strategies to perturb its Multi-Head Self-Attention (MHSA) component with each application: a comprehensive QKOV-LoRA and a more parameter-efficient QK/OV-circuit. The effectiveness of these strategies is validated on vision and language benchmarks, where our models demonstrate favorable performance against layer-sharing counterparts. Our results suggest that layer-wise perturbation of a shared structure is an effective principle for developing capable and efficient Transformers.
URL: https://openreview.net/forum?id=wm9jRInse3
---
Title: Towards Bridging the Semantic Spaces of the One-to-Many Mapping in Cross-Modality Text-to-Video Generation
Abstract: Despite recent advances in text-to-video generation, the role of text and video latent spaces in learning a semantically shared representation remains underexplored. In this cross-modality generation task, most methods rely on conditioning the video generation process by injecting the text representation into it, not exploring the implicit shared knowledge between the modalities.
Nonetheless, the feature-based alignment of both modalities is not straightforward, especially for the \textit{one-to-many} mapping scenario, in which one text can be mapped to several valid semantically aligned videos, which generally produces a representation collapse in the alignment phase. In this work, we investigate and give insights on how both modalities cope in a shared semantic space where each modality representation is previously learned in an unsupervised way. We explore a perspective from the latent space learning view and analyze a framework proposed in this work with a plug-and-play nature by adopting autoencoder-based models that could be used with other representations. We show that the one-to-many case requires different alignment strategies than the common ones used in the literature, which suffer in aligning both modalities on a semantically shared space.
URL: https://openreview.net/forum?id=6tigEAsbFw
---
Title: Logical Anomaly Detection with Masked Image Modeling
Abstract: Detecting anomalies such as an incorrect combination of objects or deviations in their positions is a challenging problem in unsupervised anomaly detection (AD). Since conventional AD methods mainly focus on local patterns of normal images, they struggle with detecting logical anomalies that appear in the global patterns. To effectively detect these challenging logical anomalies, we introduce LADMIM (Logical Anomaly Detection with Masked Image Modeling), a novel unsupervised AD framework that harnesses the power of masked image modeling and discrete representation learning. Our core insight is that predicting the missing region forces the model to learn the long-range dependencies between patches. Specifically, we formulate AD as a mask completion task, which predicts the distribution of discrete latents in the masked region. As a distribution of discrete latents is invariant to the low-level variance in the pixel space, the model can focus on the logical dependencies in the image, which improves accuracy in logical AD. We evaluate the AD performance on five benchmarks and show that our approach achieves comparable performance without any pre-trained segmentation models. We also conduct comprehensive experiments to reveal the key factors that influence logical AD performance.
URL: https://openreview.net/forum?id=uuuaRCMYE3
---
Title: Learning Lagrangian Interaction Dynamics with Sampling-Based Model Order Reduction
Abstract: Simulating physical systems governed by Lagrangian dynamics often entails solving partial differential equations (PDEs) over high-resolution spatial domains, leading to significant computational expense. Reduced-order modeling (ROM) mitigates this cost by evolving low-dimensional latent representations of the underlying system. While neural ROMs enable querying solutions from latent states at arbitrary spatial points, their latent states typically represent the global domain and struggle to capture localized, highly dynamic behaviors such as fluids. We propose a sampling-based reduction framework that evolves Lagrangian systems directly in physical space, over the particles themselves, reducing the number of active degrees of freedom via data-driven neural PDE operators. To enable querying at arbitrary spatial locations, we introduce a learnable kernel parameterization that uses local spatial information from time-evolved sample particles to infer the underlying solution manifold. Empirically, our approach achieves a 6.6$\times$–32$\times$ reduction in input dimensionality while maintaining high-fidelity evaluations across diverse Lagrangian regimes, including fluid flows, granular media, and elastoplastic dynamics. We refer to this framework as GIOROM (\textbf{G}eometry-\textbf{I}nf\textbf{O}rmed \textbf{R}educed-\textbf{O}rder \textbf{M}odeling).
URL: https://openreview.net/forum?id=vXCQA1EzaG
---
Title: APFEx: Adaptive Pareto Front Explorer for Intersectional Fairness
Abstract: Ensuring fairness in machine learning models is critical, especially when biases compound across intersecting protected attributes like race, gender, and age. While existing methods address fairness for single attributes, they fail to capture the nuanced, multiplicative biases faced by intersectional subgroups. We introduce \emph{Adaptive Pareto Front Explorer (APFEx)}, the first framework to explicitly model intersectional fairness as a joint optimization problem over the Cartesian product of sensitive attributes. APFEx combines three key innovations: (1) an adaptive multi-objective optimizer that dynamically switches between Pareto cone projection, gradient weighting, and exploration strategies to navigate fairness-accuracy trade-offs; (2) differentiable intersectional fairness metrics enabling gradient-based optimization of non-smooth subgroup disparities; and (3) theoretical guarantees of convergence to Pareto-optimal solutions. Experiments on four real-world datasets demonstrate APFEx’s superiority, reducing fairness violations while maintaining competitive accuracy. Our work bridges a critical gap in fair ML, providing a scalable, model-agnostic solution for intersectional fairness.
URL: https://openreview.net/forum?id=ysbzXL4tZJ
---