Daily TMLR digest for Mar 13, 2025


TMLR

Mar 13, 2025, 12:06:07 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Causal Discovery over High-Dimensional Structured Hypothesis Spaces with Causal Graph Partitioning

Authors: Ashka Shah, Adela Frances DePavia, Nathaniel C Hudson, Ian Foster, Rick Stevens

Abstract: The aim in many sciences is to understand the mechanisms that underlie the observed distribution of variables, starting from a set of initial hypotheses. Causal discovery allows us to infer mechanisms as sets of cause and effect relationships in a generalized way---without necessarily tailoring to a specific domain. Causal discovery algorithms search over a structured hypothesis space, defined by the set of Directed Acyclic Graphs (DAGs), to find the graph that best explains the data. For high-dimensional problems, however, this search becomes intractable, and scalable algorithms for causal discovery are needed to bridge the gap.
In this paper, we define a novel causal graph partition that allows for divide-and-conquer causal discovery with theoretical guarantees under the Maximal Ancestral Graph (MAG) class. We leverage the idea of a superstructure---a set of learned or existing candidate hypotheses---to partition the search space. We prove under certain assumptions that learning with a causal graph partition always yields the Markov Equivalence Class of the true causal graph. We show our algorithm achieves comparable accuracy and a faster time to solution for biologically-tuned synthetic networks and networks with up to $10^4$ variables. This makes our method applicable to gene regulatory network inference and other domains with high-dimensional structured hypothesis spaces.
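
The simplest way to get intuition for superstructure-based divide-and-conquer is partitioning the candidate-edge graph into connected components and running discovery per block. This is only a toy stand-in: the paper's causal graph partition is a more refined, overlap-aware scheme with guarantees under the MAG class, and the adjacency representation here is an assumption.

```python
from collections import deque

def partition_superstructure(adj):
    """Split a superstructure (undirected candidate-edge graph given as
    {node: set(neighbors)}) into connected components. Each block could
    then be handed to a local causal-discovery routine independently.
    Illustrative only; not the paper's partition construction."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            if node in comp:
                continue
            comp.add(node)
            # Explore only candidate edges allowed by the superstructure
            queue.extend(adj[node] - comp)
        seen |= comp
        parts.append(sorted(comp))
    return parts
```

On a superstructure with two disconnected gene modules, this yields two independent, much smaller search problems.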

URL: https://openreview.net/forum?id=FecsgPCOHk

---

Title: On the Robustness of Kolmogorov-Arnold Networks: An Adversarial Perspective

Authors: Tal Alter, Raz Lapid, Moshe Sipper

Abstract: Kolmogorov-Arnold Networks (KANs) have recently emerged as a novel paradigm for function approximation by leveraging univariate spline-based decompositions inspired by the Kolmogorov-Arnold theorem. Despite their theoretical appeal---particularly the potential for inducing smoother decision boundaries and lower effective Lipschitz constants---their adversarial robustness remains largely unexplored. In this work, we conduct the first comprehensive evaluation of KAN robustness in adversarial settings, focusing on both fully connected (FCKANs) and convolutional (CKANs) instantiations for image classification tasks. Across a wide range of benchmark datasets (MNIST, FashionMNIST, KMNIST, CIFAR-10, SVHN, and a subset of ImageNet), we compare KANs against conventional architectures using an extensive suite of attacks, including white-box methods (FGSM, PGD, C&W, MIM), black-box approaches (Square Attack, SimBA, NES), and ensemble attacks (AutoAttack). Our experiments reveal that while small- and medium-scale KANs are not consistently more robust than their standard counterparts, large-scale KANs exhibit markedly enhanced resilience against adversarial perturbations. An ablation study further demonstrates that critical hyperparameters---such as the number of knots and spline order---significantly influence robustness. Moreover, adversarial training experiments confirm the inherent safety advantages of KAN-based architectures. Overall, our findings provide novel insights into the adversarial behavior of KANs and lay a rigorous foundation for future research on robust, interpretable network designs.
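
Of the white-box attacks listed, FGSM is the simplest to state. As a reference point, here is the bare perturbation rule on flat inputs, with nothing KAN-specific assumed (a real evaluation would compute the gradient through the model and clip to the valid pixel range):

```python
def fgsm_perturb(x, grad, eps):
    """Fast Gradient Sign Method: move each input coordinate by eps in
    the direction of the sign of the loss gradient. `x` and `grad` are
    flat lists of floats; model and clipping are omitted for brevity."""
    sign = lambda g: (g > 0) - (g < 0)  # -1, 0, or +1 per coordinate
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]
```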

URL: https://openreview.net/forum?id=uafxqhImPM

---

Title: CroissantLLM: A Truly Bilingual French-English Language Model

Authors: Manuel Faysse, Patrick Fernandes, Nuno M Guerreiro, António Loison, Duarte Miguel Alves, Caio Corrro, Nicolas Boizard, João Alves, Ricardo Rei, Pedro Henrique Martins, Antoni Bigata Casademunt, François Yvon, Andre Martins, Gautier Viaud, CELINE HUDELOT, Pierre Colombo

Abstract: We introduce CroissantLLM, a 1.3B language model pretrained on a set of 3T English and French tokens, to bring to the research and industrial community a high-performance, fully open-sourced bilingual model that runs swiftly on consumer-grade local hardware. To that end, we pioneer the approach of training an intrinsically bilingual model with a 1:1 English-to-French pretraining data ratio, a custom tokenizer, and bilingual finetuning datasets. We release the training dataset, notably containing a French split with manually curated, high-quality, and varied data sources. To assess performance outside of English, we craft a novel benchmark, FrenchBench, consisting of an array of classification and generation tasks, covering various orthogonal aspects of model performance in the French language. Additionally, rooted in transparency and to foster further Large Language Model research, we release codebases and dozens of checkpoints across various model sizes, training data distributions, and training steps, as well as fine-tuned Chat models, and strong translation models. We evaluate our model through the FMTI framework, and validate 81% of the transparency criteria, far beyond the scores of even most open initiatives. This work enriches the NLP landscape, breaking away from previous English-centric work in order to strengthen our understanding of multilinguality in language models.

URL: https://openreview.net/forum?id=uA19Xo1o31

---

Title: Reheated Gradient-based Discrete Sampling for Combinatorial Optimization

Authors: Muheng Li, Ruqi Zhang

Abstract: Recently, gradient-based discrete sampling has emerged as a highly efficient, general-purpose solver for various combinatorial optimization (CO) problems, achieving performance comparable to or surpassing the popular data-driven approaches. However, we identify a critical issue in these methods, which we term "wandering in contours". This behavior refers to sampling new, distinct solutions that share very similar objective values for a long time, leading to computational inefficiency and suboptimal exploration of potential solutions. In this paper, we introduce a novel reheating mechanism inspired by the concept of critical temperature and specific heat in physics, aimed at overcoming this limitation. Empirically, our method demonstrates superiority over existing sampling-based and data-driven algorithms across a diverse array of CO problems.
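
The abstract does not spell out the mechanism, but the general idea of reheating can be illustrated with a toy annealing loop: when the best objective value stalls for a while (the "wandering" symptom), the temperature is reset rather than cooled further. This is only an intuition sketch; the paper's sampler is gradient-based and its reheating rule is physics-informed, not this heuristic.

```python
import math
import random

def reheated_anneal(energy, neighbor, x0, steps=2000, t0=2.0,
                    cool=0.995, stall_limit=100, seed=0):
    """Toy simulated annealing with a reheating rule (illustrative only).
    energy: objective to minimize; neighbor(x, rng): proposal move."""
    rng = random.Random(seed)
    x, t = x0, t0
    best_x, best, stall = x0, energy(x0), 0
    for _ in range(steps):
        y = neighbor(x, rng)
        d = energy(y) - energy(x)
        # Metropolis acceptance: always take improvements,
        # occasionally accept worsenings at temperature t.
        if d <= 0 or rng.random() < math.exp(-d / max(t, 1e-9)):
            x = y
        if energy(x) < best:
            best, best_x, stall = energy(x), x, 0
        else:
            stall += 1
        if stall >= stall_limit:
            t, stall = t0, 0  # reheat instead of wandering at low temperature
        else:
            t *= cool
    return best_x, best
```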

URL: https://openreview.net/forum?id=uPCvfyr2KP

---

Title: Enhancing Fairness in Unsupervised Graph Anomaly Detection through Disentanglement

Authors: Wenjing Chang, Kay Liu, Philip S. Yu, Jianjun Yu

Abstract: Graph anomaly detection (GAD) is becoming increasingly crucial in various applications, ranging from financial fraud detection to fake news detection. However, current GAD methods largely overlook the fairness problem, which might result in discriminatory decisions skewed toward certain demographic groups defined on sensitive attributes (e.g., gender). This greatly limits the applicability of these methods in real-world scenarios in light of societal and ethical restrictions. To address this critical gap, we make the first attempt to integrate fairness with utility in GAD decision-making. Specifically, we devise a novel DisEntangle-based FairnEss-aware aNomaly Detection framework on the attributed graph, named DEFEND. DEFEND first introduces disentanglement in GNNs to capture informative yet sensitive-irrelevant node representations, effectively reducing bias inherent in graph representation learning. Besides, to alleviate discriminatory bias in evaluating anomalies, DEFEND adopts a reconstruction-based method, which concentrates solely on node attributes and avoids incorporating biased graph topology. Additionally, given the inherent association between sensitive-relevant and -irrelevant attributes, DEFEND further constrains the correlation between the reconstruction error and predicted sensitive attributes. Empirical evaluations on real-world datasets reveal that DEFEND performs effectively in GAD and significantly enhances fairness compared to state-of-the-art baselines. Our code is available at https://github.com/AhaChang/DEFEND.

URL: https://openreview.net/forum?id=5zRs34Ls3C

---

Title: Attention Overlap Is Responsible for The Entity Missing Problem in Text-to-image Diffusion Models!

Authors: Arash Mari Oriyad, Mohammadali Banayeeanzade, Reza Abbasi, Mohammad Hossein Rohban, Mahdieh Soleymani Baghshah

Abstract: Text-to-image diffusion models such as Stable Diffusion and DALL-E have exhibited impressive capabilities in producing high-quality, diverse, and realistic images based on textual prompts. Nevertheless, a common issue arises where these models encounter difficulties in faithfully generating every entity specified in the prompt, leading to a recognized challenge known as entity missing in visual compositional generation. While previous studies indicated that actively adjusting cross-attention maps during inference could potentially resolve the issue, there has been a lack of systematic investigation into the specific objective function required for this task. In this work, we thoroughly investigate three potential causes of entity missing from the perspective of cross-attention maps: insufficient attention intensity, excessive attention spread, and significant overlap between attention maps of different entities.
Through comprehensive empirical analysis, we found that optimizing metrics that quantify the overlap between attention maps of entities is highly effective at mitigating entity missing. We hypothesize that during the denoising process, entity-related tokens engage in a form of competition for attention toward specific regions through the cross-attention mechanism. This competition may result in the attention of a spatial location being divided among multiple tokens, leading to difficulties in accurately generating the entities associated with those tokens. Building on this insight, we propose four overlap-based loss functions that can be used to implicitly manipulate the latent embeddings of the diffusion model during inference: Intersection over union (IoU), center-of-mass (CoM) distance, Kullback–Leibler (KL) divergence, and clustering compactness (CC). Extensive experiments on a diverse set of prompts demonstrate that our proposed training-free methods substantially outperform previous approaches on a range of compositional alignment metrics, including visual question-answering, captioning score, CLIP similarity, and human evaluation. Notably, our method outperforms the best baseline by $9\%$ in human evaluation.
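
Two of the four named overlap measures have simple closed forms. For intuition, here they are on flattened, normalized attention maps; the representation (flat lists summing to one) is an assumption for illustration, whereas the in-model losses act on 2D cross-attention maps during denoising.

```python
import math

def soft_iou(a, b):
    """Soft intersection-over-union between two attention maps given as
    flat lists of non-negative weights. High value = high overlap; the
    method *minimizes* overlap between different entities' maps."""
    inter = sum(min(x, y) for x, y in zip(a, b))
    union = sum(max(x, y) for x, y in zip(a, b))
    return inter / union

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two attention distributions. Larger KL means
    better-separated maps, so its negation can serve as an overlap loss."""
    return sum(x * math.log((x + eps) / (y + eps)) for x, y in zip(p, q))
```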

URL: https://openreview.net/forum?id=Xv3ZrFayIO

---


New submissions
===============


Title: A2Perf: Real-World Autonomous Agents Benchmark

Abstract: Autonomous agents and systems cover a number of application areas, from robotics and digital assistants to combinatorial optimization, all sharing common, unresolved research challenges. It is not sufficient for agents to merely solve a given task; they must generalize to out-of-distribution tasks, perform reliably, and use hardware resources efficiently during training and on-device deployment, among other requirements. Several major classes of methods, such as reinforcement learning and imitation learning, are commonly used to tackle these problems, each with different trade-offs. However, there is currently no benchmarking suite that defines the environments, datasets, and metrics which can be used to develop reference implementations and seed leaderboards with baselines, providing a meaningful way for the community to compare progress. We introduce A2Perf—a benchmarking suite including three environments that closely resemble real-world domains: computer chip floorplanning, web navigation, and quadruped locomotion.
A2Perf provides metrics that track task performance, generalization, system resource efficiency, and reliability, which are all critical to real-world applications. Using A2Perf, we demonstrate that web navigation agents can achieve latencies comparable to human reaction times on consumer hardware, reveal important reliability trade-offs between algorithms for quadruped locomotion, and quantify the total energy costs of different learning approaches for computer chip design. In addition, we propose a data cost metric to account for the cost incurred in acquiring offline data for imitation learning, reinforcement learning, and hybrid algorithms, which allows us to better compare these approaches. A2Perf also contains baseline implementations of standard algorithms, enabling apples-to-apples comparisons across methods and facilitating progress in real-world autonomy. As an open-source and extendable benchmark, A2Perf is designed to remain accessible, documented, up-to-date, and useful to the research community over the long term.

URL: https://openreview.net/forum?id=AoGliDAEPC

---

Title: Factor Learning Portfolio Optimization Informed by Continuous-Time Finance Models

Abstract: We study financial portfolio optimization in the presence of unknown and uncontrolled system variables referred to as stochastic factors. Existing work falls into two distinct categories: (i) reinforcement learning employs end-to-end policy learning with flexible factor representation, but does not precisely model the dynamics of asset prices or factors; (ii) continuous-time finance methods, in contrast, take advantage of explicitly modeled dynamics but pre-specify, rather than learn, factor representation. We propose FaLPO (factor learning portfolio optimization), a framework that interpolates between these two approaches. Specifically, FaLPO hinges on deep policy gradient to learn a performant investment policy that takes advantage of flexible representation for stochastic factors. Meanwhile, FaLPO also incorporates continuous-time finance models when modeling the dynamics. It uses the optimal policy functional form derived from such models and optimizes an objective that combines policy learning and model calibration. We prove the convergence of FaLPO and provide performance guarantees via a finite-sample bound. On both synthetic and real-world portfolio optimization tasks, we observe that FaLPO outperforms five leading methods. Finally, we show that FaLPO can be extended to other decision-making problems with stochastic factors.

URL: https://openreview.net/forum?id=KLOJUGusVE

---

Title: CLImage: Human-Annotated Datasets for Complementary-Label Learning

Abstract: Complementary-label learning (CLL) is a weakly-supervised learning paradigm that aims to train a multi-class classifier using only complementary labels, which indicate classes to which an instance does not belong. Despite numerous algorithmic proposals for CLL, their practical applicability remains unverified for two reasons. Firstly, these algorithms often rely on assumptions about the generation of complementary labels, and it is not clear how far the assumptions are from reality. Secondly, their evaluation has been limited to synthetic datasets. To gain insights into the real-world performance of CLL algorithms, we developed a protocol to collect complementary labels from human annotators. Our efforts resulted in the creation of four datasets: CLCIFAR10, CLCIFAR20, CLMicroImageNet10, and CLMicroImageNet20, derived from the well-known classification datasets CIFAR10, CIFAR100, and TinyImageNet200. These datasets represent the very first real-world CLL datasets. Through extensive benchmark experiments, we discovered a notable decrease in performance when transitioning from synthetic datasets to real-world datasets. We investigated the key factors contributing to the decrease with a thorough dataset-level ablation study. Our analyses highlight annotation noise as the most influential factor in the real-world datasets. In addition, we discover that the biased nature of human-annotated complementary labels and the difficulty of validating with only complementary labels are two outstanding barriers to practical CLL. These findings suggest that the community should focus more research effort on developing CLL algorithms and validation schemes that are robust to noisy and biased complementary-label distributions.
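
For readers new to the setting, the learning signal in CLL can be illustrated with the simplest complementary objective: push down the probability of the class the instance is known *not* to be. This is shown purely for intuition; the benchmarked algorithms use more sophisticated, assumption-corrected losses.

```python
import math

def complementary_nll(probs, comp_label):
    """Negative log-likelihood that the instance does NOT belong to the
    complementary class: -log(1 - p_k). `probs` is a predicted class
    distribution; `comp_label` is the index of the excluded class.
    Clamped to avoid log(0). Illustrative baseline objective only."""
    return -math.log(max(1.0 - probs[comp_label], 1e-12))
```

Minimizing this over many (instance, complementary-label) pairs indirectly concentrates probability mass on the true class, which is why a classifier can be trained without ever seeing a positive label.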

URL: https://openreview.net/forum?id=FHkWY4aGsN

---

Title: Inverse Reinforcement Learning via Inverse Optimization

Abstract: Inverse reinforcement learning (IRL) and inverse optimization (IO) for Markov decision processes (MDPs) have developed independently in the literature, despite addressing the same problem. We establish the relationship between the IO framework for MDPs and the convex-analytic view of the apprenticeship learning (AL) formalism proposed by Kamoutsi et al. (2021). Furthermore, we demonstrate that this view of the AL formalism emerges as a relaxation of the IRL problem when observed through the lens of IO. The proposed formulation frames the IRL problem as a regularized min-max problem, extending prior approaches. Notably, the AL formalism is a special case when the regularization term is absent. We solve the regularized convex-concave min-max problem using stochastic mirror descent (SMD) and establish convergence bounds for the proposed method. Numerical experiments highlight the critical role of regularization in recovering the true cost vector for IRL problems.
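
As a minimal illustration of the SMD machinery (not the paper's actual iterates, which run over a saddle-point problem), a single mirror-descent step with the entropy mirror map on the probability simplex reduces to an exponentiated-gradient update:

```python
import math

def mirror_descent_step(x, grad, lr):
    """One entropic mirror-descent (exponentiated-gradient) step on the
    probability simplex: reweight by exp(-lr * gradient), renormalize.
    Keeps the iterate a valid distribution by construction."""
    w = [xi * math.exp(-lr * gi) for xi, gi in zip(x, grad)]
    z = sum(w)
    return [wi / z for wi in w]
```

The appeal over plain projected gradient descent is that the update stays strictly inside the simplex and adapts to its geometry, which is what makes SMD natural for distributions such as occupancy measures.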

URL: https://openreview.net/forum?id=AEvdHZFUJR

---

Title: Explaining Caption-Image Interactions in CLIP models with Second-Order Attributions

Abstract: Dual encoder architectures like CLIP models map two types of inputs into a shared embedding space and predict similarities between them. Despite their success, it is, however, not understood how these models compare their two inputs. Common first-order feature-attribution methods can provide only limited insights into dual encoders, since their predictions depend on feature interactions rather than on individual features.
In this paper, we first derive a second-order method enabling the attribution of predictions by any differentiable dual encoder onto feature interactions between its inputs. Second, we apply our method to CLIP models and show that they learn fine-grained correspondences between parts of captions and regions in images. They match objects across input modes and also account for mismatches. This visual-linguistic grounding ability, however, varies heavily between object classes and exhibits pronounced out-of-domain effects. We can identify individual errors as well as systematic failure categories, including object coverage, unusual scenes, and correlated contexts.

URL: https://openreview.net/forum?id=HUUL19U7HP

---

Title: Preference Discerning with LLM-Enhanced Generative Retrieval

Abstract: In sequential recommendation, models recommend items based on a user's interaction history. To this end, they usually incorporate information such as item descriptions and user intent or preferences. User preferences are usually not given in open-source datasets and thus need to be approximated, for example via large language models (LLMs). Current works incorporate approximated user preferences as targets for auxiliary tasks during training of the recommendation model to assist with downstream performance. However, this is limiting, as such models cannot dynamically adapt to changing user preferences after training and require impractical re-training. To address this issue, we propose a new paradigm, namely preference discerning, in which we explicitly condition a generative recommendation model on user preferences in natural language, within its context. Furthermore, we introduce a novel benchmark that provides a holistic evaluation across various scenarios, including preference steering and sentiment following. Upon assessing current state-of-the-art methods using our benchmark, we discover that they struggle to accurately discern user preferences. To address this, we propose a new method named Mender (Multimodal Preference Discerner), which achieves state-of-the-art performance in our benchmark.
Our results show that Mender can be effectively guided by human preferences, even if not observed during training, paving the way toward more personalized recommendation models.

URL: https://openreview.net/forum?id=74mrOdhvvT

---

Title: Generalization over Memorization in In-Context Learning

Abstract: Transformers exhibit remarkable in-context learning capabilities, solving new tasks without requiring explicit model weight updates. However, existing training paradigms for in-context learners rely on vast, unstructured datasets, which are costly and challenging to collect. These paradigms diverge significantly from how humans learn. Motivated by these limitations, we propose a paradigm shift: training on multiple smaller, domain-specific datasets to improve generalization. We investigate this paradigm by leveraging meta-learning to train an in-context learner across diverse, small-scale datasets using the Meta-Album benchmark. We further investigate realistic scenarios, including domain streaming with curriculum learning strategies and settings where training data is entirely unlabeled. Our experiments demonstrate that this multi-dataset approach promotes broader generalization, enhances robustness in streaming scenarios, and achieves competitive performance even under unsupervised conditions. These findings advance the in-context learning paradigm and shed light on how to bridge the gap between artificial and natural learning processes.

URL: https://openreview.net/forum?id=XMuVlWbjQm

---

Title: Reproducibility Study of "Attack-Resilient Image Watermarking Using Stable Diffusion"

Abstract: This paper presents a reproducibility study and robustness evaluation of the paper 'Attack-Resilient Image Watermarking Using Stable Diffusion' by Zhang et al. (2024), which proposes ZoDiac, a Stable Diffusion-based framework for attack-resilient image watermarking. While successfully replicating the original method's core claims, achieving >90% watermark detection rate (WDR) against diffusion-based regeneration attacks across the MS-COCO, DiffusionDB, and WikiArt datasets, we identify critical vulnerabilities under adversarial and geometrically asymmetric attack paradigms. Our extended analysis demonstrates that gradient-based adversarial perturbations reduce ZoDiac's WDR, a threat model absent in prior evaluations. We also investigate rotationally asymmetric attacks that drive WDR below 65%, and propose a new loss function to mitigate these limitations. Despite these enhancements, composite attacks combining adversarial noise with other methods reduce WDR to near zero, exposing vulnerabilities to multi-stage offensive pipelines. Our implementation can be found on Anonymous GitHub.

URL: https://openreview.net/forum?id=xoQV6kdTqG

---

Title: [Re] Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents

Abstract: Large Language Models (LLMs) are increasingly used in strategic decision-making environments, including game-theoretic scenarios where multiple agents interact under predefined rules. One such setting is the common pool resource environment. In this study, we build upon Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents (Piatti et al., 2024), a framework designed to test cooperation strategies among LLM agents. We begin by replicating their results to a large degree to validate the framework. Then, we extend their analysis by identifying a notable trend: specialized models trained on research papers and mathematical reasoning tasks outperform general-purpose models of similar scale in this environment. Additionally, we evaluate the recently released DeepSeek-R1-Distill models, which show improvements over their baseline counterparts but come at a higher computational cost. Finally, we investigate the impact of different prompting strategies, including the veil of ignorance mechanism and other strategies based on universalization principles with varying levels of abstraction. Our results suggest that older models benefit significantly from explicit boundary conditions, whereas newer models demonstrate greater robustness to implicit constraints.

URL: https://openreview.net/forum?id=EWWxSkUchO

---

Title: Reproducibility Study: Mastering cooperation between small LLMs within the Governance of the Commons Simulation

Abstract: Governance of the Commons Simulation (GovSim) is a Large Language Model (LLM) multi-agent framework designed to study cooperation and sustainability between LLM agents in resource-sharing environments (Piatti et al., 2024). Understanding the cooperation capabilities of LLMs is vital to the real-world applicability of these models. This reproducibility study aims to verify the claims in the original paper by replicating their experiments using small open-source LLMs and extending the framework. The original paper claims that (1) GovSim enables the study and benchmarking of emergent sustainable behavior, (2) only the largest and most powerful LLM agents achieve a sustainable equilibrium, while smaller models fail, and (3) agents using universalization-based reasoning significantly improve sustainability. To test the second claim, we conducted simulations with the small open-source models used in the original study. Additionally, by running the same experiments with small SOTA DeepSeek models, we successfully achieved a sustainable equilibrium. This contradicts the original claim, suggesting that recent advances in LLMs have improved the cooperation abilities of small LLMs. Regarding the third claim, our results confirm that universalization-based reasoning improves performance in the GovSim environment, supporting the authors' claim. However, further analysis suggests that the improved performance primarily stems from the numerical instructions provided to agents rather than the principle of universalization itself.

URL: https://openreview.net/forum?id=ON8EMrNwww

---

Title: A Reproducibility Study of Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks

Abstract: Many neural networks, especially over-parameterized ones, suffer from poor calibration and overconfidence. To address this, Jordahn & Olmos (2024) recently proposed a Two-Stage Training (TST) procedure that decouples the training of feature extraction and classification layers. In this study, we replicate their findings and extend their work through a series of ablation studies. We reproduce their main results and find that most of them replicate, with slight deviation for CIFAR100. Additionally, we extend the authors' results by exploring the impact of different model architectures, Monte Carlo (MC) sample sizes, and classification head designs. We further compare the method with focal loss, an implicit regularization technique known to improve calibration, and investigate whether calibration can be improved further by combining the two methods. We find that calibration can be improved even further by using focal loss in the first training stage of two-stage training. Our experiments validate the claims made by Jordahn & Olmos (2024), and show the transferability of two-stage training to different architectures.
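
Focal loss, the comparison baseline, has a simple closed form. A sketch for the probability assigned to the true class follows; the default gamma and the clamping constant are illustrative choices, not the study's settings.

```python
import math

def focal_loss(p_true, gamma=2.0):
    """Focal loss for the true-class probability p:
    -(1 - p)^gamma * log(p). The (1 - p)^gamma factor down-weights
    already-confident examples, which implicitly regularizes confidence
    and tends to improve calibration."""
    return -((1.0 - p_true) ** gamma) * math.log(max(p_true, 1e-12))
```

With gamma = 0 this reduces to ordinary cross-entropy, so gamma directly controls how strongly confident predictions are discounted.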

URL: https://openreview.net/forum?id=5Hwzd48ILf

---

Title: Reassessing Fairness: A Reproducibility Study of NIFA’s Impact on GNN Models

Abstract: Graph Neural Networks (GNNs) have demonstrated exceptional performance in processing graph-structured data, yet fairness concerns remain a critical challenge because GNNs can amplify bias and prejudice in training data. The Node Injection-based Fairness Attack (NIFA) (Luo et al., 2024) was recently proposed as a gray-box method to compromise fairness while maintaining model utility. This study aims to reproduce and validate the claims of NIFA, assessing its impact across multiple datasets and GNN architectures. Our reproduction confirms that NIFA is an effective gray-box attack that degrades the fairness metrics statistical parity and equal odds while incurring negligible utility loss. However, NIFA's ability to outperform other graph utility and fairness attacks is inconclusive. Finally, we extend the original work by evaluating NIFA's performance under multi-class sensitive attributes and varying levels of homophily. NIFA's ability to degrade fairness shows promising results in a multi-class sensitive-attribute environment. Varying levels of homophily showed minimal utility loss and stable fairness metrics across most configurations, with the exception of heterophilic-homophilic and highly homophilic settings. The codebase used in this study can be found at https://anonymous.4open.science/r/Reassessing-NIFA-B4F5/.

URL: https://openreview.net/forum?id=l5fXUKi8GO

---

Title: Scaling Channel-Invariant Self-Supervised Learning

Abstract: Recent advances in self-supervised pre-training of foundation models for natural images have made them a popular choice for various visual systems and applications. Self-supervised strategies are also promising in non-RGB scientific imaging domains such as biology, medicine, and satellite imagery, but their broader application is hampered by heterogeneity in channel composition and semantics across relevant datasets: two datasets may contain different numbers of channels, and these may reveal distinct aspects of an object or scene. Recent work on channel-invariant strategies reports substantial advantages for methods that accommodate variable channel compositions without sacrificing the ability to jointly encode channels; yet how these strategies behave at scale remains unclear. Here we show that, surprisingly, when trained across large-scale microscopy datasets, independent encoding of channels consistently outperforms joint-encoding methods by a substantial margin. We validate this result through an extensive set of experiments on datasets ranging from cell microscopy to geospatial imagery. Our DINO BoC approach sets a new state of the art across challenging benchmarks, including generalization to out-of-distribution tasks and unseen channel combinations at test time. We will open-source the code, along with model weights that constitute a new general-purpose feature extractor for fluorescent microscopy.

URL: https://openreview.net/forum?id=pT8sgtRVAf

---

Title: [RE] Are Your Models Still Fair? Fairness Attacks on Graph Neural Networks via Node Injections

Abstract: Graph Neural Networks (GNNs) have become indispensable for learning on graph-structured data, with applications in socially sensitive domains such as recommendation systems and healthcare. However, recent research has revealed that fairness-enhancing GNNs remain vulnerable to adversarial attacks, raising concerns about their real-world robustness. This paper presents a reproducibility study of Luo et al. (2024), which demonstrates that adversarial node injection can effectively compromise fairness while preserving overall predictive accuracy. Our results confirm that such attacks are efficient (requiring minimal perturbations), realistic (exploiting feasible node injections), and deceptive (causing fairness degradation without significant accuracy loss). Along with validating the original findings, we redefine their framework as an evasion attack, showing that the attack remains effective on a clean model. Furthermore, we propose a novel defense strategy and analyze the impact of model depth on the attack. Our results highlight the need for more robust GNN architectures against fairness-targeted adversarial threats.

URL: https://openreview.net/forum?id=wnnh4XjFXp

---

Title: MARL-LNS: Efficient Multi-agent Deep Reinforcement Learning via Large Neighborhood Search

Abstract: Cooperative multi-agent reinforcement learning (MARL) has emerged as a powerful paradigm for addressing complex real-world problems. However, the well-established centralized training with decentralized execution framework is hampered by the curse of dimensionality, leading to prolonged training times and inefficient convergence. In this work, we introduce MARL-LNS, a general training framework that overcomes these challenges by iteratively training on alternating subsets of agents with existing deep MARL algorithms serving as low-level trainers---without incurring any additional trainable parameters. Building on this framework, we propose three variants---Random Large Neighborhood Search (RLNS), Batch Large Neighborhood Search (BLNS), and Adaptive Large Neighborhood Search (ALNS)---each differing in its strategy for alternating agent subsets. Empirical evaluations on both the StarCraft Multi-Agent Challenge and Google Research Football environments demonstrate that our approach can reduce training time by at least 10% while achieving comparable final performance to state-of-the-art methods.
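
The three variants differ only in how the agent subset is chosen each round. A minimal sketch of the random variant's schedule is below; the subset size and how subsets are handed to the low-level MARL trainer are left abstract, as the abstract does not specify them.

```python
import random

def rlns_schedule(num_agents, subset_size, rounds, seed=0):
    """Sketch of Random Large Neighborhood Search (RLNS): each training
    round samples a random subset of agents to update while the rest stay
    frozen. Returns the per-round subsets; plugging them into an actual
    deep MARL trainer is out of scope here."""
    rng = random.Random(seed)
    agents = list(range(num_agents))
    return [sorted(rng.sample(agents, subset_size)) for _ in range(rounds)]
```

BLNS and ALNS would replace the sampling line with batched rotation and an adaptive selection rule, respectively.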

URL: https://openreview.net/forum?id=O3e4x8W6GL

---

Title: Learning with Noisy Labels [Re]visited

Abstract: Learning with noisy labels (LNL) is a subfield of supervised machine learning investigating scenarios in which the training data contain errors. While most research has focused on synthetic noise, where labels are randomly corrupted, real-world noise from human annotation errors is more complex and less understood. Wei et al. (2022) introduced CIFAR-N, a dataset with human-labeled noise, and claimed that real-world noise is fundamentally more challenging than synthetic noise. This study aims to reproduce their experiments on the characteristics of human-annotated label noise, memorization dynamics, and the benchmarking of LNL methods. We successfully reproduce several of the claims but identify quantitative discrepancies; notably, our attempts to reproduce the reported benchmark reveal inconsistencies in the reported results. To address these issues, we develop a unified framework and propose a refined benchmarking protocol that ensures a fairer evaluation of LNL methods. Our findings confirm that real-world noise differs structurally from synthetic noise and is memorized more rapidly by deep networks. By open-sourcing our implementation, we provide a more reliable foundation for future research in LNL.
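For readers unfamiliar with the synthetic baseline that CIFAR-N's human noise is contrasted against, here is a minimal sketch of symmetric label noise: each label is flipped, with some probability, to a uniformly random *different* class. The helper is hypothetical and not drawn from the paper's code.

```python
import numpy as np

def add_symmetric_noise(labels, noise_rate, n_classes, seed=0):
    """Flip each label to a random different class with probability noise_rate."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    flip = rng.random(labels.shape) < noise_rate
    # Offset in [1, n_classes-1] guarantees the flipped label differs.
    offsets = rng.integers(1, n_classes, size=labels.shape)
    labels[flip] = (labels[flip] + offsets[flip]) % n_classes
    return labels

clean = np.zeros(1000, dtype=int)
noisy = add_symmetric_noise(clean, noise_rate=0.2, n_classes=10)
print((noisy != clean).mean())  # close to 0.2
```

Human annotation noise, by contrast, is class- and instance-dependent, which is precisely the structural difference the study confirms.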

URL: https://openreview.net/forum?id=GKZ2leags0

---

Title: Neural varifolds: an aggregate representation for quantifying the geometry of point clouds

Abstract: Point clouds are popular 3D representations for real-life objects (such as in LiDAR and Kinect) due to their detailed and compact representation of surface-based geometry. Recent approaches characterise the geometry of point clouds by bringing deep learning based techniques together with geometric fidelity metrics such as optimal transportation costs (e.g., Chamfer and Wasserstein metrics). In this paper, we propose a new surface geometry characterisation within this realm, namely a neural varifold representation of point clouds. Here, the surface is represented as a measure/distribution over both point positions and tangent spaces of point clouds. The varifold representation quantifies not only the surface geometry of point clouds through the manifold-based discrimination, but also subtle geometric consistencies on the surface due to the combined product space. This study proposes neural varifold algorithms to compute the varifold norm between two point clouds using neural networks on point clouds and their neural tangent kernel representations. The proposed neural varifold is evaluated on three different sought-after tasks -- shape matching, few-shot shape classification, and shape reconstruction. Detailed evaluation and comparison to the state-of-the-art methods demonstrate that the proposed versatile neural varifold is superior in shape matching and few-shot shape classification, and is competitive for shape reconstruction.
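As context for the geometric fidelity metrics the abstract mentions, here is a minimal NumPy sketch of the symmetric Chamfer distance between two point clouds, a common baseline that position-and-tangent-space representations like varifolds aim to improve upon. This is an illustrative helper, not the paper's neural varifold norm.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3).

    Averages each point's squared distance to its nearest neighbor in the
    other cloud, in both directions.
    """
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)  # (N, M) squared dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(chamfer_distance(a, a))  # 0.0 for identical clouds
```

Note that Chamfer compares positions only; a varifold-style measure additionally compares tangent-space information, which is what lets it capture the subtler surface consistencies described above.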

URL: https://openreview.net/forum?id=P02hoA7vln

---
