Daily TMLR digest for Aug 13, 2025

TMLR

Aug 13, 2025, 12:06:41 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: FoldDiff: Folding in Point Cloud Diffusion

Authors: Yuzhou Zhao, Juan Matias Di Martino, Amirhossein Farzam, Guillermo Sapiro

Abstract: Diffusion denoising has emerged as a powerful approach for modeling data distributions, treating data as particles with their position and velocity modeled by a stochastic diffusion process. While this framework assumes data resides in a fixed vector space (e.g., images as pixel-ordered vectors), point clouds present unique challenges due to their unordered representation. Existing point cloud diffusion methods often rely on voxelization to address this issue, but this approach is computationally expensive, with cubically scaling complexity. In this work, we investigate the misalignment between point cloud irregularity and diffusion models, analyzing it through the lens of denoising implicit priors. First, we demonstrate how the unknown permutations inherent in point cloud structures disrupt denoising implicit priors. To address this, we then propose a novel folding-based approach that reorders point clouds into a permutation-invariant grid, enabling diffusion to be performed directly on the structured representation. This construction is exploited both globally and locally. Globally, folded objects can represent point cloud objects in a fixed vector space (like images), which enables us to extend the work of denoising as implicit priors to point clouds. Locally, the folded tokens are efficient and novel token representations that can improve existing transformer-based point cloud diffusion models. Our experiments show that the proposed folding operation integrates effectively with both denoising implicit priors and advanced diffusion architectures, such as UNet and Diffusion Transformers (DiTs). Notably, DiT with locally folded tokens achieves competitive generative performance compared to state-of-the-art models while significantly reducing training and inference costs relative to voxelization-based methods.

URL: https://openreview.net/forum?id=pmRabMH1JW

---

Title: Personalized Federated Learning via Low-Rank Matrix Optimization

Authors: Ali Dadras, Sebastian U Stich, Alp Yurtsever

Abstract: Personalized Federated Learning (pFL) has gained significant attention for building a suite of models tailored to different clients. In pFL, the challenge lies in balancing the reliance on local datasets, which may lack representativeness, against the diversity of other clients' models, whose quality and relevance are uncertain. Focusing on the clustered FL scenario, where devices are grouped based on similarities in their data distributions without prior knowledge of cluster memberships, we develop a mathematical model for pFL using low-rank matrix optimization. Building on this formulation, we propose a pFL approach leveraging the Burer-Monteiro factorization technique. We examine the convergence guarantees of the proposed method and present numerical experiments on training deep neural networks, demonstrating the empirical performance of the proposed method in scenarios where personalization is crucial.

URL: https://openreview.net/forum?id=DFJu1QB2Nr

---

Title: Node-Level Data Valuation on Graphs

Authors: Simone Antonelli, Aleksandar Bojchevski

Abstract: How much is a node worth? We answer this question using an emerging set of data valuation techniques, where the value of a data point is measured via its marginal contribution when added to the (training) dataset. Data valuation has been primarily studied in the i.i.d. setting, giving rise to methods like influence functions, leave-one-out estimation, data Shapley, and data Banzhaf. We conduct a comprehensive study of data valuation approaches applied to graph-structured models such as graph neural networks in a semi-supervised transductive setting. Since all nodes (labeled and unlabeled) influence both training and inference, we construct various scenarios to understand the diverse mechanisms by which nodes can impact learning. We show that the resulting node values can be used to identify (positively and negatively) influential nodes, quantify model brittleness, detect poisoned data, and accurately predict counterfactuals.

URL: https://openreview.net/forum?id=tNyApIqDSJ

---

Title: Unified Wisdom: Harnessing Collaborative Learning to Improve Efficacy of Knowledge Distillation

Authors: Atharva Abhijit Tambat, Durga S, Ganesh Ramakrishnan, Pradeep Shenoy

Abstract: Knowledge distillation (KD), which involves training a smaller student model to approximate the predictions of a larger teacher model, is useful in striking a balance between model accuracy and computational constraints. However, KD has been found to be ineffective when the teacher and student models have a significant capacity gap. In this work, we address this issue via “meta-collaborative distillation” (MC-Distil), where students of varying capacities collaborate during distillation. Using a “coordinator” network (C-Net), MC-Distil enables mutual learning among students as a meta-learning task. Our insight is that C-Net learns from each student’s performance and training instance characteristics, allowing students of different capacities to improve together. Our method enhances student accuracy for all students, surpassing state-of-the-art baselines, including multi-step distillation, consensus enforcement, and teacher re-training. We achieve average gains of 2.5% on CIFAR-100 and 2% on Tiny ImageNet datasets, consistently across diverse student sizes, teacher sizes, and architectures. Notably, the idea that larger students benefit through meta-collaboration with smaller students is novel. MC-Distil excels in training superior student models under real-world conditions such as label noise and domain adaptation. Our approach also yields consistent improvements on the MS COCO object detection benchmark and introduces only a modest 5% computational overhead during training, with no additional cost at inference.

URL: https://openreview.net/forum?id=Zj9bb8aQNg

---

Title: On the Convergence of SVGD in KL divergence via Approximate gradient flow

Authors: Masahiro Fujisawa, Futoshi Futami

Abstract: This study investigates the convergence of Stein variational gradient descent (SVGD), which is used to approximate a target distribution based on a gradient flow on the space of probability distributions. The existing studies mainly focus on the convergence in the kernel Stein discrepancy, which doesn't imply weak convergence in many practical settings. To address this issue, we propose to introduce a novel analytical approach called $(\epsilon,\delta)$-approximate gradient flow, extending conventional concepts of approximation error for the Wasserstein gradient. With this approach, we show the sub-linear convergence of SVGD in Kullback--Leibler divergence under the discrete time and infinite particle settings. Finally, we validate our theoretical findings through several numerical experiments.

URL: https://openreview.net/forum?id=AG1zXt5aoA

---

Title: BELLA: Black-box model Explanations by Local Linear Approximations

Authors: Nedeljko Radulovic, Albert Bifet, Fabian M. Suchanek

Abstract: Understanding the decision-making process of black-box models has become not just a legal requirement, but also an additional way to assess their performance. However, the state-of-the-art post-hoc explanation approaches for regression models rely on synthetic data generation, which introduces uncertainty and can hurt the reliability of the explanations. Furthermore, they tend to produce explanations that apply to only very few data points. In this paper, we present BELLA, a deterministic model-agnostic post-hoc approach for explaining the individual predictions of regression black-box models. BELLA provides explanations in the form of a linear model trained in the feature space. BELLA maximizes the size of the neighborhood to which the linear model applies so that the explanations are accurate, simple, general, and robust.

URL: https://openreview.net/forum?id=F9Kv96KcwM

---


New submissions
===============


Title: Melody or Machine: Detecting Synthetic Music with Dual-Stream Contrastive Learning

Abstract: The rapid evolution of end-to-end AI music generation poses an escalating threat to artistic authenticity and copyright, demanding detection methods that can keep pace. While foundational, existing models like SpecTTTra falter when faced with the diverse and rapidly advancing ecosystem of new generators, exhibiting significant performance drops on out-of-distribution (OOD) content. This generalization failure highlights a critical gap: the need for more challenging benchmarks and more robust detection architectures. To address this, we first introduce Melody or Machine (MoM), a new large-scale benchmark of over 130,000 songs (6,665 hours). MoM is the most diverse dataset to date, built with a mix of open and closed-source models and a curated OOD test set designed specifically to foster the development of truly generalizable detectors. Alongside this benchmark, we introduce CLAM, a novel dual-stream detection architecture. We hypothesize that subtle, machine-induced inconsistencies between vocal and instrumental elements, often imperceptible in a mixed signal, offer a powerful tell-tale sign of synthesis. CLAM is designed to test this hypothesis by employing two distinct pre-trained audio encoders (MERT and Wave2Vec2) to create parallel representations of the audio. These representations are fused by a learnable cross-aggregation module that models their inter-dependencies. The model is trained with a dual-loss objective: a standard binary cross-entropy loss for classification, complemented by a contrastive triplet loss which trains the model to distinguish between coherent and artificially mismatched stream pairings, enhancing its sensitivity to synthetic artifacts without presuming a simple feature alignment. CLAM establishes a new state-of-the-art in synthetic music forensics. It achieves an F1 score of 0.925 on our challenging MoM benchmark, significantly outperforming the previous SOTA's 0.869 on the same dataset. This result demonstrates superior generalization to unseen generative models. Furthermore, CLAM scores 0.993 on the popular SONICS benchmark, confirming its effectiveness and setting a new performance standard.

URL: https://openreview.net/forum?id=Ufwes0o2e3

---

Title: Towards Fair In-Context Learning with Tabular Foundation Models

Abstract: Transformer-based tabular foundation models have recently demonstrated promising in-context learning (ICL) performance on structured data, emerging as competitive alternatives to gradient-boosted trees. However, the fairness implications of this new paradigm remain largely unexplored. We present the first investigation of fairness in tabular ICL, evaluating three recently proposed foundation models (TabPFNv2, TabICL, and TabDPT) on multiple benchmark datasets. To mitigate biases, we explore three pre-processing fairness-enhancing methods: correlation removal (decorrelating input features from the sensitive attribute), group-balanced sample selection (ensuring equal representation of protected groups in context examples), and uncertainty-based sample selection (prioritizing context examples with high sensitive-attribute prediction uncertainty). Our experiments show that the uncertainty-based strategy consistently improves group fairness metrics (e.g., demographic parity, equalized odds, and equal opportunity) with minimal impact on predictive accuracy. We release our code to facilitate reproducibility (https://anonymous.4open.science/r/Fair-TabICL-Anonymized).

URL: https://openreview.net/forum?id=AsBhwD0sqo
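
Digest note: for readers unfamiliar with the group fairness metrics named in the abstract above, the following is a minimal sketch of one of them, the demographic parity difference, assuming binary 0/1 predictions and a single categorical sensitive attribute. The function name and the toy data are hypothetical and are not taken from the submission or its code release.

```python
def demographic_parity_difference(y_pred, sensitive):
    """Gap between the highest and lowest positive-prediction rate across groups.

    y_pred: iterable of 0/1 predictions.
    sensitive: iterable of group labels, same length as y_pred.
    A value of 0.0 means every group receives positive predictions at the same rate.
    """
    y_pred = list(y_pred)
    sensitive = list(sensitive)
    rates = {}
    for group in set(sensitive):
        # Predictions that belong to this group only.
        preds = [p for p, s in zip(y_pred, sensitive) if s == group]
        rates[group] = sum(preds) / len(preds)
    return max(rates.values()) - min(rates.values())


# Toy example (illustrative data only): group "a" is predicted positive
# for 2 of 3 rows, group "b" for 1 of 3 rows.
toy_pred = [1, 0, 1, 1, 0, 0]
toy_group = ["a", "a", "a", "b", "b", "b"]
gap = demographic_parity_difference(toy_pred, toy_group)
```

Equalized odds and equal opportunity are computed in a similar way, but condition the rates on the true label as well.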

---

Title: FedDUAL: A Dual-Strategy with Adaptive Loss and Dynamic Aggregation for Mitigating Data Heterogeneity in Federated Learning

Abstract: Federated Learning (FL) marks a transformative approach to distributed model training by combining locally optimized models from various clients into a unified global model. While FL preserves data privacy by eliminating centralized storage, it encounters significant challenges such as performance degradation, slower convergence, and reduced robustness of the global model due to the heterogeneity in client data distributions. Among the various forms of data heterogeneity, label skew emerges as a particularly formidable and prevalent issue, especially in domains such as image classification. To address these challenges, we begin with comprehensive experiments to pinpoint the underlying issues in the FL training process. Based on our findings, we then introduce an innovative dual-strategy approach designed to effectively resolve these issues. First, we introduce an adaptive loss function for client-side training, meticulously crafted to preserve previously acquired knowledge while maintaining an optimal equilibrium between local optimization and global model coherence. Secondly, we develop a dynamic aggregation strategy for aggregating client models at the server. This approach adapts to each client's unique learning patterns, effectively addressing the challenges of diverse data across the network. Our comprehensive evaluation, conducted across three diverse real-world datasets, coupled with theoretical convergence guarantees, demonstrates the superior efficacy of our method compared to several established state-of-the-art approaches.

URL: https://openreview.net/forum?id=8M3XfmNhTZ

---

Title: DynFed: Dynamic Test-Time Adaptation for Federated Learning with Adaptive Rate Networks

Abstract: Test-Time Personalized Federated Learning (TTPFL) has emerged as a promising approach for adapting models to distribution shifts in federated learning (FL) environments without relying on labeled data during testing. However, existing methods often struggle with heterogeneous shifts across clients and lack the flexibility to handle diverse distribution changes effectively. In this paper, we introduce DynFed, a novel algorithm that dynamically optimizes test-time adaptation (TTA) in FL scenarios with heterogeneous distribution shifts. Our method leverages Adaptive Rate Networks (ARNs) to generate client-specific adaptation rates, enabling more effective handling of diverse shift types, including label skew and feature shifts. DynFed employs an innovative iterative adaptation process, where adaptation rates are continuously refined based on the current adaptation state using the ARN function, without direct access to raw client data. Crucially, we uncover a fundamental dichotomy: optimal adaptation strategies for one-type and multi-type distribution shifts can be diametrically opposed. DynFed navigates this challenge by automatically adjusting its approach based on the nature of the encountered shifts. Extensive experiments demonstrate that DynFed significantly outperforms existing TTPFL and TTA methods across various shift scenarios. Our theoretical analysis provides convergence and generalization guarantees for our approach and justifies the need for adaptive mechanisms. Our method shows particularly robust performance in complex multi-type shift environments, where previous approaches often struggle. This work opens new avenues for adaptive and resilient FL in real-world applications where distribution shifts are diverse and unpredictable.

URL: https://openreview.net/forum?id=Np8Jy9kf1b

---

Title: Tukey g-and-h neural network regression for non-Gaussian data

Abstract: This paper addresses non-Gaussian regression with neural networks via the use of the Tukey g-and-h distribution. The Tukey g-and-h transform is a flexible parametric transform with two parameters $g$ and $h$ which, when applied to a standard normal random variable, introduces both skewness and kurtosis, resulting in a distribution commonly called the Tukey g-and-h distribution. Specific values of $g$ and $h$ produce good approximations to other families of distributions, such as the Cauchy and Student-t distributions. The flexibility of the Tukey g-and-h distribution has driven its popularity in the statistical community, in applied sciences and finance. In this work we consider the training of a neural network to predict the parameters of a Tukey g-and-h distribution in a regression framework via the minimization of the corresponding negative log-likelihood, despite the latter having no closed-form expression. We demonstrate the efficiency of our procedure in simulated examples and apply our method to a real-world dataset of global crop yield for several types of crops. Finally, we show how we can carry out a goodness-of-fit analysis between the predicted distributions and the test data. A PyTorch implementation is made available on GitHub and as a PyPI package.

URL: https://openreview.net/forum?id=sjDmzLoKvj
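
Digest note: the transform described in the abstract above has a simple closed form in its standard textbook definition, T(z) = ((exp(g z) - 1) / g) * exp(h z^2 / 2), with T(z) = z * exp(h z^2 / 2) in the limit g -> 0. The sketch below applies it to a single standard normal draw. The function name is hypothetical, location and scale parameters are omitted, and this is not the authors' PyTorch implementation.

```python
import math


def tukey_gh(z, g, h):
    """Apply the Tukey g-and-h transform to a standard normal value z.

    g controls skewness (g = 0 gives a symmetric result);
    h >= 0 controls tail heaviness (h = 0 leaves the tails unchanged).
    """
    # The g -> 0 limit of (exp(g * z) - 1) / g is z itself.
    skewed = z if g == 0 else math.expm1(g * z) / g
    # The exponential factor stretches large |z| values, fattening the tails.
    return skewed * math.exp(h * z * z / 2.0)


# With g = h = 0 the transform is the identity, so the output stays Gaussian.
identity_value = tukey_gh(1.5, 0.0, 0.0)
# With g > 0 the right tail is stretched more than the left tail.
right = tukey_gh(2.0, 0.5, 0.0)
left = tukey_gh(-2.0, 0.5, 0.0)
```

Because the inverse of this transform has no closed form for general g and h, the density of the transformed variable has none either, which is consistent with the abstract's remark about the negative log-likelihood.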

---

Title: carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks

Abstract: Hyperparameter Optimization (HPO) is crucial for developing well-performing machine learning models. In order to ease prototyping and benchmarking of HPO methods, we propose carps, a benchmark framework for Comprehensive Automated Research Performance Studies that allows evaluating N optimizers on M benchmark tasks. In this first release of carps, we focus on the four most important HPO task types: blackbox, multi-fidelity, multi-objective and multi-fidelity-multi-objective. With 3,336 tasks from 5 community benchmark collections and 28 variants of 9 optimizer families, we offer the biggest go-to library to date to evaluate and compare HPO methods. The carps framework relies on a purpose-built, lightweight interface, gluing together optimizers and benchmark tasks. It also features an analysis pipeline, facilitating the evaluation of optimizers on benchmarks. However, navigating a huge number of tasks while developing and comparing methods can be computationally infeasible. To address this, we obtain a subset of representative tasks by minimizing the star discrepancy of the subset, in the space spanned by the full set. As a result, we propose an initial subset of 10 to 30 diverse tasks for each task type, and include functionality to re-compute subsets as more benchmarks become available, enabling efficient evaluations. We also establish a first set of baseline results on these tasks as a measure for future comparisons. With carps (https://anonymous.4open.science/r/CARP-S-860C), we make an important step in the standardization of HPO evaluation.

URL: https://openreview.net/forum?id=AuA8m4I6zI

---
