Featured Certification: Random Erasing vs. Model Inversion: A Promising Defense or a False Hope?
Viet-Hung Tran, Ngoc-Bao Nguyen, Son T. Mai, Hans Vandierendonck, Ira Assent, Alex Kot, Ngai-Man Cheung
https://openreview.net/forum?id=S9CwKnPHaO
---
Accepted papers
===============
Title: Emergent Symbol-like Number Variables in Artificial Neural Networks
Authors: Satchel Grant, Noah Goodman, James Lloyd McClelland
Abstract: What types of numeric representations emerge in neural systems, and what would a satisfying answer to this question look like? In this work, we interpret Neural Network (NN) solutions to sequence based number tasks through a variety of methods to understand how well we can interpret them through the lens of interpretable Symbolic Algorithms (SAs)—precise algorithms describable by rules operating on typed, mutable variables. We use GRUs, LSTMs, and Transformers trained using Next Token Prediction (NTP) on tasks where the correct tokens depend on numeric information only latent in the task structure. We show through multiple causal and theoretical methods that we can interpret raw NN activity through the lens of simplified SAs when we frame the neural activity in terms of neural subspaces rather than individual neurons. Using Distributed Alignment Search (DAS), we find that, depending on network architecture, dimensionality, and task specifications, alignments with SA’s can be very high, while other times they can be only approximate, or fail altogether. We extend our analytic toolkit to address the failure cases by expanding the DAS framework to a broader class of alignment functions that more flexibly capture NN activity in terms of interpretable variables from SAs, and we provide theoretic and empirical explorations of Linear Alignment Functions (LAFs) in contrast to the preexisting Orthogonal Alignment Functions (OAFs). Through analyses of specific cases we confirm the usefulness of causal interventions on neural subspaces for NN interpretability, and we show that recurrent models can develop graded, symbol-like number variables within their neural activity. We further show that shallow Transformers learn very different solutions than recurrent networks, and we prove that such models must use anti-Markovian solutions—solutions that do not rely on cumulative, Markovian hidden states—in the absence of sufficient attention layers.
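As a rough illustration of the kind of interchange intervention DAS performs, the hypothetical NumPy sketch below maps raw activations through an invertible linear alignment map, swaps the subspace hypothesized to hold a number variable, and maps back. All names, dimensions, and the alignment matrix are made up for illustration; an Orthogonal Alignment Function is the special case where the map is a rotation (inverse = transpose):

```python
import numpy as np

rng = np.random.default_rng(0)

def interchange_intervention(h_base, h_source, A, k):
    """Swap the first k aligned coordinates of h_base with those of h_source.

    A is an invertible linear alignment map (a Linear Alignment Function);
    the orthogonal case (A^-1 = A^T) recovers standard DAS rotations.
    """
    z_base = A @ h_base                 # map into the aligned coordinate system
    z_src = A @ h_source
    z_base[:k] = z_src[:k]              # overwrite the subspace holding the variable
    return np.linalg.solve(A, z_base)   # map back to raw activation space

d, k = 8, 2
A = rng.standard_normal((d, d)) + d * np.eye(d)  # well-conditioned, invertible
h_base = rng.standard_normal(d)
h_source = rng.standard_normal(d)

h_new = interchange_intervention(h_base, h_source, A, k)
# The aligned subspace now carries the source's value; the rest is unchanged.
assert np.allclose((A @ h_new)[:k], (A @ h_source)[:k])
assert np.allclose((A @ h_new)[k:], (A @ h_base)[k:])
```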
URL: https://openreview.net/forum?id=YPnYpiru5W
---
Title: Loss Landscape Degeneracy and Stagewise Development in Transformers
Authors: Jesse Hoogland, George Wang, Matthew Farrugia-Roberts, Liam Carroll, Susan Wei, Daniel Murfet
Abstract: Deep learning involves navigating a high-dimensional loss landscape over the neural network parameter space. Over the course of training, complex computational structures form and re-form inside the neural network, leading to shifts in input/output behavior. It is a priority for the science of deep learning to uncover principles governing the development of neural network structure and behavior. Drawing on the framework of singular learning theory, we propose that model development is deeply linked to degeneracy in the local geometry of the loss landscape. We investigate this link by monitoring loss landscape degeneracy throughout training, as quantified by the local learning coefficient, for a transformer language model and an in-context linear regression transformer. We show that training can be divided into distinct periods of change in loss landscape degeneracy, and that these changes in degeneracy coincide with significant changes in the internal computational structure and the input/output behavior of the transformers. This finding provides suggestive evidence that degeneracy and development are linked in transformers, underscoring the potential of a degeneracy-based perspective for understanding modern deep learning.
URL: https://openreview.net/forum?id=45qJyBG8Oj
---
Title: Does equivariance matter at scale?
Authors: Johann Brehmer, Sönke Behrends, Pim De Haan, Taco Cohen
Abstract: Given large datasets and sufficient compute, is it beneficial to design neural architectures for the structure and symmetries of each problem? Or is it more efficient to learn them from data? We study empirically how equivariant and non-equivariant networks scale with compute and training samples. Focusing on a benchmark problem of rigid-body interactions and on general-purpose transformer architectures, we perform a series of experiments, varying the model size, training steps, and dataset size. We find evidence for three conclusions. First, equivariance improves data efficiency, but training non-equivariant models with data augmentation can close this gap given sufficient epochs. Second, scaling with compute follows a power law, with equivariant models outperforming non-equivariant ones at each tested compute budget. Finally, the optimal allocation of a compute budget onto model size and training duration differs between equivariant and non-equivariant models.
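The power-law compute scaling the abstract describes can be fit by ordinary least squares in log-log space. The sketch below uses synthetic losses with made-up illustrative constants, not the paper's measurements:

```python
import numpy as np

# Synthetic losses following L(C) = a * C^(-b), the power-law form reported
# for compute scaling (a and b here are invented illustrative values).
a, b = 50.0, 0.3
compute = np.logspace(15, 20, 6)          # FLOPs budgets
loss = a * compute ** (-b)

# A power law is linear in log-log space: log L = log a - b log C,
# so the exponent is recovered with a least-squares line fit.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
fitted_b, fitted_a = -slope, np.exp(intercept)
assert abs(fitted_b - b) < 1e-6
```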
URL: https://openreview.net/forum?id=wilNute8Tn
---
Title: Continuous Parallel Relaxation for Finding Diverse Solutions in Combinatorial Optimization Problems
Authors: Yuma Ichikawa, Hiroaki Iwashita
Abstract: Finding the optimal solution is often the primary goal in combinatorial optimization (CO). However, real-world applications frequently require diverse solutions rather than a single optimum, particularly in two key scenarios. The first scenario occurs in real-world applications where strictly enforcing every constraint is neither necessary nor desirable. Allowing minor constraint violations can often lead to more cost-effective solutions. This is typically achieved by incorporating the constraints as penalty terms in the objective function, which requires careful tuning of penalty parameters. The second scenario involves cases where CO formulations tend to oversimplify complex real-world factors, such as domain knowledge, implicit trade-offs, or ethical considerations. To address these challenges, generating (i) penalty-diversified solutions by varying penalty intensities and (ii) variation-diversified solutions with distinct structural characteristics provides valuable insights, enabling practitioners to post-select the most suitable solution for their specific needs. However, efficiently discovering these diverse solutions is more challenging than finding a single optimal one. This study introduces Continual Parallel Relaxation Annealing (CPRA), a computationally efficient framework for unsupervised-learning (UL)-based CO solvers that generates diverse solutions within a single training run. CPRA leverages representation learning and parallelization to automatically discover shared representations, substantially accelerating the search for these diverse solutions. Numerical experiments demonstrate that CPRA outperforms existing UL-based solvers in generating these diverse solutions while significantly reducing computational costs.
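The penalty-diversified idea, the same objective solved under different penalty intensities, can be seen in a toy example. The brute-force solver and problem data below are invented for illustration and have nothing to do with CPRA's actual UL-based machinery:

```python
import itertools
import numpy as np

# Toy knapsack-style CO problem: pick items to maximize value, with the
# capacity constraint folded into the objective as a penalty term.
values = np.array([6.0, 5.0, 4.0])
weights = np.array([3.0, 2.0, 2.0])
capacity = 4.0

def penalized_cost(x, lam):
    x = np.asarray(x)
    violation = max(0.0, weights @ x - capacity)  # constraint slack
    return -(values @ x) + lam * violation        # minimized objective

def best_solution(lam):
    # Brute force over binary assignments (fine at this toy size).
    return min(itertools.product([0, 1], repeat=3),
               key=lambda x: penalized_cost(x, lam))

# Sweeping the penalty intensity yields "penalty-diversified" solutions:
# a loose penalty tolerates a small violation for extra value, while a
# strict penalty enforces feasibility.
assert best_solution(0.5) == (1, 1, 1)    # small violation, more total value
assert best_solution(100.0) == (0, 1, 1)  # feasible: weight 4 <= capacity
```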
URL: https://openreview.net/forum?id=ix33zd5zCw
---
Title: Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified?
Authors: Olawale Elijah Salaudeen, Nicole Chiou, Shiny Weng, Sanmi Koyejo
Abstract: Spurious correlations, unstable statistical shortcuts a model can exploit, are expected to degrade performance out-of-distribution (OOD). However, across many popular OOD generalization benchmarks, vanilla empirical risk minimization (ERM) often achieves the highest OOD accuracy. Moreover, gains in in-distribution accuracy generally improve OOD accuracy, a phenomenon termed accuracy on the line, which contradicts the expected harm of spurious correlations. We show that these observations are an artifact of misspecified OOD datasets that do not include shifts in spurious correlations that harm OOD generalization, the setting they are meant to evaluate. Consequently, current practice evaluates "robustness" without truly stressing the spurious signals we seek to eliminate; our work pinpoints when that happens and how to fix it. Contributions. (i) We derive necessary and sufficient conditions for a distribution shift to reveal a model's reliance on spurious features; when these conditions hold, "accuracy on the line" disappears. (ii) We audit leading OOD datasets and find that most still display accuracy on the line, suggesting they are misspecified for evaluating robustness to spurious correlations. (iii) We catalog the few well-specified datasets and summarize generalizable design principles, such as identifying datasets of natural interventions (e.g., a pandemic), to guide future well-specified benchmarks.
URL: https://openreview.net/forum?id=fNywRyqPQo
---
Title: BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation
Authors: Peijia Qin, Ruiyi Zhang, Pengtao Xie
Abstract: Parameter-efficient fine-tuning (PEFT) is a flexible and efficient method for adapting large language models (LLMs) to downstream tasks. Among these methods, weight-decomposed low-rank adaptation (DoRA) is a promising approach that decomposes weight matrices into magnitude and direction components to mimic full fine-tuning (FT) better. However, DoRA's simultaneous optimization of these components makes it over-expressive, increases the risk of overfitting, and creates a coupled updating pattern that limits its learning capacity. To address these issues, we propose Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation (BiDoRA), a novel PEFT method based on a bi-level optimization framework. BiDoRA fundamentally differs from DoRA by optimizing the magnitude and direction in two separate, asynchronous loops using distinct training and validation data splits. This decoupled optimization process effectively mitigates overfitting and allows for more flexible updates that align even more closely with FT. For instance, weight decomposition analysis shows BiDoRA achieves a magnitude-direction update correlation of $-8.042$, significantly closer to the FT ideal compared to $-1.784$ for DoRA. Evaluation of BiDoRA on diverse tasks spanning natural language understanding, generation, token classification, and extremely small biomedical datasets reveals that it consistently outperforms DoRA and a wide range of leading PEFT methods. This improvement is statistically significant, as demonstrated on the GLUE benchmark where BiDoRA surpasses DoRA with a p-value of $2.4\times10^{-4}$ in terms of the Wilcoxon signed-rank test. The code for BiDoRA is available at https://github.com/t2ance/BiDoRA.
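BiDoRA's bi-level training loop is not shown here; the sketch below only illustrates the underlying DoRA-style magnitude-direction decomposition that BiDoRA optimizes in separate loops, with invented dimensions and LoRA-style zero initialization:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 6, 4, 2
W0 = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
B = np.zeros((d_out, r))                  # low-rank factors (B starts at zero,
A = rng.standard_normal((r, d_in))        # as in LoRA-style initialization)
m = np.linalg.norm(W0, axis=0)            # magnitude vector, one per column

def dora_weight(W0, B, A, m):
    """Weight-decomposed adaptation: direction from W0 + BA, magnitude from m."""
    V = W0 + B @ A                           # directional component
    V_unit = V / np.linalg.norm(V, axis=0)   # column-wise normalization
    return V_unit * m                        # rescale each column by its magnitude

# With B = 0 and m initialized to W0's column norms, the adapted weight
# reproduces the pretrained weight exactly.
assert np.allclose(dora_weight(W0, B, A, m), W0)
```

BiDoRA then alternates updates of `m` and of `(A, B)` on distinct data splits, which is where it departs from DoRA's joint optimization.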
URL: https://openreview.net/forum?id=v2xCm3VYl4
---
Title: Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks
Authors: Edan Kinderman, Itay Hubara, Haggai Maron, Daniel Soudry
Abstract: Recent methods aim to merge neural networks (NNs) with identical architectures trained on different tasks into a single multi-task model. While most works focus on the simpler setup of merging NNs initialized from a common pre-trained network, we target the harder problem of merging large transformers trained on different tasks from distinct initializations. We show that traditional merging methods fail catastrophically in this setup, while Knowledge Distillation (KD) achieves much better results, though at a higher cost. However, KD is data-inefficient, as it does not exploit the original models' weights. To solve this, we introduce "Foldable SuperNet Merge" (FS-Merge), which trains a SuperNet containing the original models (with frozen weights) using a feature reconstruction objective. After training, the SuperNet is folded back to the size of a single original model. FS-Merge is simple, data-efficient, has a computational cost comparable to KD, and is proven to have superior expressiveness over traditional merging methods. It achieves SOTA results when tested on MLPs and transformers across various sizes, tasks, modalities, and distribution shifts, especially in low-data scenarios.
URL: https://openreview.net/forum?id=6FqwLestHv
---
Title: Random Erasing vs. Model Inversion: A Promising Defense or a False Hope?
Authors: Viet-Hung Tran, Ngoc-Bao Nguyen, Son T. Mai, Hans Vandierendonck, Ira Assent, Alex Kot, Ngai-Man Cheung
Abstract: Model Inversion (MI) attacks pose a significant privacy threat by reconstructing private training data from machine learning models. While existing defenses primarily concentrate on model-centric approaches, the impact of data on MI robustness remains largely unexplored. In this work, we explore Random Erasing (RE), a technique traditionally used for improving model generalization under occlusion, and uncover its surprising effectiveness as a defense against MI attacks. Specifically, our novel feature space analysis shows that models trained with RE images introduce a significant discrepancy between the features of MI-reconstructed images and those of the private data. At the same time, features of private images remain distinct from those of other classes and well-separated from different classification regions. These effects collectively degrade MI reconstruction quality and attack accuracy while maintaining reasonable natural accuracy. Furthermore, we explore two critical properties of RE: Partial Erasure and Random Location. First, Partial Erasure prevents the model from observing entire objects during training; we find that this has a significant impact on MI, which aims to reconstruct entire objects. Second, the Random Location of erasure plays a crucial role in achieving a strong privacy-utility trade-off. Our findings highlight RE as a simple yet effective defense mechanism that can be easily integrated with existing privacy-preserving techniques. Extensive experiments across 37 setups demonstrate that our method achieves SOTA performance in the privacy-utility trade-off. The results consistently demonstrate the superiority of our defense over existing defenses across different MI attacks, network architectures, and attack configurations. For the first time, we achieve a significant degradation in attack accuracy without a decrease in utility for some configurations. Our code and additional results are available at: https://ngoc-nguyen-0.github.io/MIDRE/
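The two properties the abstract highlights, Partial Erasure and Random Location, are visible in a minimal NumPy sketch of Random Erasing below. This is a simplified illustration (unit aspect ratio, uniform noise fill), not the paper's training pipeline; in practice one would use an off-the-shelf transform such as torchvision's `RandomErasing`:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_erase(img, scale=(0.02, 0.33), p=1.0):
    """Erase one randomly located rectangle of the image.

    Partial Erasure: only a patch is removed, never the whole object.
    Random Location: the patch position is re-sampled on every call.
    """
    if rng.random() > p:
        return img
    img = img.copy()
    H, W = img.shape[:2]
    area = rng.uniform(*scale) * H * W            # fraction of the image to erase
    h = max(1, min(H, int(round(np.sqrt(area)))))
    w = max(1, min(W, int(round(area / h))))
    top = rng.integers(0, H - h + 1)              # random location
    left = rng.integers(0, W - w + 1)
    img[top:top + h, left:left + w] = rng.random((h, w) + img.shape[2:])
    return img

x = np.ones((32, 32, 3))
y = random_erase(x)
assert y.shape == x.shape and not np.allclose(x, y)  # some pixels were erased
```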
URL: https://openreview.net/forum?id=S9CwKnPHaO
---
Title: Classifier-Free Guidance is a Predictor-Corrector
Authors: Arwen Bradley, Preetum Nakkiran
Abstract: We investigate the theoretical foundations of classifier-free guidance (CFG). CFG is the dominant method of conditional sampling for text-to-image diffusion models, yet unlike other aspects of diffusion, it remains on shaky theoretical footing. In this paper, we first show that CFG interacts differently with DDPM (Ho et al., 2020) and DDIM (Song et al., 2021), and neither sampler with CFG generates the gamma-powered distribution $p(x|c)^\gamma p(x)^{1-\gamma}$. Then, we clarify the behavior of CFG by showing that it is a kind of predictor-corrector method (Song et al., 2020) that alternates between denoising and sharpening, which we call predictor-corrector guidance (PCG). We prove that in the SDE limit, CFG is actually equivalent to combining a DDIM predictor for the conditional distribution together with a Langevin dynamics corrector for a gamma-powered distribution (with a carefully chosen gamma). Our work thus provides a lens to theoretically understand CFG by embedding it in a broader design space of principled sampling methods.
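For reference, the standard CFG combination the paper analyzes is a simple extrapolation of scores. The sketch below shows only that combination rule, with toy vectors; the paper's point is that sampling with it does not yield the gamma-powered distribution its form suggests:

```python
import numpy as np

def cfg_score(score_cond, score_uncond, gamma):
    """Classifier-free guidance: extrapolate from the unconditional score
    toward the conditional one. gamma = 1 recovers plain conditional
    sampling; gamma > 1 sharpens toward the condition.
    """
    return score_uncond + gamma * (score_cond - score_uncond)

s_c = np.array([1.0, 2.0])
s_u = np.array([0.5, 1.0])
assert np.allclose(cfg_score(s_c, s_u, 1.0), s_c)         # gamma=1: conditional
assert np.allclose(cfg_score(s_c, s_u, 2.0), [1.5, 3.0])  # gamma=2: sharpened
```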
URL: https://openreview.net/forum?id=zrWNtzSZsf
---
New submissions
===============
Title: Towards Online Multimodal Social Interaction Understanding
Abstract: In this paper, we introduce a new problem, Online-MMSI, where the model must perform multimodal social interaction understanding (MMSI) using only historical information. Given a recorded video and a multi-party dialogue, the AI assistant is required to immediately identify the speaker’s referent, which is critical for real-world human-AI interaction. Without access to future conversational context, both humans and models experience substantial performance degradation when moving from offline to online settings.
To tackle the challenges, we propose Online-MMSI-VLM, a novel framework based on multimodal large language models. The core innovations of our approach lie in two components: (1) multi-party conversation forecasting, which predicts upcoming speaker turns and utterances in a coarse-to-fine manner; and (2) socially-aware visual prompting, which highlights salient social cues in each video frame using bounding boxes and body keypoints.
Our model achieves state-of-the-art results on three tasks across two datasets, significantly outperforming the baseline and demonstrating the effectiveness of Online-MMSI-VLM.
URL: https://openreview.net/forum?id=5P7yVfUEuD
---
Title: Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization
Abstract: Continuous control of non-stationary environments is a major challenge for deep reinforcement learning algorithms. The time-dependency of the state transition dynamics aggravates the notorious stability problems of model-free deep actor-critic architectures. We posit that two properties will play a key role in overcoming non-stationarity in transition dynamics: (i) preserving the plasticity of the critic network and (ii) directed exploration for rapid adaptation to the changing dynamics. We show that performing on-policy reinforcement learning with an evidential critic provides both. The evidential design ensures a fast and sufficiently accurate approximation to the uncertainty around the state-value, which maintains the plasticity of the critic network by detecting the distributional shifts caused by the change in dynamics. The probabilistic critic also makes the actor training objective a random variable, enabling the use of directed exploration approaches as a by-product. We name the resulting algorithm \emph{Evidential Proximal Policy Optimization (EPPO)} due to the integral role of evidential uncertainty quantification in both policy evaluation and policy improvement stages. Through experiments on non-stationary continuous control tasks, where the environment dynamics change at regular intervals, we demonstrate that our algorithm outperforms state-of-the-art on-policy reinforcement learning variants in both task-specific and overall return.
URL: https://openreview.net/forum?id=KTfTwxsVNE
---
Title: Nonlinear reconciliation: Error reduction theorems
Abstract: Forecast reconciliation, an ex-post technique applied to forecasts that must satisfy constraints, has been a prominent topic in the forecasting literature over the past two decades. Recently, several efforts have sought to extend reconciliation methods to probabilistic settings. Nevertheless, formal theorems demonstrating error reduction in nonlinear contexts, analogous to those presented in Panagiotelis et al. (2021), are still lacking. This paper addresses that gap by establishing such theorems for various classes of nonlinear hypersurfaces and vector-valued functions. Specifically, we derive an exact analog of Theorem 3.1 from Panagiotelis et al. (2021) for hypersurfaces with constant-sign curvature. Additionally, we provide an error reduction theorem for the broader case of hypersurfaces with non-constant-sign curvature and for general manifolds with codimension > 1. To support reproducibility and practical adoption, we release a JAX-based Python package, \emph{to be released upon publication}, implementing the presented theorems and reconciliation procedures.
URL: https://openreview.net/forum?id=dXRWuogm3J
---
Title: Interpretability of Language Models for Learning Hierarchical Structures
Abstract: Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge. Previous research has primarily explored how these models handle simple tasks like name copying or selection; we extend this by investigating how they process complex, recursive language structures defined by context-free grammars (CFGs). We introduce a family of synthetic CFGs that produce hierarchical rules, capable of generating lengthy, locally ambiguous sequences that require dynamic programming to parse. Despite this complexity, we show that generative models like GPT can learn these CFG languages and generate valid completions. Analyzing the model's internals, we find that its hidden states linearly encode parse tree structure (via our new probing technique), and attention patterns statistically align with the information flow of dynamic programming-style parsing algorithms. These findings provide a controlled interpretability setting for understanding how transformers may represent and compute over hierarchical syntax.
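Sampling training strings from a synthetic CFG by recursive rule expansion can be sketched as follows. The toy grammar here is invented and far shallower than the paper's hierarchical grammar family; it only illustrates the generation mechanism:

```python
import random

# A tiny synthetic CFG (illustrative only; the paper's grammars are deeper).
GRAMMAR = {
    "S": [["A", "B"], ["B", "A"]],
    "A": [["a"], ["a", "S"]],
    "B": [["b"], ["b", "b"]],
}

def generate(symbol="S", rng=None, depth=0, max_depth=8):
    """Sample a terminal string by recursively expanding nonterminals."""
    rng = rng or random.Random(0)
    if symbol not in GRAMMAR:            # terminal symbol
        return [symbol]
    rules = GRAMMAR[symbol]
    if depth >= max_depth:               # cap recursion: prefer terminal-only rules
        rules = [r for r in rules if all(s not in GRAMMAR for s in r)] or rules
    out = []
    for s in rng.choice(rules):
        out.extend(generate(s, rng, depth + 1, max_depth))
    return out

sample = generate(rng=random.Random(0))
assert set(sample) <= {"a", "b"} and len(sample) >= 2  # only terminals emitted
```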
URL: https://openreview.net/forum?id=mPQKyzkA1K
---
Title: Trading-off Statistical and Computational Efficiency via $W$-step MDPs: A Policy Gradient Approach
Abstract: In reinforcement learning, algorithm performance is typically evaluated along two dimensions: computational and statistical complexity. While theoretical researchers often prioritize statistical efficiency - minimizing the number of samples needed to reach a desired accuracy - practitioners focus on reducing computational costs, such as training time and resource consumption. Closing this gap requires algorithms that balance both aspects effectively. In this paper, we introduce MetaStep, a meta-algorithm designed to enhance state-of-the-art RL algorithms by improving their computational performance while maintaining competitive sample efficiency. MetaStep is based on the novel notion of $W$-step Markov decision process (MDP), where, instead of performing a single action and transitioning to the next state, the agent executes a sequence of $W$ actions before observing the resulting state and collecting the discounted $W$-step cumulative reward. First, we provide a theoretical analysis of the suboptimality introduced in the optimal policy performance when planning in a $W$-step MDP, highlighting the impact of the environment stochasticity. Second, we apply MetaStep to GPOMDP, a well-known policy gradient method, and theoretically investigate the advantages of learning in the $W$-step MDP in terms of variance reduction and improved sample complexity. Finally, empirical evaluations confirm that MetaStep reduces computational costs while preserving - and, in certain scenarios, improving - sample efficiency.
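The $W$-step MDP construction, committing to $W$ actions and observing only the final state and the discounted $W$-step cumulative reward, can be sketched as an environment wrapper. The wrapper and toy chain environment below are invented illustrations, not MetaStep itself:

```python
import numpy as np

class WStepWrapper:
    """Wrap a one-step environment so the agent commits to W actions,
    observing only the final state and the discounted W-step reward."""

    def __init__(self, env, W, gamma):
        self.env, self.W, self.gamma = env, W, gamma

    def step(self, actions):
        assert len(actions) == self.W
        total, done = 0.0, False
        for t, a in enumerate(actions):
            state, reward, done = self.env.step(a)
            total += self.gamma ** t * reward  # discounted W-step cumulative reward
            if done:
                break
        return state, total, done

class ChainEnv:
    """Toy deterministic chain: action +1/-1 moves the state, reward = new state."""
    def __init__(self):
        self.state = 0
    def step(self, a):
        self.state += a
        return self.state, float(self.state), False

env = WStepWrapper(ChainEnv(), W=3, gamma=0.9)
state, reward, done = env.step([1, 1, 1])
assert state == 3
assert np.isclose(reward, 1 + 0.9 * 2 + 0.81 * 3)  # = 5.23
```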
URL: https://openreview.net/forum?id=9ynBcLjt7p
---
Title: ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning
Abstract: We present ThinkPrune, a simple yet effective method for pruning the thinking length of long-thinking LLMs, which have been found to often produce inefficient and redundant thinking processes. Existing preliminary explorations of reducing thinking length primarily focus on forcing the thinking process to exit early, rather than adapting the LLM to optimize and consolidate the thinking process, and therefore the length-performance tradeoff observed so far is sub-optimal. To fill this gap, ThinkPrune offers a simple solution that continuously trains long-thinking LLMs via reinforcement learning (RL) with an added token limit, beyond which any unfinished thoughts and answers are discarded, resulting in a zero reward. To further preserve model performance, we introduce an iterative length-pruning approach, where multiple rounds of RL are conducted, each with an increasingly stringent token limit. We observe that ThinkPrune results in a remarkable performance-length tradeoff: on the AIME24 dataset, the reasoning length of DeepSeek-R1-Distill-Qwen-1.5B can be reduced by half with only a 2% drop in performance. We also observe that after pruning, the LLMs can bypass unnecessary steps while keeping the core reasoning process complete.
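The length-capped reward the abstract describes, zero reward for any generation exceeding the token limit, otherwise a correctness reward, can be sketched as below. Function names, the 1.0/0.0 reward scale, and the limit schedule are illustrative assumptions, not the paper's exact implementation:

```python
def thinkprune_reward(response_tokens, is_correct, token_limit):
    """Length-capped RL reward: a generation exceeding the token limit is
    treated as unfinished and gets zero reward; otherwise the reward is
    correctness (1.0 / 0.0)."""
    if len(response_tokens) > token_limit:
        return 0.0
    return 1.0 if is_correct else 0.0

# Iterative pruning runs successive RL rounds with a tighter limit each round
# (illustrative values).
limits = [4000, 3000, 2000]

assert thinkprune_reward(["tok"] * 1500, True, limits[-1]) == 1.0
assert thinkprune_reward(["tok"] * 2500, True, limits[-1]) == 0.0  # over the limit
```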
URL: https://openreview.net/forum?id=V51gPu1uQD
---
Title: A Survey on Federated Fine-Tuning of Large Language Models
Abstract: Large Language Models (LLMs) have demonstrated impressive success across various tasks. Integrating LLMs with Federated Learning (FL), a paradigm known as FedLLM, offers a promising avenue for collaborative model adaptation while preserving data privacy. This survey provides a systematic and comprehensive review of FedLLM. We begin by tracing the historical development of both LLMs and FL, summarizing relevant prior research to set the context. Subsequently, we delve into an in-depth analysis of the fundamental challenges inherent in deploying FedLLM. Addressing these challenges often requires efficient adaptation strategies; therefore, we conduct an extensive examination of existing Parameter-Efficient Fine-tuning (PEFT) methods and explore their applicability within the FL framework. To rigorously evaluate the performance of FedLLM, we undertake a thorough review of existing fine-tuning datasets and evaluation benchmarks. Furthermore, we discuss FedLLM's diverse real-world applications across multiple domains. Finally, we identify critical open challenges and outline promising research directions to foster future advancements in FedLLM. This survey aims to serve as a foundational resource for researchers and practitioners, offering valuable insights into the rapidly evolving landscape of federated fine-tuning for LLMs. It also establishes a roadmap for future innovations in privacy-preserving AI. We actively maintain a GitHub repo to track cutting-edge advancements in this field.
URL: https://openreview.net/forum?id=rnCqbuIWnn
---