J2C Certification: Diffusion posterior sampling for simulation-based inference in tall data settings
Julia Linhart, Gabriel Cardoso, Alexandre Gramfort, Sylvain Le Corff, Pedro L. C. Rodrigues
https://openreview.net/forum?id=cdhfoS6Gyo
---
Accepted papers
===============
Title: Holistic Continual Learning under Concept Drift with Adaptive Memory Realignment
Authors: Alif Ashrafee, Jędrzej Kozal, Michał Woźniak, Bartosz Krawczyk
Abstract: Traditional continual learning methods prioritize knowledge retention and focus primarily on mitigating catastrophic forgetting, implicitly assuming that the data distribution of previously learned tasks remains static. This overlooks the dynamic nature of real-world data streams, where concept drift permanently alters previously seen data and demands both stability and rapid adaptation. We introduce a holistic framework for continual learning under concept drift that simulates realistic scenarios by evolving task distributions. As a baseline, we consider Full Relearning (FR), in which the model is retrained from scratch on newly labeled samples from the drifted distribution. While effective, this approach incurs substantial annotation and computational overhead. To address these limitations, we propose Adaptive Memory Realignment (AMR), a lightweight alternative that equips rehearsal-based learners with a drift-aware adaptation mechanism. AMR selectively removes outdated samples of drifted classes from the replay buffer and repopulates it with a small number of up-to-date instances, effectively realigning memory with the new distribution. This targeted resampling matches the performance of FR while reducing the need for labeled data and computation by orders of magnitude. To enable reproducible evaluation, we introduce four concept drift variants of standard vision benchmarks: Fashion-MNIST-CD, CIFAR10-CD, CIFAR100-CD, and Tiny-ImageNet-CD, where previously seen classes reappear with shifted representations. Comprehensive experiments on these datasets using several rehearsal-based baselines show that AMR consistently counters concept drift, maintaining high accuracy with minimal overhead. These results position AMR as a scalable solution that reconciles stability and plasticity in non-stationary continual learning environments. 
Full implementation of our framework and concept drift benchmark datasets are available at: https://github.com/AlifAshrafee/CL-Under-Concept-Drift.
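The drift-aware realignment step can be sketched as a small rehearsal buffer; `ReplayBuffer` and `realign` are illustrative names chosen here, not the paper's API, and the eviction policy is a generic reservoir-style placeholder.

```python
import random

class ReplayBuffer:
    """Hypothetical rehearsal buffer with AMR-style drift-aware realignment."""

    def __init__(self, capacity_per_class):
        self.capacity = capacity_per_class
        self.store = {}  # class label -> list of stored samples

    def add(self, label, sample):
        bucket = self.store.setdefault(label, [])
        if len(bucket) < self.capacity:
            bucket.append(sample)
        else:  # reservoir-style replacement keeps the buffer bounded
            bucket[random.randrange(self.capacity)] = sample

    def realign(self, drifted_labels, fresh_samples):
        # Evict outdated samples of drifted classes, then repopulate with
        # a small number of up-to-date labeled instances.
        for label in drifted_labels:
            self.store[label] = []
        for label, sample in fresh_samples:
            self.add(label, sample)

buf = ReplayBuffer(capacity_per_class=3)
for i in range(5):
    buf.add("cat", f"old_cat_{i}")
buf.realign({"cat"}, [("cat", "new_cat_0"), ("cat", "new_cat_1")])
print(buf.store["cat"])  # only post-drift samples remain
```

Only the drifted classes are touched, which is why the targeted resampling needs far fewer new labels than full relearning.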
URL: https://openreview.net/forum?id=1drDlt0CLM
---
Title: Diffusion posterior sampling for simulation-based inference in tall data settings
Authors: Julia Linhart, Gabriel Cardoso, Alexandre Gramfort, Sylvain Le Corff, Pedro L. C. Rodrigues
Abstract: Identifying the parameters of a non-linear model that best explain observed data is a core task across scientific fields. When such models rely on complex simulators, evaluating the likelihood is typically intractable, making traditional inference methods such as MCMC inapplicable. Simulation-based inference (SBI) addresses this by training deep generative models to approximate the posterior distribution over parameters using simulated data. In this work, we consider the tall data setting, where multiple independent observations provide additional information, allowing sharper posteriors and improved parameter identifiability.
Building on the flourishing score-based diffusion literature, F-NPSE (Geffner et al., 2023) estimates the tall data posterior by composing individual scores from a neural network trained only for a single context observation. This enables more flexible and simulation-efficient inference than alternative approaches for tall datasets in SBI.
However, it relies on costly Langevin dynamics during sampling. We propose a new algorithm that eliminates the need for Langevin steps by explicitly approximating the diffusion process of the tall data posterior. Our method retains the advantages of compositional score-based inference while being significantly faster and more stable than F-NPSE. We demonstrate its improved performance on toy problems and standard SBI benchmarks, and showcase its scalability by applying it to a complex real-world model from computational neuroscience.
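In a Gaussian toy model the score composition behind this family of methods can be checked analytically. The sketch below is an illustration, not the paper's algorithm: it verifies that summing single-observation posterior scores and subtracting $n-1$ prior scores recovers the exact tall-data posterior score at noise level $t = 0$; at $t > 0$ the composition is only approximate, which is precisely the difficulty the proposed method addresses.

```python
import numpy as np

sigma2 = 0.5                     # likelihood variance: x | theta ~ N(theta, sigma2)
x = np.array([0.8, 1.2, 1.0])    # three i.i.d. observations
n = len(x)
theta = 0.3                      # point at which scores are evaluated

prior_score = -theta             # theta ~ N(0, 1)

def single_post_score(theta, xi):
    # Score of p(theta | xi): precision is 1 + 1/sigma2 for one observation.
    prec = 1.0 + 1.0 / sigma2
    return -prec * theta + xi / sigma2

# Compositional tall-data score: sum of single-observation posterior
# scores minus (n - 1) copies of the prior score.
composed = sum(single_post_score(theta, xi) for xi in x) - (n - 1) * prior_score

# Analytic tall-data posterior score for the same Gaussian model.
true_score = -(1.0 + n / sigma2) * theta + x.sum() / sigma2

print(composed, true_score)
```

The two values agree exactly in this conjugate setting; the hard part, handled by the methods above, is making the composition consistent along the whole diffusion.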
URL: https://openreview.net/forum?id=cdhfoS6Gyo
---
Title: A Survey on Deep Learning Approaches for Tabular Data Generation: Utility, Alignment, Fidelity, Privacy, Diversity, and Beyond
Authors: Mihaela C. Stoian, Eleonora Giunchiglia, Thomas Lukasiewicz
Abstract: Generative modelling has become the standard approach for synthesising tabular data. However, different use cases demand synthetic data to comply with different requirements to be useful in practice. In this survey, we review deep generative modelling approaches for tabular data from the perspective of five types of requirements: utility of the synthetic data, alignment of the synthetic data with domain-specific knowledge, statistical fidelity of the synthetic data distribution compared to the real data distribution, privacy-preserving capabilities, and sampling diversity. We group the approaches along two levels of granularity: (i) based on the requirements they address and (ii) according to the underlying model they utilise. Additionally, we summarise the appropriate evaluation methods for each requirement, the relationships among the requirements, and the specific characteristics of each model type. Finally, we discuss future directions for the field, along with opportunities to improve the current evaluation methods. Overall, this survey can be seen as a user guide to tabular data generation: helping readers navigate available models and evaluation methods to find those best suited to their needs.
URL: https://openreview.net/forum?id=RoShSRQQ67
---
Title: Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual Learning
Authors: Lama Alssum, Hasan Abed Al Kader Hammoud, Motasem Alfarra, Juan C Leon Alcazar, Bernard Ghanem
Abstract: Deep neural networks suffer from catastrophic forgetting, where performance on previous tasks degrades after training on a new task. This issue arises due to the model’s tendency to overwrite previously acquired knowledge with new information. We present a novel approach to address this challenge, focusing on the intersection of memory-based methods and regularization approaches. We formulate a regularization strategy, termed Information Maximization (IM) regularizer, for memory-based continual learning methods, which is based exclusively on the expected label distribution, thus making it class-agnostic. As a consequence, the IM regularizer can be directly integrated into various rehearsal-based continual learning methods, reducing forgetting and favoring faster convergence. Our empirical validation shows that, across datasets and regardless of the number of tasks, our proposed regularization strategy consistently improves baseline performance at the cost of only minimal computational overhead. The lightweight nature of IM ensures that it remains a practical and scalable solution, making it applicable to real-world continual learning scenarios where efficiency is paramount. Finally, we demonstrate the data-agnostic nature of our regularizer by applying it to video data, which presents additional challenges due to its temporal structure and higher memory requirements. Despite the significant domain gap, our experiments show that the IM regularizer also improves the performance of video continual learning methods.
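One plausible reading of a class-agnostic penalty based on the expected label distribution is sketched below; the function name and exact form are assumptions for illustration, not the paper's IM regularizer.

```python
import numpy as np

def im_regularizer(probs):
    """Hypothetical IM-style penalty: negative entropy of the batch-averaged
    predicted label distribution. Minimizing it pushes the *expected* label
    distribution toward uniform without referencing any specific class."""
    mean_p = probs.mean(axis=0)
    return np.sum(mean_p * np.log(mean_p + 1e-12))

# A collapsed batch (everything predicted as class 0) incurs a larger
# penalty than a batch whose average prediction is uniform.
collapsed = np.tile([0.97, 0.01, 0.01, 0.01], (8, 1))
balanced = np.tile([0.25, 0.25, 0.25, 0.25], (8, 1))
print(im_regularizer(collapsed) > im_regularizer(balanced))
```

Because the penalty only sees the averaged distribution, it can be bolted onto any rehearsal-based method's loss without knowing the task's classes.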
URL: https://openreview.net/forum?id=CJw1ZjkJMG
---
New submissions
===============
Title: Robust Cross-Domain Alignment
Abstract: The Gromov-Wasserstein (GW) distance is an effective measure of alignment between distributions supported on distinct ambient spaces. Essentially measuring the mutual departure from isometry, it has found wide use in domain translation and network analysis, yet it has long been shown to be vulnerable to contamination in the underlying measures. Efforts to introduce robustness into GW have so far been inspired by similar optimal transport (OT) techniques, which predominantly advocate partial mass transport or unbalancing. In contrast, the cross-domain alignment problem, being fundamentally different from OT, demands solutions tailored to its diverse applications and contamination regimes. Drawing on robust statistics, we discuss three contextually novel techniques to robustify GW and its variants. For each method, we explore metric properties and robustness guarantees, along with their co-dependencies and their individual relations to the GW distance. For a comprehensive view, we empirically validate their superior resilience to contamination on real machine learning tasks against state-of-the-art methods.
URL: https://openreview.net/forum?id=0mchjaZZi4
---
Title: CE-LoRA: Computation-Efficient LoRA Fine-Tuning for Large Language Models
Abstract: Large Language Models (LLMs) demonstrate exceptional performance across various tasks but demand substantial computational resources, even for fine-tuning. Although Low-Rank Adaptation (LoRA) significantly alleviates memory consumption during fine-tuning, its impact on computational cost reduction is limited. This paper identifies the computation of activation gradients as the primary bottleneck in LoRA's backward propagation and introduces the Computation-Efficient LoRA (CE-LoRA) algorithm, which enhances computational efficiency while preserving memory efficiency. CE-LoRA leverages two key techniques: Approximated Matrix Multiplication, which replaces dense multiplications of large and complete matrices with sparse multiplications involving only critical rows and columns, and the Double-LoRA technique, which reduces error propagation in activation gradients. Theoretically, CE-LoRA converges at the same rate as LoRA, $\mathcal{O}(1/\sqrt{T})$, where $T$ is the number of iterations. Empirical evaluations confirm that CE-LoRA significantly reduces computational costs compared to LoRA without notable performance degradation.
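The "critical rows and columns" idea can be illustrated with a generic top-k inner-index approximation to a dense product; this is a hand-rolled sketch under that assumption, not CE-LoRA's actual selection rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def approx_matmul(a, b, k):
    """Approximate a @ b keeping only the k inner-dimension indices whose
    estimated contribution (column norm of a times row norm of b) is largest."""
    scores = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=1)
    idx = np.argsort(scores)[-k:]
    return a[:, idx] @ b[idx, :]

# Gradient-like matrices where a few inner indices dominate, as is common
# for activation gradients with heavy-tailed spectra.
a = rng.normal(size=(64, 256))
b = rng.normal(size=(256, 64))
a[:, :8] *= 50.0

exact = a @ b
approx = approx_matmul(a, b, k=32)
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(round(rel_err, 3))
```

Keeping 32 of 256 inner indices cuts the multiply cost by 8x while the relative error stays small whenever the contribution spectrum is concentrated.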
URL: https://openreview.net/forum?id=kwE16U73HH
---
Title: From Representation to Causation: A Three-Tier Framework and Open-Source Benchmark for Mechanistic Interpretability
Abstract: Interpretability research often conflates whether information is merely encoded within a model or whether it causally drives behavior. We introduce MechInterp3, a failure-aware framework that disentangles these properties into a three-tier hierarchy: (Tier-1) linear encoding, (Tier-2) probe accessibility, and (Tier-3) causal responsibility. By applying this framework to six transformer architectures across four tasks, we reveal that standard causal interventions "silently fail" in approximately 50% of model-task combinations due to weak behavioral contrast. This produces mathematically ill-conditioned estimates that undermine causal claims. Our systematic evaluation reveals three critical findings. First, we identify a pervasive tier dissociation where models with near-perfect probe accuracy often show zero or negative causal recovery, most notably in GPT-2 sentiment processing (−0.34 recovery). Second, we demonstrate that observational methods, such as attention weights and gradient attribution, are uninformative of causal structure, showing near-zero correlation ($\rho$ < 0.1) with intervention effects. Third, we discover that tasks requiring relational reasoning, such as NLI, induce more stable and localized causal circuits than surface-level tasks, despite having weaker linear representations. We release MechInterp3 as an open-source library to establish a rigorous statistical foundation for the study of machine intelligence.
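The "silent failure" under weak behavioral contrast can be illustrated with the recovery ratio such intervention analyses typically compute; `causal_recovery` and its contrast threshold are hypothetical, not the MechInterp3 API.

```python
def causal_recovery(clean, corrupted, patched, min_contrast=0.05):
    """Hypothetical Tier-3 metric: fraction of the clean-vs-corrupted
    behavioral gap restored by patching an activation. Returns None
    (a 'silent failure' flag) when the behavioral contrast is too weak
    for the ratio to be well-conditioned."""
    contrast = clean - corrupted
    if abs(contrast) < min_contrast:
        return None
    return (patched - corrupted) / contrast

# Strong contrast: the recovery ratio is meaningful.
print(causal_recovery(clean=0.9, corrupted=0.1, patched=0.7))
# Weak contrast: the same ratio would divide by near-zero noise,
# so a careful pipeline should refuse to report it.
print(causal_recovery(clean=0.51, corrupted=0.50, patched=0.6))
```

The second call is the failure mode the abstract describes: without the guard, a near-zero denominator turns measurement noise into an arbitrarily large "causal effect".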
URL: https://openreview.net/forum?id=thmHvIG4Xv
---
Title: Discrete Diffusion in Large Language and Multimodal Models: A Survey
Abstract: In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm using full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output control, and dynamic perception. These capabilities were previously difficult to achieve with AR models. A growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts, while achieving up to 10$\times$ acceleration in inference speed. These developments position discrete diffusion models as a promising alternative to the traditional autoregressive paradigm. In this work, we present a comprehensive overview of the research in the dLLM and dMLLM domains. We trace the historical development of dLLMs and dMLLMs, formalize the underlying mathematical frameworks, list commonly used modeling methods, and categorize representative models. We further analyze key techniques for training, inference, and quantization. We also discuss trustworthiness issues and summarize emerging applications across language, vision-language, and biological domains. We conclude by discussing future directions for research and deployment.
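The multi-token parallel decoding loop common to dLLMs can be sketched with a toy denoiser; the confidence-based unmasking schedule below is a generic illustration, not any specific model's decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
MASK = -1
target = np.array([4, 2, 7, 1, 9, 3])  # tokens a perfect denoiser would emit

def toy_denoiser(seq):
    """Stand-in for a masked denoising model: for every position in
    parallel, return a predicted token and a (noisy) confidence."""
    confidence = rng.uniform(0.2, 1.0, size=len(seq))
    return target.copy(), confidence

def diffusion_decode(length, steps=3):
    seq = np.full(length, MASK)
    per_step = -(-length // steps)  # ceil division: tokens revealed per step
    for _ in range(steps):
        pred, conf = toy_denoiser(seq)
        conf[seq != MASK] = -np.inf            # committed tokens stay fixed
        reveal = np.argsort(conf)[-per_step:]  # unmask the most confident
        seq[reveal] = pred[reveal]
    return seq

out = diffusion_decode(len(target))
print(out.tolist())  # six tokens decoded in three parallel steps
```

Decoding six tokens in three denoising passes instead of six autoregressive steps is the source of the inference-speed advantage the survey discusses.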
URL: https://openreview.net/forum?id=0DsqnkP8Cp
---
Title: SAPIENT: Continual Test-time Adaptation via Lightweight plug-and-play Adapters
Abstract: Continual test-time adaptation (TTA) is the problem of adapting a pre-trained source model at inference time to handle test samples from a non-stationary distribution, while not forgetting the knowledge acquired from earlier domains. Existing continual TTA methods either make unsupervised test-time updates to the entire model, which can be expensive and prone to forgetting, or do so by keeping the base model frozen and adding a small number of learnable adapter modules for better time/memory efficiency and mitigated forgetting. We present SAPIENT (continual teSt-time adaPtation vIa lightwEight plug-aNd-play adapTers), a parameter-efficient, adapter-based approach which not only offers the usual benefits of adapter-based continual TTA methods but also several key additional ones: (1) its simple plug-and-play design seamlessly integrates with various continual TTA losses, making our approach complementary to existing continual TTA methods and improving their time/memory efficiency and knowledge retention; (2) it does not require access to the source-domain data, unlike recent adapter-based continual TTA methods; and (3) its parameter efficiency also makes it computationally feasible to design Bayesian extensions that estimate the uncertainty in adapter weights, which in turn yields more robust predictions. Through extensive experiments on a segmentation task and four classification tasks for continual TTA, we demonstrate that, with substantially ($\sim$90\%) fewer trainable parameters, our method achieves better or similar performance compared to existing SOTA continual TTA methods, resulting in efficient and robust adaptation and inference at test time.
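A minimal bottleneck adapter of the kind such methods plug into a frozen backbone can be sketched in NumPy; the class below is a generic illustration, not SAPIENT's architecture, and the zero-initialized up-projection is one common convention for making a freshly plugged-in adapter a no-op.

```python
import numpy as np

rng = np.random.default_rng(0)

class Adapter:
    """Hypothetical bottleneck adapter: a low-dimensional residual update
    h + up(relu(down(h))), the only parameters touched at test time while
    the base model stays frozen."""

    def __init__(self, dim, bottleneck):
        self.down = rng.normal(scale=0.02, size=(dim, bottleneck))
        self.up = np.zeros((bottleneck, dim))  # zero-init: identity at start

    def __call__(self, h):
        return h + np.maximum(h @ self.down, 0.0) @ self.up

    def n_params(self):
        return self.down.size + self.up.size

dim = 768
adapter = Adapter(dim, bottleneck=16)
h = rng.normal(size=(4, dim))
out = adapter(h)
print(np.allclose(out, h))            # zero-init up-projection: a no-op plug-in
print(adapter.n_params(), dim * dim)  # far fewer params than one dense layer
```

With a 16-dimensional bottleneck the adapter holds roughly 4% of the parameters of a single dense layer of the same width, which is what makes Bayesian extensions over its weights tractable.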
URL: https://openreview.net/forum?id=zhS2NbPR7q
---
Title: Exploration-Driven Optimization for Test-Time Large Language Model Reasoning
Abstract: Post-training techniques combined with inference-time scaling significantly enhance the reasoning and alignment capabilities of large language models (LLMs). However, a fundamental tension arises: inference-time methods benefit from diverse sampling from a relatively flattened probability distribution, whereas reinforcement learning (RL)-based post-training inherently sharpens these distributions. To address this, we propose Exploration-Driven Optimization (EDO), which integrates reward-biasing into standard RL objectives, encouraging greater diversity in sampled solutions while facilitating more effective inference-time computation. We seamlessly incorporate EDO into established RL frameworks, specifically iterative Direct Preference Optimization (iDPO) and Group Relative Policy Optimization (GRPO), resulting in two variants: ED-iDPO and ED-GRPO. Extensive experiments demonstrate that both ED-iDPO and ED-GRPO exhibit greater solution diversity and improved reasoning abilities, particularly when combined with test-time computation techniques like self-consistency. Across three in-distribution reasoning benchmarks, EDO achieves a 1.0-1.3% improvement over the strongest baselines, and delivers an additional 1.5% average gain on five out-of-distribution tasks. Beyond accuracy, EDO preserves model entropy and stabilizes RL training dynamics, highlighting its effectiveness in preventing over-optimization collapse. Taken together, these results establish EDO as a principled framework for balancing exploration and exploitation in LLM reasoning and advancing the state of the art.
URL: https://openreview.net/forum?id=NiINDlzvNj
---
Title: GAMformer: Bridging Tabular Foundation Models and Interpretable Machine Learning
Abstract: While interpretability is crucial for machine learning applications in safety-critical domains and for regulatory compliance, existing tabular foundation models like TabPFN lack transparency. Generalized Additive Models (GAMs) provide the needed interpretability through their additive structure, but traditional GAM methods rely on iterative learning algorithms (such as splines, boosted trees, or neural networks) that are fundamentally incompatible with the in-context learning paradigm of foundation models. In this paper, we introduce GAMformer, the first tabular foundation model for GAMs that bridges the gap between the power of foundation models and the interpretability requirements of critical real-world applications. GAMformer estimates GAM shape functions in a single forward pass using in-context learning, representing a significant departure from conventional iterative approaches. Building on previous research on tabular foundation models, we train GAMformer exclusively on synthetically generated tables to prevent data leakage. Our experiments demonstrate that GAMformer performs comparably to other leading GAMs across various classification benchmarks.
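The additive structure that makes GAMs interpretable is easy to sketch: each feature contributes through its own shape function, and the contributions simply sum in logit space. Everything below is a generic illustration with invented shape functions, unrelated to GAMformer's internals.

```python
import numpy as np

rng = np.random.default_rng(0)

# A GAM predicts via a sum of per-feature shape functions:
#   logit(p) = beta0 + f1(x1) + f2(x2) + ...
# Here each f_j is a lookup over binned feature values, the form in which
# fitted shape functions are commonly reported and plotted.
bins = np.linspace(0.0, 1.0, 11)

def make_shape(values):
    """Piecewise-constant shape function over the bins."""
    values = np.asarray(values, dtype=float)
    def f(x):
        return values[np.clip(np.digitize(x, bins) - 1, 0, len(values) - 1)]
    return f

f1 = make_shape(np.linspace(-1.0, 1.0, 10))          # monotone effect of x1
f2 = make_shape(np.sin(np.linspace(0, np.pi, 10)))   # bumped effect of x2

def gam_predict(x1, x2, beta0=0.0):
    logit = beta0 + f1(x1) + f2(x2)
    return 1.0 / (1.0 + np.exp(-logit))

x1 = rng.uniform(0, 1, 5)
x2 = rng.uniform(0, 1, 5)
p = gam_predict(x1, x2)
print(np.round(p, 3))
```

Because each `f_j` depends on one feature alone, every prediction decomposes into per-feature contributions that can be read directly off the shape-function plots.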
URL: https://openreview.net/forum?id=647gba3osV
---
Title: Efficient Adaptation of Large Vision-Language Models: Transfer Learning Methods and Applications
Abstract: Pre-trained large vision-language models (VLMs) have become the dominant choice for handling vision-language tasks, ranging from multimodal reasoning to text-image generation. However, these models heavily depend on large-scale training datasets, primarily composed of image-text pairs sourced from web data, which are typically confined to general domains rather than specific downstream tasks. Given the scarcity of data in such specialized domains, transfer learning emerges as a remedy, enabling the adaptation of a model's preexisting knowledge to new tasks with limited data, thereby mitigating the reliance on extensive datasets. Following the current trend of applying transfer learning to vision-language tasks, we provide a systematic study of existing transfer learning techniques adopted for vision-language models, including: (1) a summary of the existing state-of-the-art VLMs, (2) a comprehensive taxonomy of transfer learning approaches for VLMs, (3) a discussion of real-world applications of transfer learning methods for VLMs, and (4) a summary of commonly used vision-language datasets and benchmarks across various vision-language tasks.
URL: https://openreview.net/forum?id=Xu9Oq5RwCa
---
Title: Coherence–Diffusion Dynamics: A Continuous-Semantic Interpretation of Transformer Language Models
Abstract: Large language models (LLMs) exhibit coherent reasoning, long-range contextual integration, and abrupt failures such as hallucination, yet the internal principles governing these behaviors remain poorly understood. Existing interpretability approaches typically focus on isolated components, including attention patterns, neuron circuits, or probing signals, and therefore provide limited insight into how semantic meaning evolves over the course of inference. This work proposes that Transformer-based language models can be productively interpreted through a continuous semantic perspective, in which internal representations evolve along structured trajectories in a latent space. We articulate this interpretation through the Coherence–Diffusion Dynamics (CDD) framework, which models semantic evolution as the interaction of coherence-restoring tendencies and stochastic variability. Within this framework, we introduce an effective instability potential serving as an interpretive proxy for semantic coherence, a coherence operator governing stabilizing dynamics, a diffusion term capturing stochastic variability, and an interpretation of dynamic sparsity capturing the apparent contraction of effective semantic degrees of freedom along inference trajectories. These constructs suggest qualitative, empirically testable implications regarding stabilization, regime shifts associated with hallucination, and the functional irrelevance of low-impact components. We evaluate these implications through controlled experiments on Transformer language models, showing broad alignment between observed behavior and the qualitative predictions of the CDD interpretation. Taken together, this work provides a coherent and dynamically grounded account of semantic evolution in LLMs, offering a principled lens for interpreting coherence, variability, sparsity, and instability without departing from the discrete computational structure of Transformer architectures.
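The coherence-versus-diffusion interplay can be mimicked with a discretized overdamped Langevin update on a toy quadratic potential; this is a cartoon of the framework's qualitative claim (coherence restoration contracts the state until diffusion-limited), not the paper's model or its notation.

```python
import numpy as np

rng = np.random.default_rng(0)

# CDD-style toy dynamics: a hidden state drifts down an instability
# potential U (coherence restoration) while diffusion injects noise:
#   h_{t+1} = h_t - eta * grad U(h_t) + sqrt(2 * D) * xi_t
def grad_U(h):
    return h  # quadratic potential U(h) = ||h||^2 / 2; minimum = coherent state

def simulate(h0, eta=0.1, D=0.0005, steps=200):
    h = h0.copy()
    for _ in range(steps):
        h += -eta * grad_U(h) + np.sqrt(2 * D) * rng.normal(size=h.shape)
    return h

h0 = rng.normal(size=8) * 5.0   # start far from the coherent basin
hT = simulate(h0)
print(np.linalg.norm(h0).round(2), np.linalg.norm(hT).round(2))
```

The state norm shrinks toward a diffusion-limited floor set by D/eta; in the CDD picture, raising D (or flattening U) is the regime shift associated with instability.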
URL: https://openreview.net/forum?id=Q3cbXoUgFF
---