Daily TMLR digest for Dec 10, 2025

TMLR

Dec 10, 2025, 12:30:08 AM

to tmlr-anno...@googlegroups.com

Accepted papers
===============

Title: Robust Weight Imprinting: Insights from Neural Collapse and Proxy-Based Aggregation

Authors: Justus Westerhoff, Golzar Atefi, Mario Koddenbrock, Alexei Figueroa, Alexander Löser, Erik Rodner, Felix Alexander Gers

Abstract: The capacity of foundation models allows for their application to new, unseen tasks. The adaptation to such tasks is called transfer learning. An efficient transfer learning method that circumvents parameter optimization is imprinting. The conceptual differences between studies on imprinting form the basis for our systematic investigation. In this work, we propose the general $\texttt{IMPRINT}$ framework, identifying three main components: generation, normalization, and aggregation. Through the lens of this framework, we conduct an in-depth analysis and a comparison of the existing methods. Our findings reveal the benefits of representing novel data with multiple proxies in the generation step and show the importance of proper normalization. Beyond an extensive analytical grounding, our framework enables us to propose a novel variant of imprinting which outperforms previous work on transfer learning tasks by $4\%$. This variant determines proxies through clustering motivated by the neural collapse phenomenon -- a connection that we draw for the first time. We publicly release our code at \url{https://github.com/DATEXIS/IMPRINT}.

URL: https://openreview.net/forum?id=duU11BnQ3Y

---

Title: Sparse, Efficient and Explainable Data Attribution with DualXDA

Authors: Moritz Weckbecker, Galip Ümit Yolcu, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin

Abstract: Data Attribution (DA) is an emerging approach in the field of eXplainable Artificial Intelligence (XAI), aiming to identify influential training datapoints which determine model outputs. It seeks to provide transparency about the model and individual predictions, e.g. for model debugging, identifying data-related causes of suboptimal performance. However, existing DA approaches suffer from prohibitively high computational costs and memory demands when applied to even medium-scale datasets and models, forcing practitioners to resort to approximations that may fail to capture the true inference process of the underlying model. Additionally, current attribution methods exhibit low sparsity, resulting in non-negligible attribution scores across a high number of training examples, hindering the discovery of decisive patterns in the data. In this work, we introduce DualXDA, a framework for sparse, efficient and explainable DA, comprised of two interlinked approaches, Dual Data Attribution (DualDA) and eXplainable Data Attribution (XDA): With DualDA, we propose a novel approach for efficient and effective DA, leveraging Support Vector Machine theory to provide fast and naturally sparse data attributions for AI predictions. In extensive quantitative analyses, we demonstrate that DualDA achieves high attribution quality, excels at solving a series of evaluated downstream tasks, while at the same time improving explanation time by a factor of up to 4,100,000 x compared to the original Influence Functions method, and up to 11,000 x compared to the method's most efficient approximation from literature to date. We further introduce XDA, a method for enhancing Data Attribution with capabilities from feature attribution methods to explain why training samples are relevant for the prediction of a test sample in terms of impactful features, which we showcase and verify qualitatively in detail. 
Taken together, our contributions in DualXDA ultimately point towards a future of eXplainable AI applied at unprecedented scale, enabling transparent, efficient and novel analysis of even the largest neural architectures -- such as Large Language Models -- and fostering a new generation of interpretable and accountable AI systems. The implementation of our methods, as well as the full experimental protocol, is available on github.

URL: https://openreview.net/forum?id=qfx81N884A

---

Title: Targeted Unlearning Using Perturbed Sign Gradient Methods With Applications On Medical Images

Authors: George R. Nahass, Zhu Wang, Homa Rashidisabet, Won Hwa Kim, Sasha Hubschman, Jeffrey C. Peterson, Pete Setabutr, Chad A. Purnell, Ann Tran, Darvin Yi, Sathya N. Ravi

Abstract: Machine unlearning aims to remove the influence of specific training samples from a trained model without full retraining. While prior work has largely focused on privacy-motivated settings, we recast unlearning as a general-purpose tool for post-deployment model revision. Specifically, we focus on utilizing unlearning in clinical contexts where data shifts, device deprecation, and policy changes are common. To this end, we propose a bilevel optimization formulation of boundary-based unlearning that can be solved using iterative algorithms. We provide convergence guarantees when first order algorithms are used to unlearn and introduce a tunable loss design for controlling the forgetting–retention tradeoff. Across benchmark and real-world clinical imaging datasets, our approach outperforms baselines on both forgetting and retention metrics, including scenarios involving imaging devices and anatomical outliers. This work demonstrates the feasibility of unlearning on clinical imaging datasets and proposes it as a tool for model maintenance in scenarios that require removing the influence of specific data points without full model retraining. Code is available $\href{https://github.com/monkeygobah/unlearning_langevin}{here}$.

URL: https://openreview.net/forum?id=XE0bJg6sQN

---

Title: How iteration composition influences convergence and stability in deep learning

Authors: Benoit Dherin, Benny Avelin, Anders Karlsson, Hanna Mazzawi, Javier Gonzalvo, Michael Munn

Abstract: Despite exceptional achievements, training neural networks remains computationally expensive and is often plagued by instabilities that can degrade convergence. While learning rate schedules can help mitigate these issues, finding optimal schedules is time-consuming and resource-intensive. This work explores theoretical issues concerning training stability in the constant-learning-rate (i.e., without schedule) and small-batch-size regime. Surprisingly, we show that the composition order of gradient updates affects stability and convergence in gradient-based optimizers. We illustrate this new line of thinking using backward-SGD, which produces parameter iterates at each step by reverting the usual forward composition order of batch gradients. Our theoretical analysis shows that in contractive regions (e.g., around minima) backward-SGD converges to a point while the standard forward-SGD generally only converges to a distribution. This leads to improved stability and convergence, which we demonstrate experimentally. While full backward-SGD is computationally intensive in practice, it highlights that the extra freedom of modifying the usual iteration composition by creatively reusing previous batches at each optimization step may have important beneficial effects in improving training. To our knowledge, this represents a new and unexplored avenue in deep learning optimization.

URL: https://openreview.net/forum?id=GZCBM2Yo3a

---

Title: Prompt Engineering Techniques for Language Model Reasoning Lack Replicability

Authors: Laurène Vaugrante, Mathias Niepert, Thilo Hagendorff

Abstract: As large language models (LLMs) are integrated into everyday applications, research into prompt engineering techniques (PET) to improve these models’ behavior has surged. However, clear methodological guidelines for evaluating these techniques are lacking. This raises concerns about the replicability and generalizability of the prompt engineering techniques’ benefits. We support our concerns with a series of replication experiments focused on zero-shot prompt engineering techniques purported to influence reasoning abilities in LLMs. We tested GPT-3.5, GPT-4o, Gemini 1.5 Pro, Claude 3 Opus, Llama 3, Vicuna, and BLOOM on the chain-of-thought, EmotionPrompting, Sandbagging, Re-Reading, Rephrase-and-Respond (RaR), and ExpertPrompting prompt engineering techniques. We applied them on manually double-checked subsets of reasoning benchmarks including CommonsenseQA, CRT, NumGLUE, ScienceQA, and StrategyQA. Our findings reveal a general lack of statistically significant differences across nearly all techniques tested, highlighting, among others, several methodological weaknesses in previous research. To counter these issues, we propose recommendations for establishing sound benchmarks, and designing rigorous experimental frameworks to ensure accurate and reliable assessments of model outputs.

URL: https://openreview.net/forum?id=bgjR5bM44u

---

Title: SPONGE: Competing Sparse Language Representations for Effective Knowledge Transfer

Authors: Jens-Michalis Papaioannou, Alexei Figueroa, Conor Fallon, Anna Capilla, Alexandra Bekiaridou, Stavros Zanos, Wolfgang Nejdl, Alexander Löser

Abstract: In domains with privacy constraints, most knowledge resides in siloed datasets, hindering the development of a model with all relevant knowledge for a task.
Clinical NLP is a prime example of these constraints in practice.
Research in this area typically falls back to the canonical setting of sequential transfer learning, where a model pre-trained on large corpora is finetuned on a smaller annotated dataset.
An avenue for knowledge transfer among diverse clinics is multi-step sequential transfer learning since models are more likely to be shared than private clinical data.
This setting poses challenges of cross-linguality, domain diversity, and varying label distributions which undermine generalisation.
We propose SPONGE, an efficient prototypical architecture that leverages competing sparse language representations.
These encompass distributed knowledge and create the necessary level of redundancy for effective transfer learning across multiple datasets.
We identify that prototypical classifiers are critically sensitive to label-recency bias which we mitigate with a novel strategy at inference time. SPONGE in combination with this strategy significantly boosts generalisation performance to unseen data.
With the help of medical professionals, we show that the explainability of our models is clinically relevant.
We make all source code available.

URL: https://openreview.net/forum?id=OevFdPgk3h

---

Title: Hard Work Does Not Always Pay Off: On the Robustness of NAS to Data Poisoning

Authors: Zachary Coalson, Huazheng Wang, Qingyun Wu, Sanghyun Hong

Abstract: We study the robustness of data-centric methods to find neural network architectures, known as neural architecture search (NAS), against data poisoning. To audit this robustness, we design a poisoning framework that enables the systematic evaluation of the ability of NAS to produce architectures under data corruption. Our framework examines four off-the-shelf NAS algorithms, representing different approaches to architecture discovery, against four data poisoning attacks, including one we tailor specifically for NAS. In our evaluation with the CIFAR-10 and CIFAR-100 benchmarks, we show that NAS is seemingly robust to data poisoning, showing marginal accuracy drops even under large poisoning budgets. However, we demonstrate that when considering NAS algorithms designed to achieve a few percentage points of accuracy gain, this expected improvement can be substantially diminished under data poisoning. We also show that the reduction varies across NAS algorithms and analyze the factors contributing to their robustness. Our findings are: (1) Training-based NAS algorithms are the least robust due to their reliance on data. (2) Training-free NAS approaches are the most robust but produce architectures that perform similarly to random selections from the search space. (3) NAS algorithms can produce architectures with improved accuracy, even when using out-of-distribution data like MNIST. We lastly discuss potential countermeasures.

URL: https://openreview.net/forum?id=Uhayg3Ia9W

---

New submissions
===============

Title: A Lower Bound for the Number of Linear Regions of Ternary ReLU Regression Neural Networks

Abstract: With the advancement of deep learning, reducing computational complexity and memory consumption has become a critical challenge, and ternary neural networks (NNs) that restrict parameters to $\{-1, 0, +1\}$ have attracted attention as a promising approach. While ternary NNs demonstrate excellent performance in practical applications such as image recognition and natural language processing, their theoretical understanding remains insufficient. In this paper, we theoretically analyze the expressivity of ternary NNs from the perspective of the number of linear regions. Specifically, we evaluate the number of linear regions of ternary regression NNs with Rectified Linear Unit (ReLU) for activation functions and prove that the number of linear regions increases polynomially with respect to network width and exponentially with respect to depth, similar to standard NNs. Moreover, we show that it suffices to first double the width, then either square the width or double the depth of ternary NNs to achieve a lower bound on the maximum number of linear regions comparable to that of general ReLU regression NNs. This provides a theoretical explanation, in some sense, for the practical success of ternary NNs.

URL: https://openreview.net/forum?id=Yg7tt1hWiF

---

Title: CMOOD: Concept-based Multi-label OOD Detection

Abstract: How can models effectively detect out-of-distribution (OOD) samples in complex, multi-label settings without extensive retraining? Existing OOD detection methods struggle to capture the intricate semantic relationships and label co-occurrences inherent in multi-label settings, often requiring large amounts of training data and failing to generalize to unseen label combinations. While large language models have revolutionized zero-shot OOD detection, they primarily focus on single-label scenarios, leaving a critical gap in handling real-world tasks where samples can be associated with multiple interdependent labels. To address these challenges, we introduce COOD, a novel zero-shot multi-label OOD detection framework. COOD leverages pre-trained vision-language models, enhancing them with a concept-based label expansion strategy and a new scoring function. By enriching the semantic space with both positive and negative concepts for each label, our approach models complex label dependencies, precisely differentiating OOD samples without the need for additional training. Extensive experiments demonstrate that our method significantly outperforms existing approaches, achieving approximately 95% average AUROC on both VOC and COCO datasets, while maintaining robust performance across varying numbers of labels and different types of OOD samples.

URL: https://openreview.net/forum?id=EmoFJ8tcko

---

Title: Quantifying Structure in CLIP Embeddings: A Statistical Framework for Concept Interpretation

Abstract: Concept-based approaches, which aim to identify human-understandable concepts within a model's internal representations, are a promising method for interpreting embeddings from deep neural network models, such as CLIP. While these approaches help explain model behavior, current methods lack statistical rigor, making it challenging to validate identified concepts and compare different techniques. To address this challenge, we introduce a hypothesis testing framework that quantifies rotation-sensitive structures within the CLIP embedding space.
Once such structures are identified, we propose a post-hoc concept decomposition method. Unlike existing approaches, it offers theoretical guarantees that discovered concepts represent robust, reproducible patterns (rather than method-specific artifacts) and outperforms other techniques in terms of reconstruction error. Empirically, we demonstrate that our concept-based decomposition algorithm effectively balances reconstruction accuracy with concept interpretability and helps mitigate spurious cues in data. Applied to a popular spurious correlation dataset, our method yields a 22.6% increase in worst-group accuracy after removing spurious background concepts.

URL: https://openreview.net/forum?id=D6K0Wi3kRY

---

Title: Post-Training Adaptive Conformal Prediction for Incomplete Time Series

Abstract: Conformal Prediction (CP) is widely used for uncertainty quantification but faces significant challenges with time series due to non-exchangeability. The issue is exacerbated by missing data, where the exponential growth of missing patterns makes existing approaches computationally expensive and unable to adequately represent each missing pattern. To address this, we propose a novel approach that uses a post-training Neural Network (NN) to handle temporal dependencies and structured missingness in time series data. With a novel non-conformity score function, our method improves conditional coverage for different missing patterns, ensuring prediction intervals are both reliable and informative. We introduce features that capture different missingness mechanisms, enabling the model to adapt to various patterns. Theoretically, we establish asymptotic validity for conditional coverage with adaptive adjustments. Experiments on semi-synthetic benchmarks demonstrate the method's efficiency in producing tight prediction intervals while maintaining group conditional coverage.

URL: https://openreview.net/forum?id=KMBU4wx79B

---

Title: The Pitfalls of Model Collapse when Aligning LLMs through Model Merge

Abstract: Model merge offers a cost-efficient method for integrating multiple specialized large language models (LLMs) into one comprehensive model. While it shows promise for encoder-decoder models in standard Natural Language Processing (NLP) tasks, \textbf{we find that merging decoder-based LLMs may exacerbate alignment tax and lead to model collapse, even when overall performance appears to improve.} We specifically assess the applications of model merge in steering LLMs to align better with diverse human preferences through interpolation and extrapolation merge. Our extensive experiments, covering model sizes ranging from $\mathtt{7b}$ to $\mathtt{70b}$ parameters, and including sixteen models with varying post-training, employ three popular merging methods: $\mathtt{Task~Arithmetic}$, $\mathtt{TIES}$-$\mathtt{Merging}$, and $\mathtt{Dare}$-$\mathtt{TIES}$. Our results uncover inherent limitations in current model merge applications for alignment, which can lead to text degeneration. We hope our findings will offer valuable insights for employing model merging in alignment scenarios and can help practitioners avoid potential pitfalls.

URL: https://openreview.net/forum?id=zJAy9Tt9DX

---

Title: Securing Representations via Latent Disruption and Private Decoding

Abstract: Pre-trained encoders facilitate efficient data sharing through semantically rich latent embeddings, which, however, pose privacy risks under malicious inference or exploitation. We propose SEAL, an attack-agnostic framework that secures latent spaces by disrupting semantic dependencies based on information-theoretic principles. It prevents potential misuse while enabling selective reconstruction for trusted users. SEAL learns to encode controlled perturbations by minimizing the Matrix Norm-based Quadratic Mutual Information (MQMI) functional between original and secured embeddings within a hyperspherical latent space. Meanwhile, a private decoder, jointly trained with the SEAL encoder, ensures accurate reconstruction that is accessible only to authorized users. Extensive experiments on vision and text datasets demonstrate that SEAL effectively mitigates latent leakage, defends against inference attacks, and preserves reconstruction utility.

URL: https://openreview.net/forum?id=nZWBrxJyrS

---

Title: Visual-TCAV: Concept-based Attribution and Saliency Maps for Post-hoc Explainability in Image Classification

Abstract: Convolutional Neural Networks (CNNs) have shown remarkable performance in image classification. However, interpreting their predictions is challenging due to the size and complexity of these models. State-of-the-art saliency methods generate local explanations highlighting the area in the input image where a class is identified but cannot explain how a concept of interest contributes to the prediction. On the other hand, concept-based methods, such as TCAV, provide insights into how sensitive the network is to a human-defined concept but cannot compute its attribution in a specific prediction nor show its location within the input image. We introduce Visual-TCAV, a novel explainability framework aiming to bridge the gap between these methods by providing both local and global explanations. Visual-TCAV uses Concept Activation Vectors (CAVs) to generate class-agnostic saliency maps that show where the network recognizes a certain concept. Moreover, it can estimate the attribution of these concepts to the output of any class using a generalization of Integrated Gradients. We evaluate the method's faithfulness via a controlled experiment where the ground truth for explanations is known, showing better ground truth alignment than TCAV. Our code is available at (see supplementary material .zip file).

URL: https://openreview.net/forum?id=SLh00W5rhu

---

Title: ReactEmbed: A Plug-and-Play Module for Unifying Protein-Molecule Representations Guided by Biochemical Reaction Networks

Abstract: The computational representation of proteins and molecules is a cornerstone of modern biology.
However, state-of-the-art models represent these entities in separate and incompatible embedding manifolds, limiting our ability to model the systemic biological processes that depend on their interaction.
We introduce ReactEmbed, a lightweight, plug-and-play enhancement module that bridges this gap.
Our key invention is a new paradigm that leverages biochemical reaction networks as a definitive source of functional semantics, as co-participation in reactions explicitly defines a functional role.
ReactEmbed takes existing, frozen embeddings from state-of-the-art models and aligns them in a unified space through a novel relational learning framework.
This framework interprets a weighted reaction graph using a specialized sampling strategy to distill functional relationships.
This process yields a cascade of benefits: (1) It enriches the unimodal embeddings, improving their performance on domain-specific tasks. (2) It achieves strong results on a diverse range of cross-domain benchmarks.
ReactEmbed provides a practical and powerful method to enhance and unify biological representations, effectively turning disconnected models into a more cohesive, functionally-aware system.
The code and database are available for open use.

URL: https://openreview.net/forum?id=3tTCzmsStR

---
