Daily TMLR digest for Jan 29, 2026

TMLR

Jan 29, 2026, 12:30:10 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: \textsc{PGO-BEN}: Proxy-Guided Orthogonalization and Beta Ensembling for Few-Shot Domain-Incremental Learning

Authors: Samrat Mukherjee, Thivyanth Venkateswaran, Eric Nuertey Coleman, Luigi Quarantiello, Julio Hurtado, Vincenzo Lomonaco, Gemma Roig, Subhasis Chaudhuri, Biplab Banerjee

Abstract: Continual adaptation to evolving domains with minimal supervision is essential for real-world deployment of machine learning systems. We formalize this objective as \textbf{Few-Shot Domain-Incremental Learning (FSDIL)}, where a model must adapt to each new domain using only a few labeled samples while retaining prior knowledge without access to previous data. This setting mirrors practical constraints in domains such as autonomous driving and medical imaging, where annotations are expensive and data retention is restricted by privacy regulations.
Pre-trained vision–language models such as CLIP provide a strong initialization for FSDIL due to their transferable multi-modal representations. However, adapting CLIP incrementally under domain shifts remains challenging: few-shot updates often lead to \emph{catastrophic forgetting} and insufficient \emph{plasticity} across evolving distributions.
To address these challenges, we introduce \textbf{\textsc{PGO-BEn}} (\textit{Proxy-Guided Orthogonalization and Beta Ensembling})—a rehearsal-free framework that leverages CLIP’s semantic priors via prompt learning while preserving prior domain knowledge through two key mechanisms.
(1) \textbf{Proxy-Guided Orthogonalization (PGO):} identifies conflicts between current gradients and proxy representations of past knowledge, inferred from current samples, and projects conflicting updates into an orthogonal subspace to prevent knowledge degradation.
(2) \textbf{Beta Ensembling (BEn):} introduces a Beta-function-based temporal ensembling strategy that adaptively balances stability and plasticity, outperforming conventional exponential moving average (EMA) approaches in retaining early-domain knowledge.
We extensively evaluate \textsc{PGO-BEn} on three diverse benchmarks—\textbf{DomainNet}, \textbf{CORe50}, and \textbf{CDDB-Hard}—and demonstrate consistent improvements over state-of-the-art domain-incremental and few-shot learning methods across all supervision levels in this challenging setting.

URL: https://openreview.net/forum?id=jlb27FbHLv
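
A rough sketch of the two mechanisms named in the abstract, in PyTorch-style code. The conflict test, the Beta(a, b) parameterization, and the snapshot averaging below are illustrative assumptions, not the authors' implementation.

import torch

def proxy_guided_orthogonalization(grad, proxy_grad, eps=1e-12):
    # Sketch of PGO: if the current update conflicts with a proxy gradient
    # representing past-domain knowledge (negative inner product), remove the
    # conflicting component so the update lies in the orthogonal subspace.
    g, p = grad.flatten(), proxy_grad.flatten()
    dot = torch.dot(g, p)
    if dot < 0:
        grad = grad - (dot / (p.norm() ** 2 + eps)) * proxy_grad
    return grad

def beta_ensemble(snapshots, a=0.5, b=0.5):
    # Sketch of BEn: weight parameter snapshots from successive domains with a
    # Beta(a, b) density, so early-domain snapshots keep non-negligible weight
    # (unlike the geometric decay of an exponential moving average).
    T = len(snapshots)
    x = (torch.arange(T, dtype=torch.float32) + 0.5) / T
    w = x ** (a - 1) * (1 - x) ** (b - 1)
    w = w / w.sum()
    return sum(w_t * s for w_t, s in zip(w, snapshots))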

---

Title: Differentially Private Conformal Prediction via Quantile Binary Search

Authors: Ogonnaya Michael Romanus, Roberto Molinari

Abstract: Differentially Private (DP) approaches have been widely explored and implemented for a broad variety of tasks, delivering corresponding privacy guarantees in these settings. While most of these DP approaches focus on limiting privacy leakage from training data, fewer approaches consider leakage when procedures involve \textit{calibration data}, which is common in uncertainty quantification through Conformal Prediction (CP). Since there are few approaches in this direction, in this work we deliver a general DP approach for CP that we call Private Conformity via Quantile Search (P-COQS). The proposed approach adapts an existing randomized binary search algorithm for computing DP quantiles in the calibration phase of CP, thereby guaranteeing privacy of the consequent prediction sets. This, however, comes at the price of marginally under-covering with respect to the desired $(1 - \alpha)$-level when using finite-sample calibration sets (although broad empirical results show that P-COQS generally targets the required level in the considered cases). After confirming properties of the adapted algorithm and quantifying the approximate coverage guarantees of the consequent CP, we conduct extensive experiments to examine the effects of privacy noise, sample size and significance level on the performance of P-COQS compared to existing alternatives. In addition, we empirically evaluate our approach on several benchmark datasets, including CIFAR-10, ImageNet and CoronaHack. Our results suggest that the proposed method is robust to privacy noise and performs favorably with respect to the current DP alternative in terms of \textit{empirical coverage}, \textit{efficiency}, and \textit{informativeness}. Specifically, the results indicate that P-COQS produces smaller conformal prediction sets while simultaneously targeting the desired coverage and privacy guarantees in all these experimental settings.

URL: https://openreview.net/forum?id=IK7tNOucJ3
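
A minimal sketch of the calibration step described above, assuming a Laplace-noised binary search over the score range with a naive per-step budget split; the paper adapts an existing randomized binary search algorithm whose mechanism and privacy accounting may differ.

import numpy as np

def dp_quantile_binary_search(scores, alpha, epsilon, lo, hi, n_steps=20, rng=None):
    # At each step, compare a Laplace-noised count of calibration scores below the
    # midpoint with the conformal target rank and move the bracket accordingly.
    # The uniform budget split below is a simplifying assumption.
    rng = np.random.default_rng() if rng is None else rng
    n = len(scores)
    target = np.ceil((1 - alpha) * (n + 1))        # conformal quantile rank
    eps_step = epsilon / n_steps                   # naive per-step budget
    for _ in range(n_steps):
        mid = (lo + hi) / 2
        noisy_count = np.sum(scores <= mid) + rng.laplace(scale=1.0 / eps_step)
        if noisy_count < target:
            lo = mid                               # too few scores covered: move up
        else:
            hi = mid
    return hi

# Usage: threshold = dp_quantile_binary_search(cal_scores, 0.1, 1.0, 0.0, 1.0)
# A test point's prediction set then contains every label whose score <= threshold.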

---

Title: Dealing with Uncertainty in Contextual Anomaly Detection

Authors: Luca Bindini, Lorenzo Perini, Stefano Nistri, Jesse Davis, Paolo Frasconi

Abstract: Contextual anomaly detection (CAD) aims to identify anomalies in a target (behavioral) variable conditioned on a set of contextual variables that influence the normalcy of the target variable but are not themselves indicators of anomaly. In this work, we propose a novel framework for CAD, the normalcy score (NS), that explicitly models both aleatoric and epistemic uncertainty. Built on heteroscedastic Gaussian process regression, our method regards the Z-score as a random variable, providing confidence intervals that reflect the reliability of the anomaly assessment. Through experiments on benchmark datasets and a real-world application in cardiology, we demonstrate that NS outperforms state-of-the-art CAD methods in both detection accuracy and interpretability. Moreover, the confidence intervals enable an adaptive, uncertainty-driven decision-making process, which may be very important in domains such as healthcare.

URL: https://openreview.net/forum?id=yLoXQDNwwa
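
A small sketch of how a Z-score-style normalcy score with a confidence interval could be computed from a heteroscedastic model's outputs. The Gaussian propagation of epistemic uncertainty below is an illustrative assumption, not the paper's exact construction.

import numpy as np
from scipy.stats import norm

def normalcy_score_interval(y, mu, sigma_aleatoric, sigma_epistemic, level=0.95):
    # Z-score of the observed target under the context-conditional model, plus an
    # interval obtained by propagating epistemic uncertainty about the mean into
    # the score (wide intervals flag unreliable anomaly assessments).
    z = (y - mu) / sigma_aleatoric
    half = norm.ppf(0.5 + level / 2) * (sigma_epistemic / sigma_aleatoric)
    return z, (z - half, z + half)

# mu, sigma_aleatoric and sigma_epistemic would come from a heteroscedastic
# Gaussian process regression fit on the contextual variables.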

---

Title: On Uncertainty Calibration for Equivariant Functions

Authors: Edward Berman, Jacob Ginesin, Marco Pacini, Robin Walters

Abstract: Data-sparse settings such as robotic manipulation, molecular physics, and galaxy morphology classification are some of the hardest domains for deep learning. For these problems, equivariant networks can help improve modeling across undersampled parts of the input space, and uncertainty estimation can guard against overconfidence. However, until now, the relationship between equivariance and model confidence, and more generally between equivariance and model calibration, has yet to be studied. Since traditional classification and regression error terms show up in the definitions of calibration error, it is natural to suspect that previous work can be used to help understand the relationship between equivariance and calibration error. In this work, we present a theory relating equivariance to uncertainty estimation. By proving lower and upper bounds on uncertainty calibration errors (ECE and ENCE) under various equivariance conditions, we elucidate the generalization limits of equivariant models and illustrate how symmetry mismatch can result in miscalibration in both classification and regression. We complement our theoretical framework with numerical experiments that clarify the relationship between equivariance and uncertainty using a variety of real and simulated datasets, and we comment on trends with symmetry mismatch, group size, and aleatoric and epistemic uncertainties.

URL: https://openreview.net/forum?id=rxLUTPLBT3
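
For reference, a standard binned estimator of the expected calibration error (ECE) that the classification bounds refer to; the regression counterpart (ENCE) instead bins by predicted standard deviation. The equal-width binning below is the common convention and may differ from the paper's exact estimator.

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    # Binned ECE: average |accuracy - confidence| over equal-width confidence bins,
    # weighted by the fraction of samples falling in each bin.
    # `confidences` are predicted max-class probabilities, `correct` is a boolean array.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()
            conf = confidences[mask].mean()
            ece += mask.mean() * abs(acc - conf)
    return ece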

---

Title: Disentangled Concept-Residual Models: Bridging the Interpretability–Performance Gap for Incomplete Concept Sets

Authors: Renos Zabounidis, Ini Oguntola, Konghao Zhao, Joseph Campbell, Woojun Kim, Simon Stepputtis, Katia P. Sycara

Abstract: Deploying AI in high-stakes settings requires models that are not only accurate but also interpretable and amenable to human oversight. Concept Bottleneck Models (CBMs) support these goals by structuring predictions around human-understandable concepts, enabling interpretability and post-hoc human intervenability. However, CBMs rely on a ‘complete’ concept set, requiring practitioners to define and label enough concepts to match the predictive power of black-box models. To relax this requirement, prior work introduced residual connections that bypass the concept layer and recover information missing from an incomplete concept set. While effective in bridging the performance gap, these residuals can redundantly encode concept information, a phenomenon we term \textbf{concept-residual overlap}. In this work, we investigate the effects of concept-residual overlap and evaluate strategies to mitigate it. We (1) define metrics to quantify the extent of concept-residual overlap in concept-residual models (CRMs); (2) introduce complementary metrics to evaluate how this overlap impacts interpretability, concept importance, and the effectiveness of concept-based interventions; and (3) present \textbf{Disentangled Concept-Residual Models (D-CRMs)}, a general class of CRMs designed to mitigate this issue. Within this class, we propose a novel disentanglement approach based on minimizing mutual information (MI). Using CelebA, CIFAR100, AA2, CUB, and OAI, we show that standard CRMs exhibit significant concept-residual overlap, and that reducing this overlap with MI-based D-CRMs restores key properties of CBMs, including interpretability, functional reliance on concepts, and intervention robustness, without sacrificing predictive performance.

URL: https://openreview.net/forum?id=NKgNizwDa6
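
A simplified sketch of a disentangled concept-residual training objective. The cross-correlation penalty below is a crude decorrelation proxy standing in for the paper's mutual-information-based term, and the loss weights are illustrative assumptions.

import torch
import torch.nn.functional as F

def crm_loss(concept_logits, residual, concept_labels, task_logits, task_labels,
             lam=1.0):
    # Task loss + concept supervision + a penalty discouraging the residual from
    # re-encoding concept information (a stand-in for an MI estimate).
    task_loss = F.cross_entropy(task_logits, task_labels)
    concept_loss = F.binary_cross_entropy_with_logits(concept_logits, concept_labels)
    c = concept_logits - concept_logits.mean(0)
    r = residual - residual.mean(0)
    cross_corr = (c.T @ r) / c.shape[0]        # (n_concepts, residual_dim)
    overlap_penalty = (cross_corr ** 2).mean()
    return task_loss + concept_loss + lam * overlap_penalty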

---


New submissions
===============


Title: Activation Functions and Normalization in Deep Continual Learning

Abstract: Deep learning models often struggle to remain adaptable in continual learning scenarios, where the data distribution changes over time. Beyond the well-known challenge of catastrophic forgetting, these models also face plasticity loss, characterized as a gradual decline in their ability to learn from future data. We study plasticity loss through the lens of activation and normalization interactions.
Through a large-scale empirical study, we evaluate 26 activation functions across three normalization strategies using ResNet-18 on the class-incremental CIFAR-100 benchmark. Our findings reveal that plasticity is not determined by any single design choice, but rather is influenced by the complex interaction between activation functions and normalization layers. We uncover a link between overfitting and plasticity loss, and show that simple yet effective training strategies, such as applying soft labels, using a learning-rate warm-up, and excluding affine normalization parameters from L2 regularization, can significantly slow down the emergence of plasticity loss. Based on these findings, we offer additional recommendations for model design and training that keep networks inherently more performant and adaptable over long time horizons without any active component.

URL: https://openreview.net/forum?id=BtymUQyqtQ
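
Two of the simple strategies mentioned in the abstract are easy to express in PyTorch; the optimizer, hyperparameters, and the set of normalization layers below are illustrative assumptions rather than the paper's exact setup.

import torch
import torch.nn as nn

def make_optimizer(model, lr=0.1, weight_decay=5e-4):
    # Exclude affine normalization parameters from L2 regularization by placing
    # them in a parameter group with weight_decay=0.
    norm_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm, nn.GroupNorm)
    decay, no_decay = [], []
    for module in model.modules():
        params = list(module.parameters(recurse=False))
        (no_decay if isinstance(module, norm_types) else decay).extend(params)
    return torch.optim.SGD(
        [{"params": decay, "weight_decay": weight_decay},
         {"params": no_decay, "weight_decay": 0.0}],
        lr=lr, momentum=0.9)

# Soft labels can be applied directly through label smoothing in the loss:
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)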

---

Title: PLGC: Pseudo-Labeled Graph Condensation

Abstract: Large graph datasets make training graph neural networks (GNNs) computationally costly. Graph condensation methods address this by generating small synthetic graphs that approximate the original data. However, existing approaches rely on clean, supervised labels, which limits their reliability when labels are scarce, noisy, or inconsistent.
We propose Pseudo-Labeled Graph Condensation (PLGC), a self-supervised framework that constructs latent pseudo-labels from node embeddings and optimizes condensed graphs to match the original graph’s structural and feature statistics—without requiring ground-truth labels.
PLGC offers three key contributions: (1) A diagnosis of why supervised condensation fails under label noise and distribution shift. (2) A label-free condensation method that jointly learns latent prototypes and node assignments. (3) Theoretical guarantees showing that pseudo-labels preserve latent structural statistics of the original graph and ensure accurate embedding alignment.
Empirically, across node classification and link prediction tasks, PLGC achieves competitive performance with state-of-the-art supervised condensation methods on clean datasets and exhibits substantial robustness under label noise, often outperforming all baselines by a significant margin.
Our findings highlight the practical and theoretical advantages of self-supervised graph condensation in noisy or weakly-labeled environments\footnote{Code Link: \url{https://anonymous.4open.science/r/PLGC-0B26/}}.

URL: https://openreview.net/forum?id=TkpewrzsnJ
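
A toy sketch of the label-free ingredients described above. KMeans prototypes and simple feature-moment matching are illustrative stand-ins; the paper learns latent prototypes and node assignments jointly with the condensed graph.

import numpy as np
from sklearn.cluster import KMeans

def latent_pseudo_labels(embeddings, n_prototypes=10, seed=0):
    # Cluster self-supervised node embeddings into latent prototypes and use the
    # assignments as pseudo-labels for condensation (no ground-truth labels needed).
    km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=seed)
    pseudo = km.fit_predict(embeddings)
    return pseudo, km.cluster_centers_

def feature_moment_gap(x_full, x_cond):
    # One possible condensation objective term: match first- and second-order
    # feature statistics of the condensed nodes to those of the original graph.
    mean_gap = np.linalg.norm(x_full.mean(0) - x_cond.mean(0))
    cov_gap = np.linalg.norm(np.cov(x_full.T) - np.cov(x_cond.T))
    return mean_gap + cov_gap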

---

Title: Local MDI+: Local Feature Importances for Tree-Based Models

Abstract: Tree-based ensembles such as random forests remain the go-to for tabular data over deep learning models due to their prediction performance and computational efficiency. These advantages have led to their widespread deployment in high-stakes domains, where interpretability is essential for ensuring trustworthy predictions. This has motivated the development of popular local (i.e. sample-specific) feature importance (LFI) methods such as LIME and TreeSHAP. However, these approaches rely on approximations that ignore the model’s internal structure and instead depend on potentially unstable perturbations. These issues are addressed in the global setting by MDI+, a global feature importance method which combines tree-based and linear feature importances by exploiting an equivalence between decision trees and least squares on a transformed node basis. However, the global MDI+ scores are not able to explain predictions when faced with heterogeneous individual characteristics. To address this gap, we propose Local MDI+ (LMDI+), a novel extension of the MDI+ framework that quantifies feature importances for each particular sample. Across twelve real-world benchmark datasets, LMDI+ outperforms existing baselines at identifying instance-specific predictive features, yielding an average 10% improvement in predictive performance when using only the selected features. It further demonstrates greater stability by consistently producing similar instance-level feature importance rankings across repeated model fits with different random seeds. Ablation experiments show that each component of LMDI+ contributes to these gains, and that the improvements extend beyond random forests to gradient boosting models. Finally, we show that LMDI+ enables local interpretability use cases by identifying closely matched counterfactuals for each classification benchmark and discovering homogeneous subgroups in a commonly-used housing dataset.

URL: https://openreview.net/forum?id=TcXidnGHpA
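
A very rough single-tree sketch of the node-basis idea the abstract builds on: represent each sample by the internal nodes it traverses, fit a regularized regression on that basis, and attribute each node's per-sample contribution back to the feature it splits on. The centering, regularization, and attribution details of MDI+/LMDI+ differ; this is only meant to make the construction concrete.

import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.tree import DecisionTreeRegressor

def local_importances(tree: DecisionTreeRegressor, X_train, y_train, X_test):
    # Build the node-indicator basis from the tree's decision paths, fit a ridge
    # regression on it, then map per-node contributions to per-feature scores
    # for each test sample.
    internal = tree.tree_.children_left != -1                 # internal-node mask
    feat_of_node = tree.tree_.feature[internal]
    B_train = tree.decision_path(X_train).toarray()[:, internal].astype(float)
    B_test = tree.decision_path(X_test).toarray()[:, internal].astype(float)
    reg = RidgeCV().fit(B_train, y_train)
    contrib = B_test * reg.coef_                              # per-node contributions
    lfi = np.zeros((X_test.shape[0], X_test.shape[1]))
    for j, f in enumerate(feat_of_node):
        lfi[:, f] += contrib[:, j]
    return lfi                                                # (n_samples, n_features)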

---

Title: Mitigating Social Desirability Bias in Random Silicon Sampling

Abstract: Large Language Models (LLMs) are increasingly used to simulate population responses, a method known as ``Silicon Sampling''.
However, responses to socially sensitive questions frequently exhibit Social Desirability Bias (SDB), diverging from real human data toward socially acceptable answers. Existing studies on social desirability bias in LLM-based sampling remain limited. In this work, we investigate whether minimal, psychologically grounded prompt wording can mitigate this bias and improve alignment between silicon and human samples.
We conducted a study using data from the American National Election Study (ANES) on three LLMs from two model families: the open-source Llama-3.1 series and GPT-4.1-mini. We first replicate a baseline silicon sampling study, confirming the persistent Social Desirability Bias. We then test four prompt-based mitigation methods: reformulated (neutral, third-person phrasing), reverse-coded (semantic inversion), and two meta-instructions, priming and preamble, respectively encouraging analytical thinking and sincerity. Alignment with ANES is evaluated using Jensen-Shannon Divergence with bootstrap confidence intervals. Our results demonstrate that reformulated prompts most effectively improve alignment by reducing the concentration of responses on socially acceptable answers and achieving distributions closer to ANES. Reverse-coding produced mixed results across eligible items, while priming and preamble encouraged response uniformity and showed no systematic benefit for bias mitigation. Our findings validate the efficacy of prompt-based framing controls in mitigating inherent Social Desirability Bias in LLMs, providing a practical path toward more representative silicon samples.

URL: https://openreview.net/forum?id=DmwJzi99Bi
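
A small sketch of the evaluation metric mentioned above (Jensen-Shannon divergence with a bootstrap confidence interval), assuming categorical answer distributions; the resampling scheme and the base of the logarithm are illustrative choices, not necessarily the paper's.

import numpy as np
from scipy.spatial.distance import jensenshannon

def jsd_with_bootstrap_ci(llm_answers, human_answers, categories, n_boot=1000, seed=0):
    # Jensen-Shannon divergence between silicon and human answer distributions,
    # with a percentile bootstrap interval over resampled silicon answers.
    # scipy's jensenshannon returns the JS distance (sqrt of divergence), so square it.
    rng = np.random.default_rng(seed)
    def dist(a):
        counts = np.array([(np.asarray(a) == c).sum() for c in categories], float)
        return counts / counts.sum()
    p_human = dist(human_answers)
    point = jensenshannon(dist(llm_answers), p_human, base=2) ** 2
    boots = []
    for _ in range(n_boot):
        sample = rng.choice(llm_answers, size=len(llm_answers), replace=True)
        boots.append(jensenshannon(dist(sample), p_human, base=2) ** 2)
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return point, (lo, hi)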

---

Title: ProPINN: Demystifying Propagation Failures in Physics Informed Neural Networks

Abstract: Physics-informed neural networks (PINNs) have raised high expectations for solving partial differential equations (PDEs), but their optimization usually faces thorny challenges due to the unique derivative-dependent loss function. By analyzing the loss distribution, previous research observed the propagation failure phenomenon of PINNs, intuitively described as the inability of correct supervision for model outputs to ``propagate'' from initial states or boundaries to the interior domain. Going beyond this intuitive understanding, this paper provides a formal and in-depth study of propagation failure and its root cause. Based on a detailed comparison with classical finite element methods, we ascribe the failure to the conventional single-point-processing architecture of PINNs and further prove that propagation failure is essentially caused by the lower gradient correlation of PINN models on nearby collocation points. Compared to superficial loss maps, this new perspective provides a more precise quantitative criterion for identifying where and why a PINN fails. The theoretical finding also inspires us to present a new PINN architecture, named ProPINN, which can effectively unite the gradients of region points for better propagation. ProPINN can reliably resolve PINN failure modes and significantly surpasses advanced Transformer-based models with a 46% relative improvement.

URL: https://openreview.net/forum?id=Wy54lrFd46
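
A diagnostic-style sketch of the quantity the abstract identifies as the root cause, namely the gradient correlation of a PINN on nearby collocation points. The residual function and the cosine-similarity form below are illustrative assumptions.

import torch

def gradient_correlation(model, pde_residual, x, x_nearby):
    # Cosine similarity between parameter gradients of the PDE residual loss at a
    # collocation point and at a nearby point; low correlation is the proposed
    # signature of propagation failure.
    def flat_grad(point):
        loss = pde_residual(model, point).pow(2).mean()
        grads = torch.autograd.grad(
            loss, [p for p in model.parameters() if p.requires_grad])
        return torch.cat([g.flatten() for g in grads])
    g1, g2 = flat_grad(x), flat_grad(x_nearby)
    return torch.dot(g1, g2) / (g1.norm() * g2.norm() + 1e-12)

# Here pde_residual(model, x) is a hypothetical user-supplied function returning
# the PDE residual at collocation points x (e.g., u_t + u*u_x - nu*u_xx for Burgers).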

---
