Daily TMLR digest for Jan 27, 2026

TMLR

Jan 27, 2026, 12:30:09 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Robust Conformal Prediction for Infrequent Classes

Authors: Jens-Michalis Papaioannou, Sebastian Jäger, Alexei Figueroa, David Stutz, Betty van Aken, Keno Bressem, Wolfgang Nejdl, Felix Gers, Alexander Löser, Felix Biessmann

Abstract: Many real-world classification tasks involve datasets with large and imbalanced label spaces, making class-specific uncertainty quantification particularly challenging. Conformal Prediction (CP) provides a model-agnostic framework that formally guarantees coverage, meaning that its prediction sets contain the true label with a user-defined probability (confidence level). However, standard class-conditional methods often fail when data is scarce for some classes. We propose a method that uses domain knowledge or label hierarchies to dynamically group semantically related classes to meet the desired coverage for a given confidence threshold. Our method maintains class-conditioned calibration when possible and provides group-conditioned guarantees where necessary. We evaluate our method on outcome diagnoses prediction, an important clinical task that not only benefits from robust uncertainty estimation but also presents a highly imbalanced label distribution. We conduct experiments using three clinical datasets employing two medical taxonomies (ICD-10 and CCSR) and label spaces of varying sizes, reaching more than 1,000 classes. Our results show that the proposed approach is able to successfully exploit the label hierarchy and consistently improves class-conditional coverage for infrequent diagnoses. By improving coverage for underrepresented classes, our method enhances the reliability and trustworthiness of predictive models. This improvement is especially valuable in clinical applications, where failure to detect rare but serious conditions can lead to harmful consequences.
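
A minimal sketch of the grouping idea, not the authors' implementation: split-conformal calibration with per-class thresholds that fall back to a hierarchy-defined group whenever a class has too few calibration examples (all names and the `min_count` fallback rule are illustrative assumptions).

    import numpy as np

    def class_conditional_thresholds(scores, labels, groups, alpha=0.1, min_count=30):
        # Split-conformal calibration: per-class quantile of nonconformity scores,
        # falling back to the class's hierarchy group when the class is too rare
        # (illustrative fallback rule, not the paper's exact procedure).
        thresholds = {}
        classes = np.unique(labels)
        for c in classes:
            cls_scores = scores[labels == c]
            if len(cls_scores) < min_count:
                group_classes = [k for k in classes if groups[k] == groups[c]]
                cls_scores = scores[np.isin(labels, group_classes)]
            n = len(cls_scores)
            q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)  # finite-sample correction
            thresholds[c] = np.quantile(cls_scores, q)
        return thresholds

    def prediction_set(test_scores, thresholds):
        # Include every class whose nonconformity score is below its threshold.
        return [c for c, t in thresholds.items() if test_scores[c] <= t]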

URL: https://openreview.net/forum?id=nJ4p8rh3Ig

---

Title: LZ Penalty: An information-theoretic repetition penalty for autoregressive language models

Authors: Tony A Ginart, Naveen Kodali, Jason Lee, Caiming Xiong, Silvio Savarese, John Emmons

Abstract: We introduce the Lempel-Ziv (LZ) penalty, a penalty specialized for reducing degenerate repetitions in autoregressive language models without loss of capability. The penalty is based on the codelengths in the LZ77 universal lossless compression algorithm. Through the lens of the prediction-compression duality, decoding with the LZ penalty has the interpretation of sampling from the residual distribution after removing the information that is highly compressible. We demonstrate that the LZ penalty enables open-source reasoning models to operate with greedy decoding without loss of capability and without instances of degenerate repetition. In contrast, both the industry-standard frequency penalty and repetition penalty are ineffective, incurring degenerate repetition rates of up to 4% or more.
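
A simplified, hypothetical illustration of the general idea (the paper derives the penalty from LZ77 codelengths; the match-length heuristic below is only a stand-in): candidate tokens that would extend a long repeated suffix of the recent context have their logits reduced before sampling.

    import numpy as np

    def lz_style_penalty(logits, context, beta=1.0, max_match=64):
        # Hypothetical stand-in for the LZ penalty: penalize each candidate token
        # by the length of the longest suffix of (context + token) that already
        # occurred earlier, i.e. by how compressible the continuation would be.
        penalized = logits.copy()
        ctx = list(context)
        for tok in range(len(logits)):
            seq = ctx + [tok]
            match_len = 0
            for L in range(1, min(max_match, len(seq))):
                suffix = seq[-L:]
                if any(seq[i:i + L] == suffix for i in range(len(seq) - L)):
                    match_len = L
                else:
                    break
            penalized[tok] -= beta * match_len
        return penalized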

URL: https://openreview.net/forum?id=vNzPB4YCHj

---


New submissions
===============


Title: Pruning Close to Home: Distance from Initialization impacts Lottery Tickets

Abstract: The Lottery Ticket Hypothesis (LTH) states that there exist sparse subnetworks (called 'winning' Lottery Tickets) within dense randomly initialized networks that, when trained under the same regime, achieve validation accuracy similar to or better than the dense network. It has been shown that for larger networks and more complex datasets, these Lottery Tickets cannot be found from random initializations; instead, they require lightly pretrained weights. More specifically, the pretrained weights need to be stable to SGD noise, but calculating this metric involves an expensive procedure. In this paper, we take a closer look at certain training hyperparameters that influence SGD noise throughout optimization. We show that with careful hyperparameter selection we can forego the pretraining step and still find winning tickets in various settings. We term these hyperparameters early-stable, as networks trained with them become stable to SGD noise early during training, and discover that the tickets they produce exhibit remarkable generalization properties. Finally, we hypothesize that a larger Learning Distance negatively impacts generalization of the resulting sparse network under iterative pruning, and devise an experiment to show this.
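
A minimal, hypothetical sketch of the two ingredients the abstract refers to: one round of global iterative magnitude pruning, and a "learning distance" computed as the L2 distance between the rewound and the final weights (the paper's exact definition may differ).

    import numpy as np

    def magnitude_prune(weights, masks, prune_frac=0.2):
        # One round of global iterative magnitude pruning: drop the smallest
        # `prune_frac` of the weights that are still unpruned.
        alive = np.concatenate([np.abs(weights[k][masks[k]]) for k in weights])
        threshold = np.quantile(alive, prune_frac)
        return {k: masks[k] & (np.abs(weights[k]) >= threshold) for k in weights}

    def learning_distance(rewind_weights, final_weights, masks):
        # L2 distance, over unpruned coordinates, between the weights the ticket
        # is rewound to and the weights it reaches after training.
        sq = sum(np.sum((final_weights[k] - rewind_weights[k])[masks[k]] ** 2) for k in masks)
        return float(np.sqrt(sq))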

URL: https://openreview.net/forum?id=caBhOKTD6Z

---

Title: CUDA: Capturing Uncertainty and Diversity in Preference Feedback Augmentation

Abstract: Preference-based Reinforcement Learning (PbRL) effectively addresses reward design challenges in Reinforcement Learning and facilitates human-AI alignment by enabling agents to learn human intentions. However, optimizing PbRL critically depends on abundant, diverse, and accurate human feedback, which is costly and time-consuming to acquire. Existing feedback augmentation methods aim to alleviate the scarcity of human preference feedback. However, they often neglect diversity, primarily generating feedback for high-confidence trajectory pairs with extreme differences. This approach leads to a biased augmented set that incompletely represents human preferences. To overcome this, we introduce Capturing Uncertainty and Diversity in preference feedback Augmentation (CUDA), a novel approach that comprehensively considers both uncertainty and diversity. CUDA enhances augmentation by employing ensemble-based uncertainty estimation for filtering and by extracting feedback from diverse clusters via bucket-based categorization. These two mechanisms enable CUDA to obtain diverse and accurate augmented feedback. We evaluate CUDA on MetaWorld and DMControl offline datasets, demonstrating significant performance improvements over various offline PbRL algorithms and existing augmentation methods across diverse scenarios.
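
A hypothetical sketch of the two mechanisms described above, not the authors' code: an ensemble-disagreement filter followed by bucket-based selection so the augmented set is not dominated by extreme, high-margin pairs (all thresholds and names are illustrative).

    import numpy as np

    def augment_preferences(pairs, ensemble_probs, n_buckets=10, max_std=0.1, per_bucket=5):
        # ensemble_probs: (n_models, n_pairs) predicted P(traj_a preferred over traj_b).
        mean = ensemble_probs.mean(axis=0)
        std = ensemble_probs.std(axis=0)
        keep = np.where(std <= max_std)[0]                 # uncertainty filter: ensemble agrees
        buckets = np.clip((mean[keep] * n_buckets).astype(int), 0, n_buckets - 1)
        selected = []
        for b in range(n_buckets):                         # diversity: draw from every bucket
            idx = keep[buckets == b]
            selected.extend(idx[:per_bucket].tolist())
        labels = (mean[selected] > 0.5).astype(int)        # pseudo-labels for augmented pairs
        return [pairs[i] for i in selected], labels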

URL: https://openreview.net/forum?id=KWENSE1tC4

---

Title: Why Equivariant Networks Lose Information: Invariant Rings and the Role of Aggregation

Abstract: Equivariant neural networks exhibit fundamental expressivity limitations: rotation-equivariant networks collapse directional information to radial features, and matrix-equivariant networks show rank degeneracy. We explain these phenomena using classical invariant theory and prehomogeneous vector space (PVS) theory. For $\mathrm{SO}(3)$ on $\mathbb{R}^3$, the First Fundamental Theorem forces equivariant maps to be radial scalings; for $\mathrm{GL}(n) \times \mathrm{GL}(n)$ on matrices, PVS theory shows the invariant ring contains only constants. Our central finding is that aggregation, not depth, escapes these constraints: product representations $V^n$ have richer invariant rings with cross-invariants (e.g., dot products encoding angles) inaccessible to single-fiber processing. We connect this theory to modern architectures---SchNet, PaiNN, DimeNet, MACE---showing their body-order corresponds to which $V^n$ they access. Experiments confirm that $\mathrm{SO}(3)$- versus $\mathrm{O}(3)$-invariant networks exhibit categorically different expressivity on pseudoscalar targets ($R^2 = 1.00$ vs. $R^2 < 0$), and that cross-invariants enable learning angles while norm-only features cannot. These results provide design guidance: prioritize multi-body interactions over depth when expressivity is limited.
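
A tiny numerical illustration of the expressivity claim (assuming nothing beyond NumPy): the norm of a single vector is rotation-invariant but carries no angular information, whereas the cross-invariant dot product over a pair of vectors is invariant and encodes the angle.

    import numpy as np

    rng = np.random.default_rng(0)
    u, v = rng.normal(size=3), rng.normal(size=3)

    # A random rotation (orthogonal, det +1) acting on both vectors.
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    Q *= np.sign(np.linalg.det(Q))

    # Norm-only (single-vector) invariants: unchanged, but no angle information.
    print(np.linalg.norm(u), np.linalg.norm(Q @ u))

    # Cross-invariant over the product representation: also unchanged, and the
    # angle between u and v is recoverable from it.
    print(u @ v, (Q @ u) @ (Q @ v))
    print(np.arccos(u @ v / (np.linalg.norm(u) * np.linalg.norm(v))))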

URL: https://openreview.net/forum?id=o1GqYCe2rS

---

Title: Not All CAMs Are Complete: Completeness as the Key to Faithfulness

Abstract: Although input-gradient techniques have evolved to mitigate and tackle the challenges associated with gradients, modern gradient-weighted CAM approaches still rely on vanilla gradients, which are inherently susceptible to saturation phenomena. Although recent enhancements have incorporated counterfactual gradient strategies as a mitigating measure, these local explanation techniques still exhibit a lack of sensitivity to their baseline parameter. Our work proposes a gradient-weighted CAM augmentation that tackles both the saturation and the sensitivity problem by reshaping the gradient computation, incorporating two well-established approaches: Expected Gradients and kernel smoothing. By revisiting the original formulation as the smoothed expectation of the perturbed integrated gradients, one can concurrently construct more faithful, localized and robust explanations which minimize infidelity. Through fine modulation of the perturbation distribution, it is possible to regulate the complexity of the explanation, selectively discriminating stable features. Our technique, Expected Grad-CAM, unlike recent works, exclusively optimizes the gradient computation and is purposefully designed as an enhanced substitute for the foundational Grad-CAM algorithm and any method built therefrom. Quantitative and qualitative evaluations have been conducted to assess the effectiveness of our method.
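
A hedged PyTorch-style sketch of the core modification, with the kernel-smoothing component omitted and all callables hypothetical: Grad-CAM channel weights computed from gradients averaged over random baselines and interpolation steps (an Expected Gradients style estimate) rather than from a single vanilla gradient.

    import torch

    def expected_gradcam(model_features, model_head, x, baselines, target, n_samples=16):
        # model_features: input -> (C, H, W) activation map of the chosen layer;
        # model_head: activation map -> class logits. Both assumed unbatched.
        acts = model_features(x).detach()
        grads = []
        for _ in range(n_samples):
            b = baselines[torch.randint(len(baselines), (1,))].squeeze(0)
            t = torch.rand(())                          # interpolation coefficient ~ U(0, 1)
            a = model_features(b + t * (x - b)).detach().requires_grad_(True)
            score = model_head(a)[target]
            grads.append(torch.autograd.grad(score, a)[0])
        # Channel weights: expected gradient, averaged over samples and space.
        w = torch.stack(grads).mean(dim=0).mean(dim=(-2, -1), keepdim=True)
        cam = torch.relu((w * acts).sum(dim=0))         # Grad-CAM style weighted sum
        return cam / (cam.max() + 1e-8)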

URL: https://openreview.net/forum?id=NeeGBwXNs5

---

Title: Computationally Sufficient Reductions for Joint Multiple Matrix Estimators with Sparsity and Fusion

Abstract: We study a broad class of methods for the joint estimation of multiple sparse symmetric matrices that incorporates group and fusion penalties for borrowing strength across related matrices. This class includes extensions of popular methods for precision and covariance matrix estimation as well as PCA. We show that these methods can be unified through the lens of computational sufficiency, a recently proposed theory that can reveal hidden commonalities between seemingly disparate methods, yielding both theoretical insights into the underlying optimization problems and practical advantages in terms of computational efficiency. We derive a universal screening rule that applies simultaneously to all methods in this class, allowing us to reduce the search space to block diagonal matrices. This enables streamlined algorithms that drastically reduce the runtime, making the methods far more scalable and practical for high-dimensional data analysis.
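
A minimal sketch of what a block-diagonal screening rule buys computationally; the rule below (thresholding the pooled absolute sample covariances) is only an illustrative stand-in for the paper's universal rule, and `solve_block` is a hypothetical placeholder for the joint estimator.

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import connected_components

    def blockwise_solve(sample_covs, screen_threshold, solve_block):
        # Screen variable pairs that are weak in every matrix, split the variables
        # into connected components of the surviving pattern, and run the joint
        # estimator only within each block.
        pooled = np.max(np.abs(np.stack(sample_covs)), axis=0)
        adj = pooled > screen_threshold
        np.fill_diagonal(adj, False)
        n_comp, labels = connected_components(csr_matrix(adj), directed=False)
        estimates = [np.zeros_like(S) for S in sample_covs]
        for c in range(n_comp):
            idx = np.where(labels == c)[0]
            blocks = solve_block([S[np.ix_(idx, idx)] for S in sample_covs])
            for est, blk in zip(estimates, blocks):
                est[np.ix_(idx, idx)] = blk
        return estimates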

URL: https://openreview.net/forum?id=KK9RHgSbdp

---

Title: Graph Unitary Message Passing

Abstract: Unitarity has emerged as a fundamental principle for efficient learning of deep neural networks, from parameter initialization to advanced optimizers, and has proven effective across architectures and methods including RNNs, CNNs, Transformers, and the Muon optimizer. However, imposing unitarity on the parameters is not enough to improve the learning efficiency of graph neural networks (GNNs), due to the instability arising from the graph structure through the message passing mechanism. This data-dependent inefficiency, also known as the oversquashing and oversmoothing problems, causes information from distant nodes to decay or node representations to become indistinguishable as the number of layers increases. Motivated by the success of unitarity in stabilizing neural network training, we propose a new graph-learning paradigm called Graph Unitary Message Passing (GUMP) that improves graph learning efficiency by applying unitary adjacency matrices for message passing. GUMP introduces a graph transformation algorithm that equips general graphs with unitary adjacency matrices while preserving the original connectivity, and implements Newton-Schulz iteration for efficient unitary projection. Extensive experiments demonstrate that GUMP achieves significant performance improvements over vanilla message passing methods across various graph learning tasks.
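
A small self-contained sketch of the Newton-Schulz iteration mentioned in the abstract, applied to a generic square matrix (how GUMP constructs and uses the unitary adjacency matrix is not reproduced here):

    import numpy as np

    def newton_schulz_orthogonalize(A, n_iter=40):
        # Newton-Schulz iteration converging to the orthogonal (unitary, in the
        # real case) polar factor of A; pre-scaling keeps the singular values in
        # the region of convergence.
        X = A / np.linalg.norm(A)
        for _ in range(n_iter):
            X = 1.5 * X - 0.5 * X @ X.T @ X
        return X

    A = np.random.default_rng(0).normal(size=(5, 5))
    U = newton_schulz_orthogonalize(A)
    print(np.allclose(U.T @ U, np.eye(5), atol=1e-5))   # approximately orthogonal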

URL: https://openreview.net/forum?id=dvNMDkSBIA

---

Title: Conflict-Averse IL-RL: Resolving Gradient Conflicts for Stable Imitation-to-Reinforcement Learning Transfer

Abstract: Reinforcement Learning (RL) and Imitation Learning (IL) offer complementary capabilities: RL can learn high-performing policies but is data-intensive, whereas IL enables rapid learning from demonstrations but is limited by the demonstrator's quality. Combining them offers the potential for improved sample efficiency in learning high-performing policies, yet naïve integrations often suffer from two fundamental issues: (1) negative transfer, where optimizing the IL loss hinders effective RL fine-tuning, and (2) gradient conflict, where differences in the scale or direction of IL and RL gradients lead to unstable updates.
We introduce Conflict-Averse IL-RL (CAIR), a general framework that addresses both challenges by combining two key components: (1) Loss Manipulation: an adaptive annealing mechanism utilizing a convex combination of IL and RL losses. This mechanism dynamically increases the weight of the RL loss when its gradient aligns with the IL gradient and decreases it otherwise, mitigating instabilities during the transition from IL to RL. (2) Gradient Manipulation: to further reduce conflict, we incorporate CAGrad to compute a joint gradient that balances IL and RL objectives while avoiding detrimental interference.
Under standard trust-region assumptions, CAIR guarantees monotonic improvement in the expected return when the loss weights are annealed monotonically. Our empirical study evaluates CAIR on four sparse-reward MuJoCo domains, where pure RL algorithms typically struggle. Compared against relevant hybrid RL baselines, CAIR improves sample efficiency in three out of four domains and asymptotic performance in two, while performing comparably on the remainder. These trends are consistent across multiple combinations of IL (BC, DAgger) and RL (DDPG, SAC, PPO) methods, demonstrating the robustness of the novel framework.
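
A minimal sketch of the loss-manipulation component as described, with the exact annealing schedule and the CAGrad step omitted and the update rule below treated as an illustrative assumption: the RL weight in the convex combination L = (1 - w) * L_IL + w * L_RL grows only while the two gradients are aligned.

    import numpy as np

    def update_rl_weight(w, grad_il, grad_rl, step=0.05):
        # Increase the RL weight when the RL gradient aligns with the IL gradient
        # (positive cosine similarity), decrease it otherwise.
        cos = grad_il @ grad_rl / (np.linalg.norm(grad_il) * np.linalg.norm(grad_rl) + 1e-12)
        w = w + step if cos > 0 else w - step
        return float(np.clip(w, 0.0, 1.0))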

URL: https://openreview.net/forum?id=S98skK3FfD

---

Title: The Out-of-sample Extensions of t-SNE: From Gradient Descent to Fixed-point Iteration Algorithms

Abstract: This paper addresses the out-of-sample extension of the t-distributed stochastic neighbor embedding (t-SNE), namely extending the embedding to data that were not considered when training the t-SNE. We demonstrate the ease of deriving the out-of-sample extension of t-SNE, thanks to the particular nature of t-SNE. Several resolution strategies are devised, from gradient descent to fixed-point iteration algorithms. Moreover, we establish several theoretical findings that allow us to understand the underlying optimization mechanism of the fixed-point iteration, such as demonstrating that its repulsion-free variant corresponds to Newton's method, and providing several appealing properties, including connections with the mean shift algorithm and the resolution of the pre-image problem in Machine Learning. Experimental results on three well-known real data sets show the relevance and efficiency of the proposed out-of-sample methods, with the repulsion-free fixed-point iteration outperforming the other methods.
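
A heavily simplified, hypothetical sketch of a repulsion-free fixed-point update for embedding one new point (the paper's exact affinities and iteration are not reproduced): the new embedding is repeatedly set to a weighted average of the training embeddings, which is where the mean-shift connection comes from.

    import numpy as np

    def out_of_sample_embed(x, X_train, Y_train, sigma=1.0, n_iter=100):
        # High-dimensional affinities of the new point to the training points
        # (a single-bandwidth Gaussian is an illustrative simplification).
        d2 = np.sum((X_train - x) ** 2, axis=1)
        p = np.exp(-d2 / (2 * sigma ** 2))
        p /= p.sum()
        y = Y_train[np.argmax(p)].copy()           # start at the strongest neighbor
        for _ in range(n_iter):
            t = 1.0 / (1.0 + np.sum((Y_train - y) ** 2, axis=1))   # Student-t kernel
            w = p * t
            y = (w[:, None] * Y_train).sum(axis=0) / w.sum()       # mean-shift-like step
        return y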

URL: https://openreview.net/forum?id=kYwq49F8Gt

---

Title: Boosting Text Encoder for Personalized Text-to-Image Generation

Abstract: In this paper, we introduce TextBoost, an efficient one-shot personalization approach for text-to-image diffusion models. Traditional personalization methods typically involve fine-tuning extensive portions of the model, leading to substantial storage requirements and slow convergence. In contrast, we propose selectively fine-tuning only the text encoder, significantly improving computational and storage efficiency. To preserve the original semantic integrity, we develop a novel causality-preserving adaptation mechanism. Additionally, lightweight adapters are employed to locally refine text embeddings immediately before their interaction with cross-attention layers, greatly enhancing the expressiveness of text embeddings with minimal computational overhead. Empirical evaluations across diverse concepts demonstrate that TextBoost achieves faster convergence and substantially reduces storage demands by minimizing the number of trainable parameters. Furthermore, TextBoost maintains comparable subject fidelity, superior text fidelity, and greater generation diversity compared to existing methods. We show that our proposed method offers an efficient, scalable, and practically applicable solution for high-quality text-to-image personalization, particularly beneficial in resource-constrained environments.
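
A minimal PyTorch-style sketch of the selective fine-tuning described above (the causality-preserving adaptation and the adapter design are not reproduced; the model is assumed to be an nn.Module exposing a `text_encoder` child, which is an assumption, not the authors' API):

    import torch

    def mark_only_text_encoder_trainable(model: torch.nn.Module):
        # Freeze the whole text-to-image stack, then unfreeze only the text
        # encoder (and any adapters registered inside it).
        for p in model.parameters():
            p.requires_grad_(False)
        for p in model.text_encoder.parameters():
            p.requires_grad_(True)
        trainable = [p for p in model.parameters() if p.requires_grad]
        print(f"trainable parameters: {sum(p.numel() for p in trainable):,}")
        return trainable

    # optimizer = torch.optim.AdamW(mark_only_text_encoder_trainable(model), lr=1e-4)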

URL: https://openreview.net/forum?id=hiZzk1nHuV

---
