Daily TMLR digest for Jan 07, 2026

TMLR

Jan 7, 2026, 12:30:21 AM
to tmlr-anno...@googlegroups.com


New certifications
==================

Survey Certification: Fast weight programming and linear transformers: from machine learning to neurobiology

Kazuki Irie, Samuel J. Gershman

https://openreview.net/forum?id=TDG8EkNmQR

---


Accepted papers
===============


Title: Leveraging the True Depth of LLMs

Authors: Ramón Calvo González, Daniele Paliotta, Matteo Pagliardini, Martin Jaggi, François Fleuret

Abstract: The remarkable capabilities of Large Language Models (LLMs) are overshadowed by their immense
computational cost. While recent work has shown that many LLM layers can be reordered or even
removed with minimal impact on accuracy, these insights have not been translated into significant
inference speedups. To bridge this gap, we introduce a novel method that restructures the
computational graph by grouping and evaluating consecutive layer pairs in parallel. This approach,
requiring no retraining, yields a 1.19x throughput gain on Llama 2 7B while reducing the average
benchmark accuracy by only 1.5%. We demonstrate the practical value of this method for large-scale
LLM deployment and show that some of the lost accuracy can be recovered with lightweight
fine-tuning of the parallelized layers.

URL: https://openreview.net/forum?id=JccJ6YfWd4
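
As a quick illustration of the layer-pairing idea (a sketch, not the authors' code; the toy blocks and sizes below are made up), consecutive residual layers can read the same input so that their contributions can be computed concurrently:

import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    # Stand-in for a transformer block; not a Llama 2 layer.
    def __init__(self, d):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        return self.ff(x)  # the residual connection is added by the caller

def sequential_pass(blocks, x):
    for blk in blocks:
        x = x + blk(x)
    return x

def paired_pass(blocks, x):
    # Consecutive pairs read the same input, so both blocks of a pair can be
    # evaluated concurrently (e.g., on separate CUDA streams or devices).
    for i in range(0, len(blocks) - 1, 2):
        x = x + blocks[i](x) + blocks[i + 1](x)
    if len(blocks) % 2:
        x = x + blocks[-1](x)   # an odd trailing block stays sequential
    return x

d, n_blocks = 64, 8
blocks = nn.ModuleList(ToyBlock(d) for _ in range(n_blocks))
x = torch.randn(2, 16, d)
print(sequential_pass(blocks, x).shape, paired_pass(blocks, x).shape)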

---

Title: An analysis of distributional reinforcement learning with Gaussian mixtures

Authors: Mathis Antonetti, Henrique Donancio, Florence Forbes

Abstract: Distributional Reinforcement Learning (DRL) aims to optimize a risk measure of the return by representing its distribution. However, finding a representation of this distribution is challenging, as it requires a tractable estimation of the risk measure, a tractable loss, and a representation with enough approximation power. Although Gaussian mixtures (GMs) are powerful statistical models for addressing these challenges, only a few papers have investigated this approach, and most use the L$_2$ space norm as a tractable metric between GMs. In this paper, we provide new theoretical results on previously unstudied metrics. We show that the L$_2$ metric is not suitable and propose alternative metrics: a mixture-specific optimal transport (MW) distance and a maximum mean discrepancy distance. Focusing on temporal difference (TD) learning, we prove a convergence result for a related dynamic programming algorithm under the MW metric. Leveraging natural multivariate GM representations, we also highlight the potential of MW in multi-objective RL. Our approach is illustrated on several environments from the Atari Learning Environment benchmark and shows promising empirical results.

URL: https://openreview.net/forum?id=b4VgI1RTv8
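
For intuition on a mixture-level optimal transport distance between Gaussian mixtures, here is a minimal sketch (not the paper's implementation): the ground cost between components is the closed-form 2-Wasserstein distance between univariate Gaussians, and the component weights are coupled with a small linear program.

import numpy as np
from scipy.optimize import linprog

def gaussian_w2_sq(m1, s1, m2, s2):
    # Squared 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2).
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

def mixture_w2_sq(wa, mua, siga, wb, mub, sigb):
    n, m = len(wa), len(wb)
    C = np.array([[gaussian_w2_sq(mua[i], siga[i], mub[j], sigb[j])
                   for j in range(m)] for i in range(n)])
    # Transport plan: n*m nonnegative entries with row sums wa and column sums wb.
    A_eq, b_eq = [], []
    for i in range(n):                      # row-marginal constraints
        row = np.zeros(n * m); row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row); b_eq.append(wa[i])
    for j in range(m):                      # column-marginal constraints
        col = np.zeros(n * m); col[j::m] = 1.0
        A_eq.append(col); b_eq.append(wb[j])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, None)] * (n * m), method="highs")
    return res.fun

# Toy two-component mixtures; values are illustrative only.
print(mixture_w2_sq([0.5, 0.5], [0.0, 3.0], [1.0, 1.0],
                    [0.3, 0.7], [0.5, 2.5], [1.0, 2.0]))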

---

Title: CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design

Authors: Prashant Govindarajan, Davide Baldelli, Jay Pathak, Quentin Fournier, Sarath Chandar

Abstract: Computer-aided design (CAD) is the digital construction of 2D and 3D objects, and is central to a wide range of engineering and manufacturing applications, such as the automobile and aviation industries. Despite its importance, CAD modeling remains largely a time-intensive, manual task. Recent works have attempted to automate this process with small transformer-based models and handcrafted CAD sequence representations. However, there has been little effort to leverage the potential of large language models (LLMs) for sequential CAD design. In this work, we introduce a new large-scale dataset of more than 170k CAD models annotated with high-quality, human-like descriptions generated with our pipeline based on GPT-4.1. Using this dataset, we fine-tune powerful code-LLMs to generate CAD sequences represented in a JSON-based format from natural language descriptions, demonstrating the viability and effectiveness of this approach for text-conditioned CAD generation. Because simple metrics often fail to reflect the quality of generated objects, we introduce geometric and topological metrics based on sphericity, mean curvature, and Euler characteristic to provide richer structural insights. Our experiments and ablation studies on both synthetic and human-annotated data demonstrate that CADmium is able to automate CAD design, drastically speeding up the design of new objects. The dataset, code, and fine-tuned models are available online.

URL: https://openreview.net/forum?id=lExqWvQht8
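
Two of the shape metrics named in the abstract can be sketched for a closed triangle mesh as follows (standard textbook definitions; the paper's exact formulations may differ, and the unit-cube mesh is only a toy input):

import numpy as np

def mesh_metrics(vertices, faces):
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    a, b, c = v[f[:, 0]], v[f[:, 1]], v[f[:, 2]]
    area = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1).sum()
    volume = abs(np.einsum("ij,ij->i", a, np.cross(b, c)).sum()) / 6.0   # signed tetrahedra
    sphericity = np.pi ** (1 / 3) * (6 * volume) ** (2 / 3) / area       # 1.0 for a sphere
    edges = {tuple(sorted(e)) for tri in f
             for e in [(tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])]}
    euler = len(v) - len(edges) + len(f)                                  # V - E + F
    return sphericity, euler

# Unit cube: 8 vertices, 12 consistently oriented triangles; Euler characteristic is 2.
verts = [[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)]
faces = [[0, 1, 3], [0, 3, 2], [4, 6, 7], [4, 7, 5], [0, 4, 5], [0, 5, 1],
         [2, 3, 7], [2, 7, 6], [0, 2, 6], [0, 6, 4], [1, 5, 7], [1, 7, 3]]
print(mesh_metrics(verts, faces))   # sphericity ~0.806, Euler characteristic 2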

---

Title: Beyond Expectations: Learning with Stochastic Dominance Made Practical

Authors: Shicong Cen, Jincheng Mei, Hanjun Dai, Dale Schuurmans, Yuejie Chi, Bo Dai

Abstract: Stochastic dominance serves as a general framework for modeling a broad spectrum of decision preferences under uncertainty, with risk aversion as one notable example, because it naturally captures the intrinsic structure of the underlying uncertainty rather than simply resorting to expectations. Despite being theoretically appealing, stochastic dominance has seen scarce application in machine learning, owing to two challenges: i) the original concept only provides a partial order and is therefore not amenable to serving as a general optimality criterion; and ii) an efficient computational recipe remains lacking, due to the continuum nature of evaluating stochastic dominance.

In this work, we make a first attempt at establishing a general framework for learning with stochastic dominance. We first generalize the stochastic dominance concept to enable feasible comparisons between arbitrary pairs of random variables. We then develop a simple and computationally efficient approach for finding the solution that is optimal in terms of stochastic dominance, which can be seamlessly plugged into many learning tasks. Numerical experiments demonstrate that the proposed method achieves performance comparable to standard risk-neutral strategies and obtains better trade-offs against risk across a variety of applications, including supervised learning, reinforcement learning, and portfolio optimization.

URL: https://openreview.net/forum?id=ebyPKXsweD
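
For reference, the textbook first-order stochastic dominance check via empirical CDFs looks as follows (a sketch only; the paper works with a generalized, optimizable notion that this toy comparison does not implement):

import numpy as np

def empirical_cdf(samples, thresholds):
    samples = np.sort(samples)
    return np.searchsorted(samples, thresholds, side="right") / len(samples)

def first_order_dominates(x, y, tol=0.0):
    # X dominates Y iff F_X(t) <= F_Y(t) for every threshold t,
    # i.e. X never puts more mass on low outcomes than Y does.
    thresholds = np.union1d(x, y)
    return bool(np.all(empirical_cdf(x, thresholds) <= empirical_cdf(y, thresholds) + tol))

rng = np.random.default_rng(0)
returns_a = rng.normal(1.0, 1.0, 5000)   # higher-mean strategy
returns_b = rng.normal(0.0, 1.0, 5000)
print(first_order_dominates(returns_a, returns_b))  # True with high probability
print(first_order_dominates(returns_b, returns_a))  # False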

---

Title: Fast weight programming and linear transformers: from machine learning to neurobiology

Authors: Kazuki Irie, Samuel J. Gershman

Abstract: Recent advances in artificial neural networks for machine learning, and language modeling in particular, have established a family of recurrent neural network (RNN) architectures that, unlike conventional RNNs with vector-form hidden states, use two-dimensional (2D) matrix-form hidden states. Such 2D-state RNNs, known as Fast Weight Programmers (FWPs), can be interpreted as neural networks whose synaptic weights (called fast weights) dynamically change over time as a function of input observations and serve as short-term memory storage; the corresponding synaptic weight modifications are controlled or programmed by another network (the programmer) whose parameters are trained (e.g., by gradient descent). In this Primer, we review the technical foundations of FWPs, their computational characteristics, and their connections to transformers and state space models. We also discuss connections between FWPs and models of synaptic plasticity in the brain, suggesting a convergence of natural and artificial intelligence.

URL: https://openreview.net/forum?id=TDG8EkNmQR
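
The canonical additive outer-product update discussed in this literature can be sketched as follows (a minimal, unnormalized variant without a feature map; the dimensions and inputs are made up):

import numpy as np

def fast_weight_forward(keys, values, queries):
    # keys, values, queries: arrays of shape (T, d).
    T, d = keys.shape
    W = np.zeros((d, d))                       # matrix-valued hidden state (the fast weights)
    outputs = []
    for t in range(T):
        W = W + np.outer(values[t], keys[t])   # program the fast weights from the input
        outputs.append(W @ queries[t])         # read them out with the query
    return np.stack(outputs)

rng = np.random.default_rng(0)
T, d = 6, 4
y = fast_weight_forward(rng.standard_normal((T, d)),
                        rng.standard_normal((T, d)),
                        rng.standard_normal((T, d)))
print(y.shape)  # (6, 4)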

---

Title: TabRep: Training Tabular Diffusion Models with a Simple and Effective Continuous Representation

Authors: Jacob Si, Zijing Ou, Mike Qu, Zhengrui Xiang, Yingzhen Li

Abstract: Diffusion models have become the predominant generative models for tabular data generation. However, they face the conundrum of modeling under a separate versus a unified data representation. The former encounters the challenge of jointly modeling all multi-modal distributions of tabular data in one model. While the latter alleviates this by learning a single representation for all features, it currently relies on sparse, suboptimal encoding heuristics and incurs additional computation costs. In this work, we address the latter by presenting TabRep, a tabular diffusion architecture trained with a unified continuous representation. To motivate the design of our representation, we provide geometric insights into how the data manifold affects diffusion models. The key attributes of our representation are its density, its flexibility to provide ample separability for nominal features, and its ability to preserve intrinsic relationships. Ultimately, TabRep provides a simple yet effective approach for training tabular diffusion models under a continuous data manifold. Our results show that TabRep achieves superior performance across a broad suite of evaluations. It is the first to synthesize tabular data that exceeds the downstream quality of the original datasets while preserving privacy and remaining computationally efficient.

URL: https://openreview.net/forum?id=yRbtFEh2OP
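
As a point of reference only (this is not the paper's TabRep encoding), a unified continuous representation for mixed tabular data can be as simple as standardized numeric columns concatenated with one-hot-encoded nominal columns:

import numpy as np

def unify(numeric, categorical, n_categories):
    # numeric: (n, p) floats; categorical: (n, q) ints; n_categories: list of q ints.
    num = (numeric - numeric.mean(0)) / (numeric.std(0) + 1e-8)
    onehots = [np.eye(k)[categorical[:, j]] for j, k in enumerate(n_categories)]
    return np.concatenate([num] + onehots, axis=1)   # one continuous vector per row

# Toy data: two numeric columns plus two nominal columns with 2 and 3 categories.
X_num = np.array([[1.2, 30.0], [0.7, 45.0], [2.4, 22.0]])
X_cat = np.array([[0, 2], [1, 0], [0, 1]])
Z = unify(X_num, X_cat, n_categories=[2, 3])
print(Z.shape)  # (3, 7)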

---

Title: The Geometry of Algorithmic Stability: A Hodge Theoretic View on Structural vs. Statistical Instability

Authors: Karen Sargsyan

Abstract: Algorithmic stability—the robustness of predictions to training data perturbations—is fundamental to reliable machine learning. We propose a unified mathematical framework that rigorously distinguishes between two sources of instability: structural inconsistency and statistical variance. We formalize structural inconsistency using Combinatorial Hodge Theory, characterizing it as cyclical flows (Condorcet cycles) on a graph of hypotheses. This framework reveals that methods like inflated operators and regularization specifically target these structural obstructions, while methods like bagging primarily address statistical variance. We provide direct empirical validation through three key experiments. First, in a controlled setting with engineered Condorcet cycles (pure structural instability), inflated operators achieve perfect stability while bagging fails, confirming the core distinction. Second, we validate on a standard digit classification task that structural obstructions are negligible ($||C_{cycle}|| \approx 2.3 \times 10^{-16}$, machine precision), explaining the empirical dominance of variance-reduction methods. Third, we demonstrate that significant structural obstructions naturally emerge in fairness-constrained model selection on real-world data ($||C_{cycle}|| = 0.857$, approximately $10^{15}$ times larger), providing a topological characterization of the instability arising from incompatible objectives.

URL: https://openreview.net/forum?id=rFqsgVXZYO
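
The cyclical-flow idea can be sketched on a triangle of three hypotheses (illustrative numbers only, not the paper's experiments): project a pairwise-comparison flow onto gradient flows and measure the residual that no potential can explain.

import numpy as np

# Triangle of three hypotheses with directed edges A->B, B->C, C->A.
edges = [(0, 1), (1, 2), (2, 0)]
D = np.zeros((len(edges), 3))              # edge-vertex incidence matrix
for e, (i, j) in enumerate(edges):
    D[e, i], D[e, j] = -1.0, 1.0

def cyclic_residual(flow):
    # Gradient flows have the form flow_ij = s_j - s_i for some potential s;
    # the least-squares residual is the cyclical part of the flow.
    s, *_ = np.linalg.lstsq(D, flow, rcond=None)
    return flow - D @ s

print(np.linalg.norm(cyclic_residual(np.array([1.0, 1.0, 1.0]))))    # ~1.73: a pure Condorcet cycle
print(np.linalg.norm(cyclic_residual(np.array([1.0, 1.0, -2.0]))))   # ~0: a consistent ranking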

---

Title: Vejde: A Framework for Inductive Deep Reinforcement Learning Based on Factor Graph Color Refinement

Authors: Jakob Nyberg, Pontus Johnson

Abstract: We present and evaluate Vejde, a framework that combines data abstraction, graph learning, and reinforcement learning to produce inductive policy functions for decision problems with richly structured states, such as object classes and relations. Markov decision process states are represented as databases of facts about entities, and Vejde converts each state to a bipartite graph, which is mapped to latent states through neural message passing. The factored representation of both states and actions allows Vejde agents to handle problems of varying size and structure. We tested Vejde agents on eight problem domains defined in RDDL, with ten problem instances each, where policies were trained using both supervised and reinforcement learning. To test policy generalization, we separate the problem instances into two sets, one for training and the other solely for testing. Test results on unseen instances for the Vejde agents were compared to MLP agents trained on each problem instance, as well as to the online planning algorithm Prost. Our results show that Vejde policies on average generalize to the test instances without a significant loss in score. Additionally, the inductive agents received scores on unseen test instances that were, on average, close to those of the instance-specific MLP agents.

URL: https://openreview.net/forum?id=EFSZmL1W1Z
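
The state-to-graph step can be sketched roughly as follows (the facts, predicates, and labels are made up; this is not the Vejde API): relational facts become fact nodes connected to the entity nodes they mention.

import networkx as nx

facts = [
    ("connected", ("host1", "host2")),
    ("connected", ("host2", "host3")),
    ("compromised", ("host1",)),
]

G = nx.Graph()
for idx, (predicate, args) in enumerate(facts):
    fact_node = f"fact_{idx}"
    G.add_node(fact_node, bipartite="fact", predicate=predicate)
    for position, entity in enumerate(args):
        G.add_node(entity, bipartite="entity")
        G.add_edge(fact_node, entity, position=position)   # argument slot as edge label

print(G.number_of_nodes(), G.number_of_edges())  # 6 nodes, 5 edges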

---

Title: Adversarial Vulnerability from On-Manifold Inseparability and Poor Off-Manifold Convergence

Authors: Rajdeep Haldar, Yue Xing, Qifan Song, Guang Lin

Abstract: We introduce a new perspective on adversarial vulnerability in image classification: fragility can arise from poor convergence in off-manifold directions. We model data as lying on low-dimensional manifolds, where on-manifold directions correspond to high-variance, data-aligned features and off-manifold directions capture low-variance, nuanced features. Standard first-order optimizers, such as gradient descent, are inherently ill-conditioned, leading to slow or incomplete convergence in off-manifold directions. When data is inseparable along the on-manifold direction, robustness depends on learning these subtle off-manifold features, and failure to converge leaves models exposed to adversarial perturbations.

On the theoretical side, we formalize this mechanism through convergence analyses of logistic regression and two-layer linear networks under first-order methods. These results highlight how ill-conditioning slows or prevents convergence in off-manifold directions, thereby motivating the use of second-order methods which mitigate ill-conditioning and achieve convergence across all directions. Empirically, we demonstrate that even without adversarial training, robustness improves significantly with extended training or second-order optimization, underscoring convergence as a central factor.

As an auxiliary empirical finding, we observe that batch normalization suppresses these robustness gains, consistent with its implicit bias toward uniform-margin rather than max-margin solutions.

By introducing the notions of on- and off-manifold convergence, this work provides a novel theoretical explanation for adversarial vulnerability.

URL: https://openreview.net/forum?id=pa90uRZATF
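
A toy illustration of the convergence mechanism (not the paper's setup): when classes are inseparable along a high-variance feature and separable only along a low-variance one, plain gradient descent is ill-conditioned and picks up the decisive direction very slowly.

import numpy as np

rng = np.random.default_rng(0)
n = 2000
y = rng.choice([-1.0, 1.0], size=n)
x_on = rng.normal(0.0, 10.0, size=n)              # high variance, carries no label information
x_off = 0.1 * y + rng.normal(0.0, 0.01, size=n)   # low variance, fully decisive
X = np.stack([x_on, x_off], axis=1)

def grad(w):
    # Gradient of the mean logistic loss over the dataset.
    margins = y * (X @ w)
    return -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)

w = np.zeros(2)
lr = 0.01   # the high-variance direction caps the stable step size
for step in range(1, 20001):
    w -= lr * grad(w)
    if step in (100, 1000, 20000):
        # The weight on the decisive low-variance feature keeps growing slowly,
        # so stopping early leaves only a small margin along that direction.
        print(step, w)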

---


New submissions
===============


Title: Understanding the Resource Cost of Fully Homomorphic Encryption in Quantum Federated Learning

Abstract: Quantum Federated Learning (QFL) enables distributed training of Quantum Machine Learning (QML) models by sharing model gradients instead of raw data. However, these gradients can still expose sensitive user information. To enhance privacy, homomorphic encryption of parameters has been proposed as a solution in QFL and related frameworks. In this work, we evaluate the overhead introduced by Fully Homomorphic Encryption (FHE) in QFL setups and assess its feasibility for real-world applications. We implemented various QML models including a Quantum Convolutional Neural Network (QCNN) trained in a federated environment with parameters encrypted using the CKKS scheme. This work marks the first QCNN trained in a federated setting with CKKS-encrypted parameters. Models of varying architectures were trained to predict brain tumors from MRI scans. The experiments reveal that memory and communication overhead remain substantial, making FHE challenging to deploy. Minimizing overhead requires reducing the number of model parameters, which, however, leads to a decline in classification performance, introducing a trade-off between privacy and model complexity.

URL: https://openreview.net/forum?id=guZEpFIKcN
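
A rough sketch of measuring the ciphertext blow-up when a parameter vector is CKKS-encrypted, using the TenSEAL library as one possible stack (the abstract specifies CKKS but not a particular implementation; the parameter count and encryption settings below are toy values):

import numpy as np
import tenseal as ts   # assumes the TenSEAL library is installed

params = np.random.default_rng(0).standard_normal(512).tolist()  # toy parameter update

context = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40

plain_bytes = 8 * len(params)              # 512 values stored as float64
enc = ts.ckks_vector(context, params)      # encrypt the whole update at once
cipher_bytes = len(enc.serialize())

print(f"plaintext ~{plain_bytes} B, ciphertext ~{cipher_bytes} B, "
      f"overhead ~{cipher_bytes / plain_bytes:.0f}x")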

---
