Daily TMLR digest for Aug 30, 2025

TMLR

Aug 30, 2025, 12:06:06 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: On the Low-Rank Parametrization of Reward Models for Controlled Language Generation

Authors: Sergey Troshin, Vlad Niculae, Antske Fokkens

Abstract: Language models trained on large amounts of data are known to produce inappropriate content in some cases and require careful tuning to be used in the real world. We revisit an effective and modular approach to controllability of language models, in which an external expert model guides the decoding. In particular, we zoom in on the parametrization choice for the external expert, highlighting the difference between low-rank and higher-rank parametrizations. Higher-rank experts are designed to support high flexibility when representing the rewards, leading to higher computational costs during decoding. However, we demonstrate that they might not use their full flexibility. By analyzing the recently proposed reward-augmented decoding approach (RAD), which uses a higher-rank expert model, we introduce a simpler and more efficient low-rank parametrization of the expert model, enabling fast and effective guided decoding. We empirically show that the low-rank RAD performs on par with the more flexible RAD on a detoxification and a sentiment control task, while requiring only a single reward model call per generated token.

URL: https://openreview.net/forum?id=cjRsEGLT8B
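
A minimal sketch of the efficiency argument (our illustration, not the authors' code): a low-rank expert can score every candidate next token with a single reward-model call by factoring the reward through one context embedding and a token factor table. All names below are hypothetical.

import torch

def guided_next_token_logits(lm_logits, context_emb, reward_proj, token_emb, beta=1.0):
    # lm_logits: (V,) base LM logits for the next token.
    # context_emb: (d,) reward-model embedding of the prefix -- ONE call per token.
    # reward_proj: (r, d) low-rank projection; token_emb: (V, r) token factors.
    rewards = token_emb @ (reward_proj @ context_emb)  # (V,) one matmul, no per-candidate calls
    return lm_logits + beta * rewards                  # reward-shifted decoding logits

# toy usage
V, d, r = 100, 16, 4
out = guided_next_token_logits(torch.randn(V), torch.randn(d),
                               torch.randn(r, d), torch.randn(V, r))
print(out.argmax().item())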

---

Title: Change Point Detection on A Separable Model for Dynamic Networks

Authors: Yik Lun Kei, Hangjian Li, Yanzhen Chen, OSCAR HERNAN MADRID PADILLA

Abstract: This paper studies the unsupervised change point detection problem in time series of networks using the Separable Temporal Exponential-family Random Graph Model (STERGM). Inherently, dynamic network patterns are complex due to dyadic and temporal dependence, and change point detection can identify the discrepancies in the underlying data generating processes to facilitate downstream analysis. In particular, the STERGM, which utilizes network statistics and nodal attributes to represent structural patterns, is a flexible and parsimonious model for fitting dynamic networks. We propose a new estimator derived from the Alternating Direction Method of Multipliers (ADMM) procedure and Group Fused Lasso (GFL) regularization to simultaneously detect multiple time points at which the parameters of a time-heterogeneous STERGM have shifted. Experiments on both simulated and real data show good performance of the proposed framework, and an R package, CPDstergm, is developed to implement the method.

URL: https://openreview.net/forum?id=DSNJykzHF3
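
For readers unfamiliar with the Group Fused Lasso side of the estimator, a toy sketch (our stand-in, not the CPDstergm package): the penalty couples parameters across time, and change points are read off where consecutive parameter vectors differ.

import numpy as np

def gfl_objective(thetas, nll_fn, lam):
    # thetas: (T, p) time-varying STERGM parameters; nll_fn(t, theta) is a
    # per-time-step negative log-likelihood (a stand-in here).
    fit = sum(nll_fn(t, thetas[t]) for t in range(len(thetas)))
    penalty = lam * np.sum(np.linalg.norm(np.diff(thetas, axis=0), axis=1))
    return fit + penalty  # group fused lasso encourages piecewise-constant paths

def change_points(thetas, tol=1e-6):
    jumps = np.linalg.norm(np.diff(thetas, axis=0), axis=1)
    return np.flatnonzero(jumps > tol) + 1  # times where parameters shift

# toy usage: piecewise-constant parameters with one shift at t=3
thetas = np.array([[0.0, 0.0]] * 3 + [[1.0, -1.0]] * 3)
print(change_points(thetas))  # [3]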

---

Title: Goal-Conditioned Data Augmentation for Offline Reinforcement Learning

Authors: Xingshuai Huang, Di Wu, Benoit Boulet

Abstract: Offline reinforcement learning (RL) enables policy learning from pre-collected offline datasets, relaxing the need to interact directly with the environment. However, constrained by the quality of offline datasets, it generally fails to learn well-performing policies from suboptimal data. To address datasets with insufficient optimal demonstrations, we introduce Goal-cOnditioned Data Augmentation (GODA), a novel goal-conditioned diffusion-based method for augmenting samples with higher quality. Leveraging recent advancements in generative modelling, GODA incorporates a novel return-oriented goal condition with various selection mechanisms. Specifically, we introduce a controllable scaling technique to provide enhanced return-based guidance during data sampling. GODA learns a comprehensive distribution representation of the original offline datasets while generating new data with selectively higher-return goals, thereby maximizing the utility of limited optimal demonstrations. Furthermore, we propose a novel adaptive gated conditioning method for processing noisy inputs and conditions, enhancing the capture of goal-oriented guidance. We conduct experiments on the D4RL benchmark and real-world challenges, specifically traffic signal control (TSC) tasks, to demonstrate GODA's effectiveness in enhancing data quality and its superior performance compared to state-of-the-art data augmentation methods across various offline RL algorithms.

URL: https://openreview.net/forum?id=8K16dplpE0
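
A hedged sketch of the return-oriented conditioning (names and the quantile-based selection below are our assumptions; the paper's conditioning and gating are more elaborate):

import numpy as np

def sample_augmented(dataset_returns, sample_fn, scale=1.2, quantile=0.9, n=64):
    # sample_fn(goal, n): a trained goal-conditioned diffusion sampler (assumed).
    # scale: controllable scaling that pushes guidance beyond observed returns.
    goal = scale * np.quantile(dataset_returns, quantile)  # higher-return goal
    return sample_fn(goal, n)  # augmented, selectively higher-return samples

# toy usage with a dummy sampler that just echoes the goal
fake_sampler = lambda goal, n: np.full(n, goal)
print(sample_augmented(np.random.rand(1000), fake_sampler)[:3])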

---

Title: Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality

Authors: Shaocong Ma, Ziyi Chen, Yi Zhou, Heng Huang

Abstract: The goal of robust constrained reinforcement learning (RL) is to optimize an agent's performance under the worst-case model uncertainty while satisfying safety or resource constraints. In this paper, we demonstrate that strong duality does not generally hold in robust constrained RL, indicating that traditional primal-dual methods may fail to find optimal feasible policies. To overcome this limitation, we propose a novel primal-only algorithm called Rectified Robust Policy Optimization (RRPO), which operates directly on the primal problem without relying on dual formulations. We provide theoretical convergence guarantees under mild regularity assumptions, showing convergence to an approximately optimal feasible policy with iteration complexity matching the best-known lower bound when the diameter of the uncertainty set is controlled at a specific level. Empirical results in a grid-world environment validate the effectiveness of our approach, demonstrating that RRPO achieves robust and safe performance under model uncertainties, whereas the non-robust method can violate worst-case safety constraints.

URL: https://openreview.net/forum?id=7l63xwAgAW
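
The "rectified, primal-only" idea admits a one-screen illustration (a sketch under our assumptions, not the authors' algorithm): penalize the rectified worst-case constraint violation directly instead of introducing dual variables.

import numpy as np

def rectified_objective(theta, worst_reward, worst_cost, budget, eta):
    # worst_reward/worst_cost: estimated worst-case return and cost over the
    # model-uncertainty set (assumed available from an inner robust evaluation).
    violation = max(0.0, worst_cost(theta) - budget)  # rectifier: 0 when feasible
    return worst_reward(theta) - eta * violation      # maximized directly, primal-only

# toy usage: scalar "policy"; reward favors theta near 2, cost caps theta at 1
r = lambda th: -((th - 2.0) ** 2)
c = lambda th: th
grid = np.linspace(-1, 3, 401)
best = grid[np.argmax([rectified_objective(t, r, c, budget=1.0, eta=10.0) for t in grid])]
print(round(float(best), 2))  # ~1.0: pushed toward the reward but held feasible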

---

Title: G2D2: Gradient-Guided Discrete Diffusion for Inverse Problem Solving

Authors: Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Bac Nguyen, Stefano Ermon, Yuki Mitsufuji

Abstract: Recent literature has effectively leveraged diffusion models trained on continuous variables as priors for solving inverse problems. Notably, discrete diffusion models with discrete latent codes have shown strong performance, particularly in modalities suited for discrete compressed representations, such as image and motion generation. However, their discrete and non-differentiable nature has limited their application to inverse problems formulated in continuous spaces. This paper presents a novel method for addressing linear inverse problems by leveraging generative models based on discrete diffusion as priors. We overcome these limitations by approximating the true posterior distribution with a variational distribution constructed from categorical distributions and continuous relaxation techniques. Furthermore, we employ a star-shaped noise process to mitigate the drawbacks of traditional discrete diffusion models with absorbing states, demonstrating that our method performs comparably to continuous diffusion techniques with lower GPU memory consumption.

URL: https://openreview.net/forum?id=fj23qnVifX
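
A rough sketch of how gradients can flow through discrete codes (we use a plain Gumbel-softmax stand-in; G2D2's variational construction and star-shaped noise process go further):

import torch
import torch.nn.functional as F

def guidance_grad(logits, codebook, decoder, y, A, tau=0.5):
    # Gradient of the data term ||A x - y||^2 w.r.t. categorical parameters,
    # flowing through a continuous relaxation of the discrete latent codes.
    logits = logits.clone().requires_grad_(True)
    soft = F.gumbel_softmax(logits, tau=tau, hard=False)  # soft one-hot codes
    x = decoder(soft @ codebook)                          # decode relaxed latents
    ((A @ x - y) ** 2).sum().backward()
    return logits.grad                                    # steers the discrete prior

# toy usage: identity "decoder", random linear measurement operator
K, d, n, m = 8, 4, 5, 3
g = guidance_grad(torch.randn(n, K), torch.randn(K, d), lambda z: z.flatten(),
                  torch.randn(m), torch.randn(m, n * d))
print(g.shape)  # torch.Size([5, 8])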

---

Title: Mamba State-Space Models Are Lyapunov-Stable Learners

Authors: John Timothy Halloran, Manbir S Gulati, Paul F Roysdon

Abstract: Mamba state-space models (SSMs) have recently outperformed state-of-the-art (SOTA) Transformer large language models (LLMs) in various tasks and been widely adopted. However, a major concern for stable learning in recurrent-based deep models (such as SSMs) is the sensitivity of their recurrent dynamics. Despite widespread adoption, the sensitivity of Mamba's recurrent dynamics under common fine-tuning methods, e.g., mixed-precision fine-tuning (MPFT) and parameter-efficient fine-tuning (PEFT), remains unexplored. Empirically, we show that Mamba LLMs are extremely stable to changes introduced by combinations of MPFT and PEFT, in stark contrast to Transformer LLMs, which we demonstrate may drastically diverge from their respective full-precision counterparts under different combinations of MPFT and PEFT (despite the near-ubiquitous adoption of these fine-tuning frameworks for attention-based models). The demonstrated robustness of Mamba LLMs is due to their recurrent dynamics, which we prove are guaranteed to be stable using dynamical systems theory (in particular, Lyapunov stability). We conclude by using MPFT and PEFT to study Mamba LLMs' in-context learning (ICL) abilities on natural language tasks in a novel way, thus supplementing other recent work.

URL: https://openreview.net/forum?id=wzsYQYs3dO
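
The stability claim rests on dynamical systems theory; a generic textbook check (not the paper's proof) for a linear recurrence:

import numpy as np

def is_stable(A):
    # For h_{t+1} = A h_t: all eigenvalues strictly inside the unit circle
    # implies asymptotic (hence Lyapunov) stability.
    return bool(np.max(np.abs(np.linalg.eigvals(A))) < 1.0)

# diagonal recurrence with entries in (0, 1), as in discretized SSM kernels
A_diag = np.diag(np.exp(-np.random.rand(8)))
print(is_stable(A_diag))           # True
print(is_stable(1.5 * np.eye(2)))  # False: eigenvalue 1.5 escapes the unit circle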

---


New submissions
===============


Title: Beyond Expectations: Learning with Stochastic Dominance Made Practical

Abstract: Stochastic dominance serves as a general framework for modeling a broad spectrum of decision preferences under uncertainty, with risk aversion as one notable example, as it naturally captures the intrinsic structure of the underlying uncertainty rather than simply resorting to expectations. Despite being theoretically appealing, stochastic dominance has seen scarce application in machine learning due to the following challenges: (i) the original concept of stochastic dominance only provides a partial order and is therefore not amenable to serving as a general optimality criterion; and (ii) an efficient computational recipe remains lacking due to the continuum nature of evaluating stochastic dominance.

In this work, we make the first attempt towards establishing a general framework of learning with stochastic dominance. We first generalize the stochastic dominance concept to enable feasible comparisons between any arbitrary pair of random variables. We next develop a simple and computationally efficient approach for finding the optimal solution in terms of stochastic dominance, which can be seamlessly plugged into many learning tasks. Numerical experiments demonstrate that the proposed method achieves performance comparable to standard risk-neutral strategies while obtaining better trade-offs against risk across a variety of applications, including supervised learning, reinforcement learning, and portfolio optimization.

URL: https://openreview.net/forum?id=ebyPKXsweD
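
For context, the classical first-order notion the paper generalizes, checked empirically (standard definition, our code): X dominates Y when the CDF of X lies at or below that of Y everywhere, which is why it yields only a partial order.

import numpy as np

def fosd(x, y):
    # Empirical first-order stochastic dominance: F_x(t) <= F_y(t) for all t.
    grid = np.union1d(x, y)
    Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return bool(np.all(Fx <= Fy))

rng = np.random.default_rng(0)
a, b = rng.normal(1.0, 1.0, 5000), rng.normal(0.0, 1.0, 5000)
print(fosd(a, b), fosd(b, a))  # typically True False: a is b shifted upward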

---

Title: Visual-Word Tokenizer: Beyond Fixed Sets of Tokens in Vision Transformers

Abstract: The cost of deploying vision transformers increasingly represents a barrier to wider industrial adoption. Existing compression techniques require additional end-to-end fine-tuning or incur a significant energy-efficiency penalty, making them ill-suited for online (real-time) inference, where a prediction is made on any new input as it comes in. We introduce the Visual Word Tokenizer (VWT), a training-free method for reducing energy costs while retaining performance. The VWT groups visual subwords (image patches) that are frequently used into visual words, while infrequent ones remain intact. To do so, intra-image or inter-image statistics are leveraged to identify similar visual concepts for sequence compression. Experimentally, we demonstrate a reduction of up to 47% in energy consumption. By comparison, 8-bit quantization and token merging can significantly increase energy costs (in some cases by 500% or more). Our results indicate that VWTs are well-suited for efficient online inference with a marginal compromise on performance.

URL: https://openreview.net/forum?id=YYOS1FHYG3
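
A hedged sketch of the grouping step as we read it (hypothetical names; the paper's intra-/inter-image statistics are richer): match patch embeddings to a codebook of frequent visual words, merge the matches, and pass the rest through intact.

import torch
import torch.nn.functional as F

def tokenize_visual_words(patches, codebook, threshold=0.8):
    # patches: (N, d) patch embeddings; codebook: (W, d) visual-word centers.
    sim, word = (F.normalize(patches, dim=-1) @ F.normalize(codebook, dim=-1).T).max(-1)
    keep = patches[sim < threshold]  # infrequent subwords stay intact
    merged = [patches[(word == w) & (sim >= threshold)].mean(0)
              for w in word[sim >= threshold].unique()]
    return torch.cat([keep] + [m[None] for m in merged]) if merged else keep

out = tokenize_visual_words(torch.randn(196, 64), torch.randn(32, 64))
print(out.shape[0], "<= 196 tokens")  # shorter sequence, no training involved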

---

Title: Algorithms for the preordering problem and their application to the task of jointly clustering and ordering the accounts of a social network

Abstract: The NP-hard maximum value preordering problem is both a joint relaxation and a hybrid of the clique partition problem (a clustering problem) and the partial ordering problem. Toward approximate solutions and lower bounds, we introduce a linear-time 4-approximation algorithm that constructs a maximum dicut of a subgraph, and we define local search heuristics. Toward upper bounds, we tighten a linear program relaxation by the class of odd closed walk inequalities, which, as we show, define facets of the preorder polytope. We contribute implementations of the algorithms, apply them to the task of jointly clustering and partially ordering the accounts of published social networks, and compare their outputs and efficiency qualitatively and quantitatively.

URL: https://openreview.net/forum?id=cBsUnv7Cb3
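
To make the objective concrete (our illustration only, not the paper's algorithms): a preorder can be encoded as a clustering plus a partial order on clusters, and its value is the total gain over all comparable ordered pairs.

import numpy as np

def preorder_value(V, labels, reach):
    # V[i, j]: value gained if i precedes-or-equals j; labels: cluster of each
    # node; reach[a][b]: cluster a precedes-or-equals cluster b (reflexive and
    # transitive by construction).
    n = len(labels)
    return sum(V[i, j] for i in range(n) for j in range(n)
               if i != j and reach[labels[i]][labels[j]])

# toy: cluster {0,1} precedes cluster {2}; within-cluster pairs count both ways
V = np.ones((3, 3))
reach = np.array([[True, True], [False, True]])
print(preorder_value(V, [0, 0, 1], reach))  # 4.0: pairs (0,1),(1,0),(0,2),(1,2)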

---

Title: InfGraND: An Influence-Guided GNN-to-MLP Knowledge Distillation

Abstract: Graph Neural Networks (GNNs) are the go-to model for graph data analysis. However, GNNs rely on two key operations, aggregation and update, which can pose challenges for low-latency inference tasks or resource-constrained scenarios. Simple Multi-Layer Perceptrons (MLPs) offer a computationally efficient alternative, yet training an MLP in a supervised setting often leads to suboptimal performance. Knowledge Distillation (KD) from a GNN teacher to an MLP student has emerged as a way to bridge this gap. However, most KD methods either transfer knowledge uniformly across all nodes or rely on graph-agnostic indicators such as prediction uncertainty. We argue this overlooks a more fundamental, graph-centric inquiry: "How important is a node to the structure of the graph?" We introduce InfGraND, an Influence-guided Graph KNowledge Distillation framework from GNN to MLP, which addresses this by identifying and prioritizing structurally influential nodes to guide the distillation process, ensuring that the MLP learns from the most critical parts of the graph. Additionally, InfGraND embeds structural awareness in MLPs through one-time multi-hop neighborhood feature pre-computation, which enriches the student MLP's input and thus avoids inference-time overhead. Our rigorous evaluation in transductive and inductive settings across seven benchmark datasets shows that InfGraND consistently outperforms prior GNN-to-MLP KD methods, demonstrating its practicality for numerous latency-critical applications in real-world settings.

URL: https://openreview.net/forum?id=lfzHR3YwlD
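
A sketch of the two ingredients under our assumptions (the influence weights here are a generic stand-in; the paper defines its own score): influence-weighted distillation plus one-time multi-hop feature pre-computation, so the student MLP needs no message passing at inference.

import torch
import torch.nn.functional as F

def multihop_features(X, A_norm, hops=2):
    # One-time pre-computation: concatenate [X, AX, A^2 X, ...].
    feats, cur = [X], X
    for _ in range(hops):
        cur = A_norm @ cur
        feats.append(cur)
    return torch.cat(feats, dim=-1)

def distill_loss(student_logits, teacher_logits, influence, T=2.0):
    # Per-node KD loss, weighted by structural influence.
    kl = F.kl_div(F.log_softmax(student_logits / T, -1),
                  F.softmax(teacher_logits / T, -1), reduction="none").sum(-1)
    return (influence * kl).mean() * T * T

n, d, c = 6, 8, 3
A = torch.rand(n, n); A = A / A.sum(-1, keepdim=True)  # stand-in normalized adjacency
print(multihop_features(torch.randn(n, d), A).shape)   # (6, 24): enriched MLP input
print(distill_loss(torch.randn(n, c), torch.randn(n, c), torch.rand(n)).item() >= 0)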

---

Title: Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment

Abstract: Building reliable speech systems often requires combining multiple modalities, such as audio and visual cues. While such multimodal solutions frequently lead to improvements in performance and may even be critical in certain cases, they come with several constraints, including increased sensory requirements, computational cost, and modality synchronization. These challenges constrain the direct use of multimodal solutions in real-world applications. In this work, we develop approaches where learning happens with all available modalities but deployment or inference is done with just one or reduced modalities. To do so, we propose a Multimodal Training and Unimodal Deployment (MUTUD) framework, which includes a Temporally Aligned Modality feature Estimation (TAME) module that can estimate information for a missing modality using the modalities present during inference. This approach facilitates the integration of information across different modalities, enhancing the overall inference process by leveraging the strengths of each modality to compensate for the absence of certain modalities during inference. We apply MUTUD to various audiovisual speech tasks and show that it can reduce the performance gap between multimodal and corresponding unimodal models to a considerable extent. MUTUD achieves this while reducing the model size and compute compared to multimodal models, in some cases by almost 80%.

URL: https://openreview.net/forum?id=5bshBY8RDf
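
The deployment idea in miniature (module shapes and names are our assumptions): a small estimator learns to predict the absent modality's features from the present one, so inference needs only the audio branch.

import torch
import torch.nn as nn

class TinyTAME(nn.Module):
    # Predicts temporally aligned visual features from audio features.
    def __init__(self, d_audio=64, d_visual=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_audio, 128), nn.ReLU(),
                                 nn.Linear(128, d_visual))

    def forward(self, audio_feats):  # (T, d_audio) -> (T, d_visual)
        return self.net(audio_feats)

tame = TinyTAME()
audio = torch.randn(50, 64)                  # audio-only at deployment
visual_hat = tame(audio)                     # stands in for the camera stream
fused = torch.cat([audio, visual_hat], -1)   # consumed by the audiovisual head
print(fused.shape)  # torch.Size([50, 128])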

---

Title: DREAMS: Preserving both Local and Global Structure in Dimensionality Reduction

Abstract: Dimensionality reduction techniques are widely used for visualizing high-dimensional data in two dimensions. Existing methods are typically designed to preserve either local (e.g. $t$-SNE, UMAP) or global (e.g. MDS, PCA) structure of the data, but none of the established methods can represent both aspects well. In this paper, we present DREAMS (Dimensionality Reduction Enhanced Across Multiple Scales), a method that combines the local structure preservation of $t$-SNE with the global structure preservation of PCA via a simple regularization term. Our approach generates a spectrum of embeddings between the locally well-structured $t$-SNE embedding and the globally well-structured PCA embedding, efficiently balancing both local and global structure preservation. We benchmark DREAMS across seven real-world datasets, including five from single-cell transcriptomics and one from population genetics, showcasing qualitatively and quantitatively its superior ability to preserve structure across multiple scales compared to previous approaches.

URL: https://openreview.net/forum?id=xpGu3Sichc
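
The regularization idea fits in a few lines (the weight lam and the plain quadratic anchor are our simplifications): one term keeps the t-SNE objective, the other pulls the embedding toward a fixed PCA layout.

import torch

def dreams_loss(Y, tsne_kl, Y_pca, lam=0.1):
    # Y: (n, 2) current embedding; tsne_kl(Y): the usual t-SNE KL divergence;
    # Y_pca: (n, 2) fixed PCA embedding. lam=0 recovers pure t-SNE.
    return tsne_kl(Y) + lam * ((Y - Y_pca) ** 2).mean()

# toy gradient step with a dummy stand-in for the t-SNE term
Y = torch.randn(100, 2, requires_grad=True)
dreams_loss(Y, lambda Y: (Y ** 2).mean(), torch.randn(100, 2)).backward()
print(Y.grad.shape)  # torch.Size([100, 2])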

---

Title: Improving Detection of Watermarked Language Models

Abstract: Watermarking has recently emerged as an effective strategy for detecting the generations of large language models (LLMs). The strength of a watermark typically depends strongly on the entropy afforded by the language model and the set of input prompts. However, entropy can be quite limited in practice, especially for models that are post-trained, for example via instruction tuning or reinforcement learning from human feedback (RLHF), which makes detection based on watermarking alone challenging. In this work, we investigate whether detection can be improved by combining watermark detectors with \emph{non-watermark} ones. We explore a number of \emph{hybrid} schemes that combine the two, observing performance gains over either class of detector under a wide range of experimental conditions.

URL: https://openreview.net/forum?id=6nmztBgngB
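
One plausible hybrid, chosen by us for illustration (the paper explores several schemes): combine the watermark detector's p-value with a non-watermark detector's p-value via Fisher's method.

import math
from scipy import stats

def fisher_hybrid(p_watermark, p_other):
    # Fisher's method: a small combined p-value flags text as model-generated.
    stat = -2.0 * (math.log(p_watermark) + math.log(p_other))
    return stats.chi2.sf(stat, df=4)  # 2k degrees of freedom for k = 2 tests

# two individually borderline detectors can be jointly significant
print(round(fisher_hybrid(0.06, 0.07), 3))  # ~0.027 < 0.05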

---

Title: Prompt Engineering Techniques for Language Model Reasoning Lack Replicability

Abstract: As large language models (LLMs) are integrated into everyday applications, research into prompt engineering techniques (PET) to improve these models' behavior has surged. However, clear methodological guidelines for evaluating these techniques are lacking. This raises concerns about the replicability and generalizability of the prompt engineering techniques' benefits. We support our concerns with a series of replication experiments focused on zero-shot prompt engineering techniques purported to influence reasoning abilities in LLMs. We tested GPT-3.5, GPT-4o, Gemini 1.5 Pro, Claude 3 Opus, Llama 3, Vicuna, and BLOOM on the chain-of-thought, EmotionPrompting, Sandbagging, Re-Reading, Rephrase-and-Respond (RaR), and ExpertPrompting prompt engineering techniques. We applied them to manually double-checked subsets of reasoning benchmarks including CommonsenseQA, CRT, NumGLUE, ScienceQA, and StrategyQA. Our findings reveal a general lack of statistically significant differences across nearly all techniques tested, highlighting, among other issues, several methodological weaknesses in previous research. To counter these issues, we propose recommendations for establishing sound benchmarks and designing rigorous experimental frameworks to ensure accurate and reliable assessments of model outputs.

URL: https://openreview.net/forum?id=bgjR5bM44u
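
In the spirit of the paper's recommendations (the test choice is ours): compare a technique against a plain baseline on the same items with a paired test, rather than reporting raw accuracy deltas.

import numpy as np
from scipy import stats

def paired_prompt_test(correct_base, correct_pet):
    # correct_*: boolean arrays of per-item correctness for each prompt style.
    b = int(np.sum(correct_base & ~correct_pet))  # baseline-only wins
    c = int(np.sum(~correct_base & correct_pet))  # technique-only wins
    p = stats.binomtest(min(b, c), b + c, 0.5).pvalue if b + c else 1.0
    return b, c, p  # exact McNemar test on the discordant pairs

rng = np.random.default_rng(1)
base = rng.random(200) < 0.60
pet = rng.random(200) < 0.63  # small apparent gain, likely insignificant
print(paired_prompt_test(base, pet))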

---

Title: AC$\oplus$DC search: behind the winning solution to the Flywire graph-matching challenge

Abstract: This paper describes the alternating continuous and discrete combinatorial (AC$\oplus$DC) optimizations behind the winning solution to the Flywire Ventral Nerve Cord Matching Challenge. The challenge was organized by the Princeton Neuroscience Institute and held over three months, ending on January 31, 2025. During this period, the challenge attracted teams of researchers with expertise in machine learning, high-performance computing, graph data mining, biological network analysis, and quadratic assignment problems. The goal of the challenge was to align the connectomes of a male and a female fruit fly, and more specifically, to determine a one-to-one correspondence between the neurons in their ventral nerve cords. The connectomes were represented as large weighted graphs, and the challenge was posed as a problem in graph matching: how does one find a permutation that maps the nodes of one graph onto the nodes of another? The winning solution alternated between two complementary approaches to graph matching: the first, a combinatorial optimization over the symmetric group of permutations, and the second, a continuous relaxation of this problem to the space of doubly stochastic matrices. For the latter, the doubly stochastic matrices were optimized by combining Frank-Wolfe methods with a fast preconditioner to solve the linear assignment problem at each iteration. We provide a complete implementation of these methods in a few hundred lines of MATLAB code. Notably, this implementation achieves a winning score on the challenge in less than 15 minutes on a laptop computer.

URL: https://openreview.net/forum?id=8MjCOMyaDf
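
A compact Frank-Wolfe sketch of the continuous half (our simplification; the winning solution adds the combinatorial alternation and a fast preconditioner): relax permutations to doubly stochastic matrices and solve a linear assignment problem for each descent direction.

import numpy as np
from scipy.optimize import linear_sum_assignment

def frank_wolfe_match(A, B, iters=50):
    n = A.shape[0]
    P = np.full((n, n), 1.0 / n)               # barycenter of the Birkhoff polytope
    for k in range(iters):
        G = A.T @ P @ B + A @ P @ B.T          # gradient of tr(A P B^T P^T)
        r, c = linear_sum_assignment(-G)       # best vertex = a permutation (LAP)
        S = np.zeros_like(P); S[r, c] = 1.0
        P += (2.0 / (k + 2.0)) * (S - P)       # classic Frank-Wolfe step size
    r, c = linear_sum_assignment(-P)           # round back to a permutation
    perm = np.empty(n, dtype=int); perm[r] = c
    return perm

# toy check: B is A relabeled by a hidden permutation; FW should recover it
rng = np.random.default_rng(0)
A, true = rng.random((12, 12)), rng.permutation(12)
B = np.empty_like(A); B[np.ix_(true, true)] = A
print(np.array_equal(frank_wolfe_match(A, B), true))  # usually True on easy instances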

---

Title: The impact of imbalanced datasets on Deep Neural Network predictions: A case study in scramjet performance

Abstract: Robust aerodynamic predictions for hypersonic vehicles increasingly rely on existing deep-learning tools. However, imbalanced datasets, often resulting from limited experimental data or insufficient coverage of operational conditions, can compromise model reliability and introduce bias into predictions. This work offers an application-centered account of how a feed-forward multilayer perceptron (PyTorch implementation) behaves when trained on (i) a data-rich yet operationally imbalanced set of scramjet simulations and (ii) a deliberately balanced counterpart generated with a conventional metaheuristic (MH) sampling scheme, but with a lower sample count. Without altering network architecture, loss function, or optimizer, we expose a clear trade-off: the imbalanced model achieves a 14% lower root mean square error (RMSE) but produces thrust predictions that violate first-principles trends, whereas the balanced model sacrifices a small amount of numerical accuracy to maintain physical coherence across Mach-altitude space. These results illuminate both the strength (high statistical accuracy) and the weakness (loss of physical fidelity under bias) of off-the-shelf deep neural networks (DNNs) when data coverage is uneven. The findings serve as a cautionary example for practitioners who might otherwise deploy such models uncritically, and underscore the methodological importance of rigorous dataset diagnostics, rather than chasing novel algorithms, for reliable AI adoption in aerospace design.

URL: https://openreview.net/forum?id=0kHvatNcGB
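
A trivial example of the kind of diagnostic the abstract argues for (bin edges below are made up): tabulate sample counts over the Mach-altitude envelope before training so imbalance is visible up front.

import numpy as np

def coverage_table(mach, altitude_km, mach_bins, alt_bins):
    # Rows: Mach bins; columns: altitude bins. Sparse cells flag poor coverage.
    counts, _, _ = np.histogram2d(mach, altitude_km, bins=[mach_bins, alt_bins])
    return counts

rng = np.random.default_rng(0)
mach = rng.uniform(4, 8, 500)
alt = rng.normal(25, 3, 500)  # altitudes clustered: an imbalanced envelope
print(coverage_table(mach, alt, np.arange(4, 9), np.arange(10, 41, 10)))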

---
