Accepted papers
===============
Title: Algorithmic Recourse in Abnormal Multivariate Time Series
Authors: Xiao Han, Lu Zhang, Yongkai Wu, Shuhan Yuan
Abstract: Algorithmic recourse provides actionable recommendations to alter unfavorable predictions of machine learning models, enhancing transparency through counterfactual explanations. While significant progress has been made in algorithmic recourse for static data, such as tabular and image data, limited research explores recourse for multivariate time series, particularly for reversing abnormal time series. This paper introduces Recourse in time series Anomaly Detection (RecAD), a framework for addressing anomalies in multivariate time series using backtracking counterfactual reasoning. By modeling the causes of anomalies as external interventions on exogenous variables, RecAD predicts recourse actions to restore normal status as counterfactual explanations, where the recourse function, responsible for generating actions based on observed data, is trained using an end-to-end approach. Experiments on synthetic and real-world datasets demonstrate its effectiveness.
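For intuition, a minimal PyTorch-style sketch of end-to-end recourse training in the spirit described above (all names are hypothetical, the detector is a stand-in for a pretrained differentiable anomaly scorer, and the actual RecAD objective over exogenous variables is richer than this):

    import torch
    import torch.nn as nn

    T, D = 50, 8                                   # window length, number of variables

    class RecourseNet(nn.Module):
        """Maps an observed window to additive actions on each variable."""
        def __init__(self, d, hidden=64):
            super().__init__()
            self.rnn = nn.GRU(d, hidden, batch_first=True)
            self.head = nn.Linear(hidden, d)
        def forward(self, x):                      # x: (B, T, D)
            h, _ = self.rnn(x)
            return self.head(h)                    # per-step actions, (B, T, D)

    recourse = RecourseNet(D)
    detector = nn.Sequential(nn.Flatten(), nn.Linear(T * D, 1))  # placeholder; frozen and pretrained in practice
    opt = torch.optim.Adam(recourse.parameters(), lr=1e-3)

    x_abnormal = torch.randn(32, T, D)             # stand-in for abnormal windows
    for _ in range(100):
        a = recourse(x_abnormal)                   # proposed recourse actions
        x_fixed = x_abnormal + a                   # intervene on the observed series
        loss = detector(x_fixed).relu().mean() + 0.1 * a.abs().mean()  # restore normality with small actions
        opt.zero_grad(); loss.backward(); opt.step()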
URL: https://openreview.net/forum?id=kzxFc2Suo5
---
Title: Constant Rate Scheduling: A General Framework for Optimizing Diffusion Noise Schedule via Distributional Change
Authors: Shuntaro Okada, Kenji Doi, Ryota Yoshihashi, Hirokatsu Kataoka, Tomohiro Tanaka
Abstract: We propose a general framework for optimizing noise schedules in diffusion models, applicable to both training and sampling. Our method enforces a constant rate of change in the probability distribution of diffused data throughout the diffusion process, where the rate of change is quantified using a user-defined discrepancy measure. We introduce three such measures, which can be flexibly selected or combined depending on the domain and model architecture. While our framework is inspired by theoretical insights, we do not aim to provide a complete theoretical justification of how distributional change affects sample quality. Instead, we focus on establishing a general-purpose scheduling framework and validating its empirical effectiveness. Through extensive experiments, we demonstrate that our approach consistently improves the performance of both pixel-space and latent-space diffusion models across various datasets, samplers, and numbers of function evaluations ranging from 5 to 250. In particular, when applied to both training and sampling schedules, our method achieves a state-of-the-art FID score of 2.03 on LSUN Horse 256$\times$256 without compromising mode coverage.
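The scheduling idea above admits a compact numerical sketch: measure a discrepancy between adjacent diffused marginals on a fine time grid, then re-space the sampling timesteps so that each step accounts for an equal share of the cumulative change. The discrepancy profile below is a placeholder, not one of the paper's three measures:

    import numpy as np

    fine_t = np.linspace(0.0, 1.0, 1001)              # fine time grid
    rate = np.exp(-4.0 * fine_t)                      # stand-in discrepancy rate between adjacent marginals
    cum = np.cumsum(rate)
    cum = (cum - cum[0]) / (cum[-1] - cum[0])         # normalised cumulative change on [0, 1]

    n_steps = 20                                      # number of function evaluations
    targets = np.linspace(0.0, 1.0, n_steps + 1)      # equal shares of total change
    schedule = np.interp(targets, cum, fine_t)        # timesteps giving a constant rate of change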
URL: https://openreview.net/forum?id=Pjq6kdvMBj
---
Title: CacheFlow: Fast Human Motion Prediction by Cached Normalizing Flow
Authors: Takahiro Maeda, Jinkun Cao, Norimichi Ukita, Kris Kitani
Abstract: Many density estimation techniques for 3D human motion prediction require a significant amount of inference time, often exceeding the duration of the predicted time horizon. To address this, we introduce a novel flow-based method for human motion prediction called CacheFlow. Unlike previous conditional generative models that suffer from poor time efficiency, CacheFlow takes advantage of an unconditional flow-based generative model that transforms a Gaussian mixture into the density of future motions. The results of the flow-based generative model's computation can be precomputed and cached. Then, for conditional prediction, we seek a mapping from historical trajectories to samples in the Gaussian mixture. This mapping can be performed by a much more lightweight model, saving significant computational overhead compared to a typical conditional flow model. In this two-stage fashion, by caching the results of the slow flow computation, we build CacheFlow without loss of prediction accuracy or model expressiveness. Inference completes in approximately one millisecond, making our method 4$\times$ faster than previous VAE-based methods and 30$\times$ faster than previous diffusion-based methods on standard benchmarks such as the Human3.6M and AMASS datasets. Furthermore, our method demonstrates improved density estimation accuracy and prediction accuracy comparable to a state-of-the-art method on Human3.6M. Our code and models are available at \url{https://github.com/meaten/CacheFlow}.
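A hedged sketch of the two-stage scheme: the expensive unconditional flow is evaluated once offline and its outputs cached, while a lightweight conditioning network only selects mixture components at inference time. The linear "flow" and all names are illustrative placeholders:

    import torch
    import torch.nn as nn

    K, N, D_out, D_hist = 16, 512, 48, 96         # components, cached samples per component, dims

    means = torch.randn(K, D_out)                 # Gaussian-mixture component means
    z = means[:, None, :] + 0.1 * torch.randn(K, N, D_out)   # base samples per component
    flow = nn.Linear(D_out, D_out)                # stand-in for the trained unconditional flow
    with torch.no_grad():
        cache = flow(z)                           # stage 1: precompute once, offline

    cond_net = nn.Sequential(nn.Linear(D_hist, 64), nn.ReLU(), nn.Linear(64, K))

    def predict(history):                         # history: (B, D_hist)
        logits = cond_net(history)                # stage 2: lightweight conditional mapping
        k = logits.argmax(dim=-1)                 # pick a mixture component per query
        return cache[k]                           # reuse cached future-motion samples, (B, N, D_out)

    samples = predict(torch.randn(4, D_hist))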
URL: https://openreview.net/forum?id=icq5659pQt
---
Title: ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer
Authors: Jinyi Hu, Shengding Hu, Yuxuan Song, Yufei Huang, Mingxuan Wang, Hao Zhou, Zhiyuan Liu, Wei-Ying Ma, Maosong Sun
Abstract: Autoregressive and diffusion models have achieved remarkable progress in language modeling and visual generation, respectively. We present ACDiT, a novel Autoregressive blockwise Conditional Diffusion Transformer that combines the autoregressive and diffusion paradigms for continuous visual information. By introducing a block-wise autoregressive unit, ACDiT offers a flexible interpolation between token-wise autoregression and full-sequence diffusion, bypassing the limitations of discrete tokenization. The generation of each block is formulated as a conditional diffusion process, conditioned on prior blocks. ACDiT is easy to implement: it amounts to applying a specially designed Skip-Causal Attention Mask to a standard diffusion transformer during training. During inference, the process alternates between diffusion denoising and autoregressive decoding, making full use of the KV cache. We validate the effectiveness of ACDiT on image, video, and text generation and show that ACDiT performs best among all autoregressive baselines of similar model scale on visual generation tasks. We also demonstrate that, benefiting from autoregressive modeling, pretrained ACDiT can be transferred to visual understanding tasks despite being trained with a generative objective. Analysis of the trade-off between autoregression and diffusion demonstrates ACDiT's potential for long-horizon visual generation tasks. We hope that ACDiT offers a novel perspective on visual autoregressive generation and sheds light on new avenues for unified models.
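The Skip-Causal Attention Mask lends itself to a short sketch. Below, each noised block may attend to itself and to the clean versions of all earlier blocks, while clean blocks attend causally among themselves; the layout of clean and noised tokens is our assumption, not necessarily the paper's exact construction:

    import torch

    n_blocks, S = 4, 3                            # blocks, tokens per block
    blk = torch.arange(2 * n_blocks * S) // S     # block index of every token
    is_noised = blk >= n_blocks                   # assumed layout: [clean_0..clean_{B-1}, noised_0..noised_{B-1}]
    b = blk % n_blocks                            # block id within its half

    q_b, k_b = b[:, None], b[None, :]
    q_n, k_n = is_noised[:, None], is_noised[None, :]

    mask = (~q_n) & (~k_n) & (k_b <= q_b)         # clean -> own and earlier clean blocks
    mask |= q_n & (~k_n) & (k_b < q_b)            # noised -> strictly earlier clean blocks
    mask |= q_n & k_n & (k_b == q_b)              # noised -> its own noised block (denoising)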
URL: https://openreview.net/forum?id=OuFNXESoCO
---
New submissions
===============
Title: Networked Communication for Decentralised Agents in Mean-Field Games
Abstract: Methods like multi-agent reinforcement learning struggle to scale with growing population size. Mean-field games (MFGs) are a game-theoretic approach that can circumvent this by finding a solution for an abstract infinite population, which can then be used as an approximate solution to the $N$-agent problem. However, classical mean-field algorithms usually only work under restrictive conditions. We take steps to address this by introducing networked communication to MFGs, in particular to settings that use a single, non-episodic run of $N$ decentralised agents to simulate the infinite population, as is likely to be most reasonable in real-world deployments. We prove that our architecture's sample guarantees lie between those of earlier theoretical algorithms for the centralised- and independent-learning architectures, varying with network structure and the number of communication rounds. However, the sample guarantees of the three theoretical algorithms do not translate into practical convergence times. We thus contribute practical enhancements to all three algorithms, allowing us to present their first empirical demonstrations. We then show that in practical settings where the theoretical hyperparameters are not observed (fewer loops, at the cost of poorer estimation of the Q-function), our communication scheme still respects the earlier theoretical comparison: it considerably accelerates learning over the independent case, which hardly seems to learn at all, and often performs similarly to the centralised case, while removing the latter's restrictive assumption. We provide ablations and additional studies showing that our networked approach also has advantages over both alternatives in terms of robustness to update failures and to changes in population size.
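As a generic illustration of the communication scheme, decentralised agents can hold local Q-estimates and run a few rounds of neighbour averaging over a communication graph between learning updates; the averaging rule and random graph below are illustrative choices, not the paper's exact exchange protocol:

    import numpy as np

    N, n_states, n_actions, rounds = 8, 10, 4, 3
    rng = np.random.default_rng(0)

    Q = rng.normal(size=(N, n_states, n_actions))     # each agent's local Q-estimate
    A = rng.random((N, N)) < 0.3                      # random communication graph
    A = A | A.T
    np.fill_diagonal(A, True)                         # undirected, with self-loops

    for _ in range(rounds):                           # communication rounds
        Q_new = np.empty_like(Q)
        for i in range(N):
            nbrs = np.flatnonzero(A[i])
            Q_new[i] = Q[nbrs].mean(axis=0)           # average with neighbours
        Q = Q_new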
URL: https://openreview.net/forum?id=7ALoJiEbO2
---
Title: Decoupling Planning from Control: Stable Hierarchical RL with a Learned Metric Space
Abstract: Hierarchical Reinforcement Learning (HRL) offers a promising framework for solving complex, long-horizon tasks by decomposing them into manageable subproblems. However, conventional HRL methods suffer from a critical non-stationarity problem: the high-level planner's learning process is destabilized because the low-level policy is concurrently learning and constantly changing. This issue is particularly severe in resource-constrained systems, such as edge-cloud robotics, where the low-level controller must be a computationally simple, low-capacity model.
To address this challenge, we propose a novel HRL framework that resolves the non-stationarity issue by decoupling high-level planning from low-level control. The core of our approach is to reframe the planner's task: instead of learning the planner via RL on non-stationary transitions, it learns to navigate a stable "map" of the environment. This map is represented by a critic network trained to function as a metric space, where distances reflect optimal travel costs. Planning is then simplified to finding optimal subgoals that lie along the shortest path (geodesic) between the current state and the final goal. To further improve the accuracy of this map, we introduce a novel trajectory regularization loss that enforces geometric consistency along the agent's experienced trajectories.
Experiments demonstrate that our decoupled framework is highly robust. In scenarios with resource-constrained low-level policies, our method learns to solve complex tasks effectively where standard approaches fail. This result highlights our framework's suitability for real-world systems where low-level controllers have inherently limited computational capacity.
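A hedged sketch of the decoupled planner: a critic d(s, g) acts as a learned metric, a trajectory-consistency loss ties its values to experienced travel costs, and planning reduces to choosing a subgoal that lies near the geodesic. The loss form, per-step cost, and all names are illustrative assumptions:

    import torch
    import torch.nn as nn

    S_dim = 6
    critic = nn.Sequential(nn.Linear(2 * S_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def d(s, g):                                  # predicted travel cost from s to g
        return critic(torch.cat([s, g], dim=-1)).squeeze(-1).abs()

    traj = torch.randn(32, S_dim)                 # one experienced trajectory
    i, j = 5, 20                                  # two indices along it, i < j
    steps = (traj[i + 1:j + 1] - traj[i:j]).norm(dim=-1).sum()             # placeholder per-step costs
    consistency = (d(traj[i:i + 1], traj[j:j + 1]) - steps).pow(2).mean()  # trajectory regularisation

    # Planning: pick the candidate subgoal minimising the cost of passing through it.
    s0, goal = traj[:1], torch.randn(1, S_dim)
    cands = torch.randn(64, S_dim)                # candidate subgoals
    cost = d(s0.expand(64, -1), cands) + d(cands, goal.expand(64, -1))
    subgoal = cands[cost.argmin()]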
URL: https://openreview.net/forum?id=Kmtlv8X0BN
---
Title: Reward Modeling for Reinforcement Learning-Based LLM Reasoning: Design, Challenges, and Evaluation
Abstract: Large Language Models (LLMs) demonstrate transformative potential, yet their reasoning remains inconsistent and unreliable. Reinforcement learning (RL)–based fine-tuning is a key mechanism for improvement, but its effectiveness is fundamentally governed by reward design. Despite its importance, the relationship between reward modeling and core LLM challenges—such as evaluation bias, hallucination, distribution shift, and efficient learning—remains poorly understood. This survey argues that reward modeling is not merely an implementation detail but a central architect of reasoning alignment, shaping what models learn, how they generalize, and whether their outputs can be trusted. We introduce Reasoning-Aligned Reinforcement Learning (RARL), a unifying framework that systematizes diverse reward paradigms for multi-step reasoning. Within this framework, we present a taxonomy of reward mechanisms, analyze reward hacking as a pervasive failure mode, and examine how reward signals unify challenges ranging from inference-time scaling to hallucination mitigation. We further critically evaluate existing benchmarks, highlighting vulnerabilities such as data contamination and reward misalignment, and outline directions for more robust evaluation. By integrating fragmented research threads and clarifying the interplay between reward design and fundamental reasoning capabilities, this survey provides a foundational roadmap for building reasoning models that are robust, verifiable, and trustworthy.
URL: https://openreview.net/forum?id=TDfrN1TbGH
---