Daily TMLR digest for Dec 19, 2025

TMLR

Dec 19, 2025, 12:30:07 AM
to tmlr-anno...@googlegroups.com


New certifications
==================

J2C Certification: TimeAutoDiff: A Unified Framework for Generation, Imputation, Forecasting, and Time-Varying Metadata Conditioning of Heterogeneous Time Series Tabular Data

Namjoon Suh, Yuning Yang, Din-Yin Hsieh, Qitong Luan, Shirong Xu, Shixiang Zhu, Guang Cheng

https://openreview.net/forum?id=bkUd1Dg46c

---


Accepted papers
===============


Title: STLDM: Spatio-Temporal Latent Diffusion Model for Precipitation Nowcasting

Authors: Shi Quan Foo, Chi-Ho Wong, Zhihan Gao, Dit-Yan Yeung, Ka-Hing Wong, Wai-Kin Wong

Abstract: Precipitation nowcasting is a critical spatio-temporal prediction task that helps society prevent severe damage from extreme weather events. Despite advances in this field, the complex and stochastic nature of the task still poses challenges to existing approaches. Specifically, deterministic models tend to produce blurry predictions, while generative models often struggle with poor accuracy. In this paper, we present a simple yet effective model architecture termed STLDM, a diffusion-based model that learns the latent representation end to end alongside both the Variational Autoencoder and the conditioning network. STLDM decomposes the task into two stages: a deterministic forecasting stage handled by the conditioning network, and an enhancement stage performed by the latent diffusion model. Experimental results on multiple radar datasets demonstrate that STLDM achieves superior performance compared to the state of the art, while also improving inference efficiency. The code is available at https://github.com/sqfoo/stldm_official.
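
As a reading aid, here is a minimal sketch of the two-stage scheme the abstract describes. All module names, signatures, and shapes below are our own illustrative assumptions, not the authors' code (see the repository linked above for the real implementation):

    import torch

    def nowcast(past_radar, cond_net, vae, diffusion):
        """past_radar: (B, T_in, H, W) past frames; returns a refined forecast."""
        coarse = cond_net(past_radar)                  # stage 1: deterministic forecast
        z = torch.randn_like(vae.encode(coarse))       # start from latent noise
        for t in reversed(range(diffusion.num_steps)): # stage 2: latent diffusion
            z = diffusion.denoise_step(z, t, cond=coarse)
        return vae.decode(z)                           # sharper probabilistic nowcast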

URL: https://openreview.net/forum?id=f4oJwXn3qg

---

Title: Demystifying amortized causal discovery with transformers

Authors: Francesco Montagna, Max Cairney-Leeming, Dhanya Sridhar, Francesco Locatello

Abstract: Supervised learning for causal discovery from observational data often achieves competitive performance despite seemingly avoiding the explicit assumptions that traditional methods require for identifiability. In this work, we analyze CSIvA (Ke et al., 2023), a transformer architecture for amortized inference that promises to train on synthetic data and transfer to real data, on bivariate causal models. First, we bridge the gap with identifiability theory, showing that the training distribution implicitly defines a prior on the causal model of the test observations: consistent with classical approaches, good performance is achieved when we have a good prior on the test data and the underlying model is identifiable. Second, we find that CSIvA cannot generalize to classes of causal models unseen during training: to overcome this limitation, we theoretically and empirically analyze \textit{when} training CSIvA on datasets generated by multiple identifiable causal models with different structural assumptions improves its generalization at test time. Overall, we find that amortized causal discovery still adheres to identifiability theory, contradicting the earlier hypothesis of Lopez-Paz et al. (2015) that supervised learning methods could overcome its restrictions.
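
The generalization experiments suggest a simple picture of the training corpus: bivariate datasets sampled from several identifiable model classes. A minimal sketch, with mechanisms and noise distributions as illustrative assumptions rather than the paper's exact generators:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_pair(n: int, model_class: str):
        """Sample one bivariate dataset together with its direction label."""
        x = rng.normal(size=n)
        if model_class == "lingam":   # linear mechanism, non-Gaussian noise
            y = 1.5 * x + rng.uniform(-1.0, 1.0, size=n)
        elif model_class == "anm":    # nonlinear mechanism, additive Gaussian noise
            y = np.sin(x) + 0.3 * rng.normal(size=n)
        else:
            raise ValueError(model_class)
        data, label = np.stack([x, y], axis=1), 0  # label 0: "x causes y"
        if rng.random() < 0.5:        # randomly present the anticausal ordering
            data, label = data[:, ::-1], 1
        return data, label

    # a training corpus mixing several identifiable classes
    corpus = [sample_pair(500, c) for c in ("lingam", "anm") for _ in range(100)]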

URL: https://openreview.net/forum?id=9Lgy7IGSfp

---

Title: TimeAutoDiff: A Unified Framework for Generation, Imputation, Forecasting, and Time-Varying Metadata Conditioning of Heterogeneous Time Series Tabular Data

Authors: Namjoon Suh, Yuning Yang, Din-Yin Hsieh, Qitong Luan, Shirong Xu, Shixiang Zhu, Guang Cheng

Abstract: We present \texttt{TimeAutoDiff}, a unified latent-diffusion framework that addresses four fundamental time-series tasks—unconditional generation, missing-data imputation, forecasting, and time-varying-metadata conditional generation—within a single model that natively handles heterogeneous features (continuous, binary, and categorical). We unify these tasks through a simple masked-modeling strategy: a binary mask specifies which time-feature cells are observed and which must be generated. To make this work on mixed data types, we pair a lightweight variational autoencoder (VAE)—which maps continuous, categorical, and binary variables into a continuous latent sequence—with a diffusion model that learns dynamics in that latent space, avoiding separate likelihoods for each data type while still capturing temporal and cross-feature structure. Two design choices give \texttt{TimeAutoDiff} clear speed and scalability advantages. First, the diffusion process samples a single latent trajectory for the full time horizon rather than denoising one timestep at a time; this whole-sequence sampling drastically reduces reverse-diffusion calls and yields an order-of-magnitude throughput gain. Second, the VAE compresses along the feature axis, so very wide tables are modeled in a lower-dimensional latent space, further reducing computational load. Empirical evaluation demonstrates that \texttt{TimeAutoDiff} matches or surpasses strong baselines in synthetic sequence fidelity (discriminative, temporal-correlation, and predictive metrics) and consistently lowers MAE/MSE on imputation and forecasting tasks. Time-varying metadata conditioning unlocks real-world scenario exploration: by editing metadata sequences, practitioners can generate coherent families of counterfactual trajectories that track intended directional changes, preserve cross-feature dependencies, and remain conditionally calibrated—making "what-if" analysis practical. Our ablation studies confirm that performance depends on key architectural choices, such as the VAE's continuous-feature encoding and specific components of the DDPM denoiser. Furthermore, a distance-to-closest-record (DCR) audit demonstrates that the model generalizes with limited memorization given a sufficiently large dataset. Code implementations of \texttt{TimeAutoDiff} are provided at https://github.com/namjoonsuh/TimeAutoDiff.
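
The masked-modeling unification is easy to picture; the sketch below shows how one binary mask over (time, feature) cells can encode all four tasks. Shapes, masking ratios, and names are our assumptions, not the released code:

    import torch

    T, D = 48, 10  # sequence length, number of features

    def mask_for(task: str) -> torch.Tensor:
        """Return a (T, D) mask: 1 = observed cell, 0 = cell to generate."""
        m = torch.ones(T, D)
        if task == "generation":              # nothing is observed
            m.zero_()
        elif task == "forecasting":           # the future half is hidden
            m[T // 2:, :] = 0.0
        elif task == "imputation":            # random cells are missing
            m = (torch.rand(T, D) > 0.3).float()
        elif task == "metadata_conditioning": # e.g. only column 0 (metadata) observed
            m[:, 1:] = 0.0
        return m

    x = torch.randn(T, D)  # an encoded heterogeneous table
    for task in ("generation", "imputation", "forecasting", "metadata_conditioning"):
        observed = x * mask_for(task)  # the diffusion model fills in the zeroed cells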

URL: https://openreview.net/forum?id=bkUd1Dg46c

---


New submissions
===============


Title: A Closer Look at In-Distribution vs. Out-of-Distribution Accuracy for Open-Set Test-time Adaptation

Abstract: Open-set test-time adaptation (TTA) updates models on new data in the presence of input shifts and unknown output classes. While recent methods have made progress on improving in-distribution (InD) accuracy for known classes, their ability to accurately detect out-of-distribution (OOD) unknown classes remains underexplored. We benchmark robust and open-set TTA methods (SAR, OSTTA, UniEnt, and SoTTA) on the standard corruption benchmarks of CIFAR-10-C at the small scale and ImageNet-C at the large scale. For CIFAR-10-C, we use OOD data from SVHN and CIFAR-100 in their respective corrupted forms, SVHN-C and CIFAR-100-C. For ImageNet-C, we use OOD data from ImageNet-O and Textures in their respective corrupted forms, ImageNet-O-C and Textures-C. ImageNet-O is nearer to ImageNet, consisting of unknown but related object classes (like ``garlic bread'' vs. ``hot dog'' for food, or ``highway'' vs. ``dam'' for infrastructure), while Textures is farther from ImageNet, consisting of non-object patterns (like ``cracked'' mud, ``porous'' sponge, or ``veined'' leaves). We evaluate the accuracy and confidence of TTA methods for InD vs. OOD recognition on CIFAR-10-C and ImageNet-C. We verify the accuracy of each method's own OOD detection technique on CIFAR-10-C, and on ImageNet-C we report both accuracy and standard OOD detection metrics. We further examine more realistic settings in which the proportions and rates of OOD data can vary. To explore the trade-off between InD recognition and OOD rejection, we propose a new baseline that replaces the softmax/multi-class output with a sigmoid/multi-label output. Our analysis shows for the first time that current open-set TTA methods struggle to balance InD and OOD accuracy and that they only imperfectly filter OOD data from their own adaptation updates.
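
The proposed sigmoid/multi-label baseline is simple to state; here is a minimal sketch, where the rejection threshold is an illustrative assumption:

    import torch

    def sigmoid_predict(logits: torch.Tensor, reject_threshold: float = 0.5):
        """logits: (N, C). Returns predicted class per input, or -1 for rejected (OOD)."""
        probs = torch.sigmoid(logits)          # independent per-class scores,
        conf, pred = probs.max(dim=-1)         # so "none of the above" is expressible
        pred[conf < reject_threshold] = -1     # no class is confident -> reject as OOD
        return pred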

URL: https://openreview.net/forum?id=4MuLx2YDmi

---

Title: Rethinking Coreset Selection: The Surprising Effectiveness of Soft Labels

Abstract: Data-efficient deep learning is an emerging and powerful branch of deep learning that focuses on minimizing the amount of labeled data required for training. Coreset selection is one such method, where the goal is to select a representative subset of the original dataset that achieves comparable generalization performance at a much lower computation and disk-space overhead. Dataset Distillation (DD), another branch of data-efficient deep learning, achieves this goal by distilling a small synthetic dataset from the original dataset. While DD works exploit soft labels (probabilistic target labels instead of traditional one-hot labels), which have yielded significant improvements over hard labels, to the best of our knowledge no such study exists for coreset selection. In this work, for the first time, we study the impact of soft labels on generalization accuracy for the image classification task across various coreset selection algorithms. While soft labels improve the performance of all methods, surprisingly, random selection with soft labels performs on par with or better than existing coreset selection approaches. Our findings suggest that future coreset algorithms should benchmark against random selection with soft labels as an important baseline.
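
A minimal sketch of the headline baseline, random selection with soft labels; the teacher network and temperature are our illustrative assumptions:

    import torch
    import torch.nn.functional as F

    def random_coreset_with_soft_labels(dataset_x, teacher, fraction=0.1, temp=2.0):
        """Pick a random subset and label it with the teacher's softened predictions."""
        n = dataset_x.shape[0]
        idx = torch.randperm(n)[: int(fraction * n)]          # random selection
        subset = dataset_x[idx]
        with torch.no_grad():
            soft = F.softmax(teacher(subset) / temp, dim=-1)  # probabilistic targets
        return subset, soft

    def soft_label_loss(student_logits, soft_targets, temp=2.0):
        # distillation-style cross-entropy against soft targets
        log_p = F.log_softmax(student_logits / temp, dim=-1)
        return -(soft_targets * log_p).sum(dim=-1).mean()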

URL: https://openreview.net/forum?id=Ll78kAR1lj

---

Title: Gaming and Cooperation in Federated Learning: What Can Happen and How to Monitor It

Abstract: The success of federated learning (FL) ultimately depends on how strategic participants behave under partial observability, yet most formulations still treat FL as a static optimization problem. We instead view FL deployments as governed strategic systems and develop an analytical framework that separates welfare-improving behavior from metric gaming. Within this framework, we introduce indices that quantify manipulability, the price of gaming, and the price of cooperation, and we use them to study how rules, information disclosure, evaluation metrics, and aggregator-switching policies reshape incentives and cooperation patterns. We derive threshold conditions for deterring harmful gaming while preserving benign cooperation, and for triggering auto-switch rules when early-warning indicators become critical. Building on these results, we construct a design toolkit including a governance checklist and a simple audit-budget allocation algorithm with a provable performance guarantee. Simulations across diverse stylized environments and a federated learning case study consistently match the qualitative and quantitative patterns predicted by our framework. Taken together, our results provide design principles and operational guidelines for reducing metric gaming while sustaining stable, high-welfare cooperation in FL platforms.
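
The abstract does not spell out the audit-budget allocation algorithm; purely as an illustration of the idea, here is one greedy proportional rule (our assumption, not the paper's method or its guarantee):

    import numpy as np

    def allocate_audits(risk_scores: np.ndarray, budget: int) -> np.ndarray:
        """risk_scores: nonnegative early-warning scores per client;
        returns an integer number of audits per client summing to budget."""
        p = risk_scores / risk_scores.sum()
        alloc = np.floor(budget * p).astype(int)     # proportional base allocation
        remainder = budget - alloc.sum()
        order = np.argsort(-(budget * p - alloc))    # largest fractional shares first
        alloc[order[:remainder]] += 1                # hand out the leftover audits
        return alloc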

URL: https://openreview.net/forum?id=Ck3q5YdWIv

---

Title: RIGID: A Training-Free and Model-Agnostic Framework for Robust AI-Generated Image Detection

Abstract: The rapid advances in generative AI models have empowered the creation of highly realistic images with arbitrary content, raising concerns about potential misuse and harm, such as Deepfakes. Current research focuses on training detectors using large datasets of generated images. However, these training-based solutions are often computationally expensive and show limited generalization to unseen generated images. In this paper, we propose a training-free method to distinguish between real and AI-generated images. We first observe that real images are more robust to tiny noise perturbations than AI-generated images in the representation space of vision foundation models. Based on this observation, we propose RIGID, a training-free and model-agnostic method for robust AI-generated image detection. RIGID is a simple yet effective approach that identifies whether an image is AI-generated by comparing the representation similarity between the original and the noise-perturbed counterpart. Our comprehensive evaluation demonstrates RIGID’s exceptional performance. RIGID surpasses existing training-free detectors by more than 25% on average. Remarkably, RIGID performs comparably to training-based methods, particularly on unseen domain data. Additionally, RIGID maintains consistent performance across various image generation techniques and demonstrates strong resilience to common image corruptions.
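
The detection rule itself is compact enough to sketch. Below, the encoder is any vision foundation model mapping image batches to embeddings; the noise scale and threshold are illustrative assumptions, not the paper's values:

    import torch
    import torch.nn.functional as F

    def rigid_score(image: torch.Tensor, encoder, sigma: float = 0.05) -> float:
        """image: (3, H, W) in [0, 1]; encoder: maps (N, 3, H, W) -> (N, D)."""
        perturbed = (image + sigma * torch.randn_like(image)).clamp(0.0, 1.0)
        with torch.no_grad():
            feats = encoder(torch.stack([image, perturbed]))
        return F.cosine_similarity(feats[0:1], feats[1:2]).item()

    def is_ai_generated(image, encoder, sigma=0.05, threshold=0.95) -> bool:
        # real images tend to keep high representation similarity under tiny noise
        return rigid_score(image, encoder, sigma) < threshold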

URL: https://openreview.net/forum?id=NBkBI2Zjlm

---

Title: When Lifelong Novelty Fails: Coordination Breakdown in Decentralised MARL

Abstract: Lifelong novelty bonuses are a cornerstone of exploration in reinforcement learning, but we identify a critical failure mode when they are applied to decentralised multi-agent coordination tasks: \emph{coordination de-synchronisation}. In sequential coordination tasks with multiple joint coordination checkpoints (states that all agents must occupy simultaneously), agents searching for later checkpoints must repeatedly traverse earlier ones. Under lifelong novelty, this repeated traversal gradually depletes intrinsic motivation to revisit these critical locations and can destabilise coordination. Within a stylised analytical framework, we derive lower bounds showing that the \emph{guaranteed} success probability under a lifelong novelty scheme can shrink polynomially with a problem-dependent geometric \emph{revisit pressure} and the number of agents, whereas episodic bonuses, which reset at the start of each episode, provide a time-uniform lower bound on the probability of reaching a given checkpoint. We further prove that a hybrid scheme, which multiplicatively combines episodic and lifelong bonuses, inherits both a constant ``coordination floor'' at known checkpoints and a persistent drive to discover previously unseen states. We validate the qualitative predictions of this framework in GridWorld, Overcooked, and StarCraft~II, where hybrid bonuses yield substantially more reliable coordination than lifelong-only exploration in environments with multiple sequential checkpoints or narrow geometric bottlenecks, such as corridors that force agents to pass through the same cells many times. Together, these results provide a theoretical and empirical account of when different intrinsic motivation schemes are effective in decentralised multi-agent coordination.
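
A minimal count-based sketch of the multiplicative hybrid bonus for a tabular agent; the paper's exact functional form may differ, so the count-based terms and constants here are our assumptions:

    from collections import defaultdict
    import math

    class HybridNoveltyBonus:
        def __init__(self):
            self.lifelong = defaultdict(int)  # never reset
            self.episodic = defaultdict(int)  # cleared at each episode start

        def reset_episode(self):
            self.episodic.clear()

        def bonus(self, state) -> float:
            self.lifelong[state] += 1
            self.episodic[state] += 1
            episodic_term = 1.0 / math.sqrt(self.episodic[state])
            lifelong_term = 1.0 / math.sqrt(self.lifelong[state])
            # multiplicative combination: the episodic factor resets each episode,
            # restoring a "coordination floor" at known checkpoints, while the
            # lifelong factor keeps extra drive for never-seen states
            return episodic_term * (1.0 + lifelong_term)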

URL: https://openreview.net/forum?id=xOPjPFTuvy

---

Title: V-OCBF: Learning Safety Filters from Offline Data via Value-Guided Offline Control Barrier Functions

Abstract: Ensuring safety in autonomous systems requires controllers that satisfy hard, state-wise constraints without relying on online interaction. While existing Safe Offline RL methods typically enforce soft expected-cost constraints, they do not guarantee forward invariance. Conversely, Control Barrier Functions (CBFs) provide rigorous safety guarantees but usually depend on expert-designed barrier functions or full knowledge of the system dynamics. We introduce Value-Guided Offline Control Barrier Functions (V-OCBF), a framework that learns a neural CBF entirely from offline demonstrations. Unlike prior approaches, V-OCBF does not assume access to the dynamics model; instead, it derives a recursive finite-difference barrier update, enabling model-free learning of a barrier that propagates safety information over time. Moreover, V-OCBF incorporates an expectile-based objective that avoids querying the barrier on out-of-distribution actions and restricts updates to the dataset-supported action set. The learned barrier is then used with a Quadratic Program (QP) formulation to synthesize real-time safe control. Across multiple case studies, V-OCBF yields substantially fewer safety violations than baseline methods while maintaining strong task performance, highlighting its scalability for offline synthesis of safety-critical controllers without online interaction or hand-engineered barriers.
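
The expectile-based objective is a standard asymmetric squared loss; a generic sketch follows, where tau and its exact use inside V-OCBF are assumptions on our part:

    import torch

    def expectile_loss(pred: torch.Tensor, target: torch.Tensor, tau: float = 0.9):
        """Asymmetric squared error: tau > 0.5 penalizes under-estimation more,
        biasing the fit toward an upper expectile of targets within the data's
        support (avoiding queries on out-of-distribution actions)."""
        diff = target - pred
        weight = torch.where(diff > 0,
                             torch.full_like(diff, tau),
                             torch.full_like(diff, 1.0 - tau))
        return (weight * diff.pow(2)).mean()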

URL: https://openreview.net/forum?id=PGO9mpIyyb

---

Title: Attention Trajectories as a Diagnostic Axis for Deep Reinforcement Learning

Abstract: While deep reinforcement learning agents demonstrate high performance across domains, their internal decision processes remain difficult to interpret when evaluated only through performance metrics. In particular, it is poorly understood which input features agents rely on, how these dependencies evolve during training, and how they relate to behavior. We introduce a scientific methodology for analyzing the learning process through quantitative analysis of saliency. This approach aggregates saliency information at the object and modality level into hierarchical attention profiles, quantifying how agents allocate attention over time, thereby forming attention trajectories throughout training. Applied to Atari benchmarks, custom Pong environments, and muscle-actuated biomechanical user simulations in visuomotor interactive tasks, this methodology uncovers algorithm-specific attention biases, reveals unintended reward-driven strategies, and diagnoses overfitting to redundant sensory channels. These patterns correspond to measurable behavioral differences, demonstrating empirical links between attention profiles, learning dynamics, and agent behavior. To assess robustness of the attention profiles, we validate our findings across multiple saliency methods and environments. The results establish attention trajectories as a promising diagnostic axis for tracing how feature reliance develops during training and for identifying biases and vulnerabilities invisible to performance metrics alone.
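
A minimal sketch of how per-pixel saliency might be aggregated into object-level attention profiles and tracked over training checkpoints; names and shapes are our assumptions:

    import numpy as np

    def attention_profile(saliency: np.ndarray, object_masks: dict) -> dict:
        """saliency: (H, W) nonnegative map; object_masks: name -> (H, W) bool mask.
        Returns each object's share of total attention."""
        total = saliency.sum() + 1e-8
        return {name: float(saliency[mask].sum() / total)
                for name, mask in object_masks.items()}

    def attention_trajectory(saliency_per_checkpoint, object_masks):
        # one profile per training checkpoint -> how attention allocation evolves
        return [attention_profile(s, object_masks) for s in saliency_per_checkpoint]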

URL: https://openreview.net/forum?id=0aa9zthk7k

---
