Daily TMLR digest for Dec 07, 2025

TMLR

Dec 7, 2025, 12:30:06 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Causal Ordering for Structure Learning from Time Series

Authors: Pedro Sanchez, Damian Machlanski, Steven McDonagh, Sotirios A. Tsaftaris

Abstract: Predicting causal structure from time series data is crucial for understanding complex phenomena in physiology, brain connectivity, climate dynamics, and socio-economic behaviour. Causal discovery in time series is hindered by the combinatorial complexity of identifying true causal relationships, especially as the number of variables and time points grows. A common way to simplify the task is to use so-called ordering-based methods. Traditional ordering methods inherently limit the representational capacity of the resulting model. In this work, we address this limitation by leveraging multiple valid causal orderings, instead of a single one as is standard practice. We propose DOTS (Diffusion Ordered Temporal Structure), a diffusion-based causal discovery method for temporal data. By integrating multiple orderings, DOTS effectively recovers the transitive closure of the underlying directed acyclic graph (DAG), mitigating the spurious artifacts inherent in single-ordering approaches. We formalise the problem under standard assumptions such as stationarity and the additive noise model, and leverage score matching with diffusion processes to enable efficient Hessian estimation. Empirical evaluations on synthetic and real-world datasets demonstrate that DOTS outperforms state-of-the-art baselines. On synthetic benchmarks spanning $d{=}3{-}6$ variables, $T{=}200{-}5{,}000$ samples and up to three lags, DOTS improves mean window-graph $F1$ from $0.63$ (best baseline) to $0.81$. On the CausalTime real-world benchmark (Medical, AQI, Traffic; $d{=}20{-}36$), while baselines remain the best on individual datasets, DOTS attains the highest average summary-graph $F1$ while halving runtime relative to graph-optimisation methods. These results establish DOTS as a scalable and accurate solution for temporal causal discovery.
Code is available at \url{https://github.com/CHAI-UK/DOTS}.
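The abstract's key structural claim, that combining all valid causal orderings recovers the transitive closure of the DAG, can be illustrated by brute force on a toy graph. This is a minimal sketch of the ordering/closure relationship only, not the DOTS algorithm; the function names and the exhaustive enumeration are illustrative:

```python
from itertools import permutations

def valid_orderings(edges, n):
    """All topological orderings of the DAG given by `edges` on n nodes."""
    return [p for p in permutations(range(n))
            if all(p.index(u) < p.index(v) for u, v in edges)]

def precedence_pairs(order):
    """All (earlier, later) pairs a single ordering would treat as candidate edges."""
    return {(order[i], order[j])
            for i in range(len(order)) for j in range(i + 1, len(order))}

# DAG 0 -> 2 <- 1: nodes 0 and 1 are not causally related.
edges = {(0, 2), (1, 2)}
orders = valid_orderings(edges, 3)   # (0, 1, 2) and (1, 0, 2)
closure = set.intersection(*map(precedence_pairs, orders))
# Any single ordering also suggests a spurious 0-1 (or 1-0) link; pairs
# ordered consistently in *every* valid ordering are exactly the ancestor
# pairs, i.e. the transitive closure {(0, 2), (1, 2)}.
```

This is the sense in which a single ordering over-commits (it linearises unrelated variables), while aggregating orderings removes exactly those spurious precedences.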

URL: https://openreview.net/forum?id=hWuTzqggSd

---

Title: The Initialization Determines Whether In-Context Learning Is Gradient Descent

Authors: Shifeng Xie, Rui Yuan, Simone Rossi, Thomas Hannagan

Abstract: In-context learning (ICL) in large language models (LLMs) is a striking phenomenon, yet its underlying mechanisms remain only partially understood. Previous work connects linear self-attention (LSA) to gradient descent (GD), but this connection has primarily been established under simplified conditions with zero-mean Gaussian priors and zero initialization for GD. Subsequent studies have challenged this simplified view by highlighting its overly restrictive assumptions, demonstrating instead that under conditions such as multi-layer or nonlinear attention, self-attention performs optimization-like inference, akin to but distinct from GD. We investigate how multi-head LSA approximates GD under more realistic conditions, specifically when incorporating non-zero Gaussian prior means in linear regression formulations of ICL. We first extend the multi-head LSA embedding matrix by introducing an initial estimate of the query, referred to as the initial guess. We prove an upper bound on the number of heads needed in the ICL linear regression setup. Our experiments confirm this result and further show that a performance gap between one-step GD and multi-head LSA persists. To address this gap, we introduce $y_q$-LSA, a simple generalization of single-head LSA with a trainable initial guess $y_q$. We theoretically establish the capabilities of $y_q$-LSA and provide experimental validation on linear regression tasks, thereby extending the theory that bridges ICL and GD. Finally, inspired by our findings in the linear regression case, we consider widespread LLMs augmented with initial-guess capabilities, and show that their performance improves on a semantic similarity task.
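Why a non-zero prior mean matters can be seen in a toy in-context linear-regression problem: one step of GD from a zero initialisation lands far from the task weights when tasks are drawn around a non-zero mean, while the same step taken from the prior mean (the role an "initial guess" plays) lands much closer. A minimal numpy sketch under these assumptions; the dimensions, step size, and noise scale are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 100                              # feature dim, in-context examples
mu = np.ones(d)                            # non-zero prior mean for task weights
w = mu + 0.1 * rng.standard_normal(d)      # task drawn near the prior mean
X = rng.standard_normal((n, d))
y = X @ w

# One step of GD on the in-context least-squares loss, started from w0:
#   w1 = w0 + eta * X^T (y - X w0) / n
def one_step_gd(w0, eta=0.1):
    return w0 + eta * X.T @ (y - X @ w0) / n

err_zero = np.linalg.norm(one_step_gd(np.zeros(d)) - w)   # zero initialisation
err_prior = np.linalg.norm(one_step_gd(mu) - w)           # init at the prior mean
# one GD step shrinks the initial error only modestly, so starting near the
# truth (the prior mean) leaves a much smaller residual than starting at zero
```

A single GD step contracts the weight error by roughly a constant factor, so whatever error the initialisation carries largely survives the step; this is the gap an initial-guess mechanism is meant to close.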

URL: https://openreview.net/forum?id=fvqSKLDtJi

---

Title: Towards shutdownable agents via stochastic choice

Authors: Elliott Thornley, Alexander Roman, Christos Ziakas, Louis Thomson, Leyton Ho

Abstract: The POST-Agents Proposal (PAP) is an idea for ensuring that advanced artificial agents never resist shutdown. A key part of the PAP is using a novel ‘Discounted Reward for Same-Length Trajectories (DReST)’ reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be 'USEFUL'), and (2) choose stochastically between different trajectory-lengths (be 'NEUTRAL' about trajectory-lengths). In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY. We use a DReST reward function to train simple agents to navigate gridworlds, and we find that these agents learn to be USEFUL and NEUTRAL. Our results thus provide some initial evidence that DReST reward functions could train advanced agents to be USEFUL and NEUTRAL. Our theoretical work suggests that these agents would be useful and shutdownable.
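One way a DReST-style discount can make stochastic length choice optimal: scale each trajectory's reward by a factor that decays with how often the agent has already chosen that trajectory-length, so repeatedly picking one length yields diminishing returns. The sketch below is illustrative only; the constant `lambda_` and the per-length counter are assumptions, not the paper's exact definition:

```python
from collections import Counter

class DReSTReward:
    """Sketch: discount each trajectory's reward by
    lambda_ ** (number of previous trajectories of the same length),
    so mixing trajectory-lengths beats always choosing the same one."""

    def __init__(self, lambda_=0.9):
        self.lambda_ = lambda_
        self.counts = Counter()   # how often each length has been chosen

    def __call__(self, raw_reward, traj_length):
        discounted = raw_reward * self.lambda_ ** self.counts[traj_length]
        self.counts[traj_length] += 1
        return discounted

dr = DReSTReward(lambda_=0.9)
r = [dr(1.0, 5), dr(1.0, 5), dr(1.0, 7)]   # two length-5 runs, one length-7
# the repeated length-5 trajectory is discounted; the length-7 one is not
```

Under such a scheme an agent maximising discounted return is pushed toward choosing lengths stochastically (NEUTRAL) while still maximising the raw reward within each length (USEFUL).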

URL: https://openreview.net/forum?id=j5Qv7KdWBn

---


New submissions
===============


Title: VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

Abstract: Training Vision-Language Models (VLMs) as Graphical User Interface (GUI) agents via Reinforcement Learning (RL) faces critical challenges: environment-based RL requires costly interactions, while environment-free methods struggle with distribution shift and reward generalization. We propose an environment-free RL framework that decouples value estimation from policy optimization by leveraging a pretrained Value Environment Model (VEM), which requires no live environment interaction during policy optimization. VEM predicts state-action values directly from offline data, distilling human-like priors about GUI interaction outcomes without requiring next-state prediction or environmental feedback. This avoids compounding errors and enhances resilience to UI changes by focusing on semantic reasoning (e.g., “Does this action advance the user’s goal?”). The framework operates in two stages: (1) pretraining VEM to estimate long-term action utilities and (2) guiding policy exploration with frozen VEM signals, enabling layout-agnostic GUI automation. Evaluated across diverse benchmarks including Android-in-the-Wild for mobile apps and Multimodal-Mind2Web for web environments, VEM achieves state-of-the-art or highly competitive performance in both offline and online settings. It significantly outperforms environment-free baselines and matches or exceeds environment-based approaches, crucially without incurring interaction costs. Importantly, VEM demonstrates that robust, generalizable GUI agents can be trained efficiently using semantic-aware value estimation, proving effective across distinct interaction platforms like mobile and web. The code is available at https://anonymous.4open.science/r/VEM-Agent-51E7.
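The two-stage recipe (fit a value model on offline data, then optimise the policy against the frozen model with no further environment calls) can be sketched in a tabular toy setting. Everything below, from the hidden utility table to the softmax policy update, is an illustrative stand-in for the paper's VLM-scale components:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3

# Stage 1: fit a tabular "value environment model" from offline data.
# The hidden utility table plus noisy observations stand in for priors
# distilled from logged GUI interactions (illustrative).
true_q = rng.random((n_states, n_actions))
vem = np.zeros((n_states, n_actions))
for s in range(n_states):
    for a in range(n_actions):
        samples = true_q[s, a] + 0.05 * rng.standard_normal(20)
        vem[s, a] = samples.mean()
# vem is frozen from here on: no more data, no environment interaction.

# Stage 2: improve a softmax policy using only the frozen VEM's scores.
logits = np.zeros((n_states, n_actions))
for _ in range(200):
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # policy-gradient-style step: each action's advantage under the VEM
    advantage = vem - (probs * vem).sum(axis=1, keepdims=True)
    logits += 0.5 * probs * advantage
# the policy concentrates on the actions the frozen VEM rates highest
```

The point of the decoupling is visible in stage 2: the update consumes only frozen value estimates, so policy optimisation never pays an interaction cost and never depends on predicting next states.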

URL: https://openreview.net/forum?id=q1wLUxaBPn

---

Title: GroundingBooth: Grounding Text-to-Image Customization

Abstract: Recent approaches in text-to-image customization have primarily focused on preserving the identity of the input subject, but often fail to control the spatial location and size of objects. We introduce GroundingBooth, which achieves zero-shot, instance-level spatial grounding on both foreground subjects and background objects in the text-to-image customization task. Our proposed grounding module and subject-grounded cross-attention layer enable the creation of personalized images with accurate layout alignment, identity preservation, and strong text-image coherence. In addition, our model seamlessly supports personalization with multiple subjects. Our model shows strong results in both layout-guided image synthesis and text-to-image customization tasks. The code link will be provided upon acceptance.

URL: https://openreview.net/forum?id=TRlZpHU300

---

Title: Diffusion Models are Secretly Zero-Shot 3DGS Harmonizers

Abstract: Gaussian Splatting has become a popular technique for various 3D Computer Vision tasks, including novel view synthesis, scene reconstruction, and dynamic scene rendering. However, the challenge of natural-looking object insertion, where the object's appearance seamlessly matches the scene, remains unsolved. In this work, we propose a method, dubbed D3DR, for inserting a 3DGS-parametrized object into a 3DGS scene while correcting its lighting, shadows, and other visual artifacts to ensure consistency. We reveal a hidden ability of diffusion models trained on large real-world datasets to implicitly understand correct scene lighting, and leverage it in our pipeline. After inserting the object, we optimize a diffusion-based Delta Denoising Score (DDS)-inspired objective to adjust its 3D Gaussian parameters for proper lighting correction. We introduce a novel diffusion personalization technique that preserves object geometry and texture across diverse lighting conditions, and utilize it to achieve consistent identity matching between original and inserted objects. Finally, we demonstrate the effectiveness of the method by comparing it to existing approaches, achieving 2.0 dB PSNR improvements in relighting quality.
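The shape of a DDS-inspired objective can be shown with a toy vector example: the model's noise prediction on the fixed reference render is subtracted from its prediction on the current edited render, so components shared by both branches cancel and only the correction direction drives the update. The two toy "denoisers" and the 2-D parameter vector below are stand-ins, not the actual diffusion pipeline:

```python
import numpy as np

# Toy stand-ins for a diffusion model's noise prediction under two prompts
# (illustrative; the real model is a large pretrained image denoiser).
target = np.array([1.0, 2.0])      # appearance implied by the edit prompt
reference = np.array([0.0, 0.0])   # appearance of the untouched reference render

def eps_edit(x):                   # noise prediction under the edit prompt
    return x - target

def eps_src(x):                    # noise prediction under the source prompt
    return x - reference

params = reference.copy()          # stand-in for the object's 3DGS parameters
for _ in range(200):
    # DDS-style gradient: the source branch, evaluated on the fixed
    # reference, cancels shared bias and leaves only the edit direction.
    grad = eps_edit(params) - eps_src(reference)
    params -= 0.1 * grad
# params is pulled toward `target`, i.e. the object is re-lit to the scene
```

In the toy, the gradient reduces to `params - target`, so the update converges geometrically to the edit-consistent appearance; in the real pipeline the same differencing is what suppresses the noisy, prompt-independent part of the score.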

URL: https://openreview.net/forum?id=1jjIitxVmM

---

Title: Inference-Time Alignment via Hypothesis Reweighting

Abstract: Chat assistants must handle diverse and often conflicting user preferences, requiring adaptability to various user needs. We propose Hypothesis Reweighting (HyRe), a method that enables real-time personalization by reweighting ensemble members based on just 1-5 labeled examples from the target user or domain. Our method builds on the key empirical observation that optimally weighting ensemble members substantially outperforms uniform averaging under distribution shift, providing a powerful inductive bias for personalization. HyRe trains a single network with multiple prediction heads that capture different valid interpretations of preference data, then performs a simple Bayesian update to upweight heads that best match the target user's preferences. This requires only a single forward pass with negligible (<1\%) computational overhead, making it practical for inference-time alignment. We empirically validate HyRe across several target evaluation distributions. With as few as five preference pairs from each target distribution, adaptation via HyRe surpasses state-of-the-art reward models on RewardBench at both the 2B and 8B parameter scales, and improves reward model accuracy by 20\% across 32 diverse personalization tasks.
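The Bayesian reweighting step can be sketched directly: start from a uniform prior over heads, multiply by the likelihood each head assigns to the user's handful of labeled preference pairs, and renormalise. The synthetic margins and the Bradley-Terry (sigmoid-of-margin) likelihood below are assumptions for illustration, not HyRe's exact training setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads, n_pairs = 4, 5   # ensemble heads, labeled preference pairs

# margins[h, i]: head h's score margin (chosen minus rejected) on pair i.
# Heads disagree because preference data underdetermines the task; head 0
# happens to match this user's labels best (synthetic, illustrative data).
margins = rng.standard_normal((n_heads, n_pairs)) \
          + np.array([3.0, 0.0, -1.0, -3.0])[:, None]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Bayesian update from a uniform prior: log-posterior is the summed
# Bradley-Terry log-likelihood of the labeled pairs under each head
# (the uniform prior is a constant and drops out after normalising).
log_post = np.log(sigmoid(margins)).sum(axis=1)
weights = np.exp(log_post - log_post.max())
weights /= weights.sum()
# the head most consistent with the user's labels dominates the ensemble
```

Since the heads share one backbone, the margins come from a single forward pass, which is why the update itself adds almost no inference cost.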

URL: https://openreview.net/forum?id=Q9p8LSEpiJ

---
