Daily TMLR digest for Sep 03, 2025

1 view

Skip to first unread message

TMLR

unread,

Sep 3, 2025, 12:06:07 AM (5 days ago) Sep 3

to tmlr-anno...@googlegroups.com

Accepted papers
===============

Title: MoReact: Generating Reactive Motion from Textual Descriptions

Authors: Xiyan Xu, Sirui Xu, Yu-Xiong Wang, Liangyan Gui

Abstract: Modeling and generating human reactions poses a significant challenge with broad applications for computer vision and human-computer interaction. Existing methods either treat multiple individuals as a single entity, directly generating interactions, or rely solely on one person's motion to generate the other's reaction, failing to integrate the rich semantic information that underpins human interactions. Yet, these methods often fall short in adaptive responsiveness, \ie, the ability to accurately respond to diverse and dynamic interaction scenarios. Recognizing this gap, our work introduces an approach tailored to address the limitations of existing models by focusing on text-driven human reaction generation. Our model specifically generates realistic motion sequences for individuals that responding to the other's actions based on a descriptive text of the interaction scenario. The goal is to produce motion sequences that not only complement the opponent's movements but also semantically fit the described interactions. To achieve this, we present MoReact, a diffusion-based method designed to disentangle the generation of global trajectories and local motions sequentially. This approach stems from the observation that generating global trajectories first is crucial for guiding local motion, ensuring better alignment with given action and text. Furthermore, we introduce a novel interaction loss to enhance the realism of generated close interactions. Our experiments, utilizing data adapted from a two-person motion dataset, demonstrate the efficacy of our approach for this novel task, which is capable of producing realistic, diverse, and controllable reactions that not only closely match the movements of the counterpart but also adhere to the textual guidance. Please find our webpage at https://xiyan-xu.github.io/MoReactWebPage.

URL: https://openreview.net/forum?id=4zuT73heqm

---

Title: SELU: Self-Learning Embodied Multimodal Large Language Models in Unknown Environments

Authors: Boyu Li, Haobin Jiang, Ziluo Ding, Xinrun Xu, Haoran Li, Dongbin Zhao, Zongqing Lu

Abstract: Recently, multimodal large language models (MLLMs) have demonstrated strong visual understanding and decision-making capabilities, enabling the exploration of autonomously improving MLLMs in unknown environments. However, external feedback like human or environmental feedback is not always available. To address this challenge, existing methods primarily focus on enhancing the decision-making capabilities of MLLMs through voting and scoring mechanisms, while little effort has been paid to improving the environmental comprehension of MLLMs in unknown environments. To fully unleash the self-learning potential of MLLMs, we propose a novel actor-critic self-learning paradigm, dubbed SELU, inspired by the actor-critic paradigm in reinforcement learning. The critic employs self-asking and hindsight relabeling to extract knowledge from interaction trajectories collected by the actor, thereby augmenting its environmental comprehension. Simultaneously, the actor is improved by the self-feedback provided by the critic, enhancing its decision-making. We evaluate our method in the AI2-THOR and VirtualHome environments, and SELU achieves critic improvements of approximately 28% and 30%, and actor improvements of about 20% and 24% via self-learning.

URL: https://openreview.net/forum?id=G5gROx8AVi

---

Title: Behaviour Discovery and Attribution for Explainable Reinforcement Learning

Authors: Rishav Rishav, Somjit Nath, Vincent Michalski, Samira Ebrahimi Kahou

Abstract: Building trust in reinforcement learning (RL) agents requires understanding why they make certain decisions, especially in high-stakes applications like robotics, healthcare, and finance. Existing explainability methods often focus on single states or entire trajectories, either providing only local, step-wise insights or attributing decisions to coarse, episodelevel summaries. Both approaches miss the recurring strategies and temporally extended patterns that actually drive agent behavior across multiple decisions. We address this gap by proposing a fully offline, reward-free framework for behavior discovery and segmentation, enabling the attribution of actions to meaningful and interpretable behavior segments that capture recurring patterns appearing across multiple trajectories. Our method identifies coherent behavior clusters from state-action sequences and attributes individual actions to these clusters for fine-grained, behavior-centric explanations. Evaluations on four diverse offline RL environments show that our approach discovers meaningful behaviors and outperforms trajectory-level baselines in fidelity, human preference, and cluster coherence. Our code is publicly available.

URL: https://openreview.net/forum?id=JbHtpOIH9l

---

New submissions
===============

Title: Causal Graph Learning via Distributional Invariance of Cause-Effect Relationship

Abstract: This paper introduces a new framework for recovering causal graphs from observational data, leveraging the fact that the distribution of an effect, conditioned on its causes, remains invariant to changes in the prior distribution of those causes. This insight enables a direct test for potential causal relationships by checking the variance of their corresponding effect-cause conditional distributions across multiple downsampled subsets of the data. These subsets are selected to reflect different prior cause distributions while preserving the effect-cause conditional relationships. Using this invariance test and exploiting an (empirical) sparsity of most causal graphs, we develop an algorithm that efficiently uncovers causal relationships with quadratic complexity in the number of observational variables, reducing the processing time by up to $25\times$ compared to state-of-the-art methods. Our empirical experiments on a varied benchmark of large-scale datasets show superior or equivalent performance compared to existing works, while achieving enhanced scalability. Our code is openly accessible at https://anonymous.4open.science/r/GLIDE-DC57

URL: https://openreview.net/forum?id=Ey38Q882Xe

---

Title: A Bayesian Nonparametric Perspective on Mahalanobis Distance for Out of Distribution Detection

Abstract: Bayesian nonparametric methods are naturally suited to the problem of out-of-distribution (OOD) detection. However, these techniques have largely been eschewed in favor of simpler methods based on distances between pre-trained or learned embeddings of data points. Here we show a formal relationship between Bayesian nonparametric models and the relative Mahalanobis distance score (RMDS), a commonly used method for OOD detection. Building on this connection, we propose Bayesian nonparametric mixture models with hierarchical priors that generalize the RMDS. We evaluate these models on the OpenOOD detection benchmark and show that Bayesian nonparametric methods can improve upon existing OOD methods, especially in regimes where training classes differ in their covariance structure and where there are relatively few data points per class.

URL: https://openreview.net/forum?id=w3bMXPMDW1

---

Title: Eyes on the Road, Words in the Changing Skies: Vision-Language Assistance for Autonomous Driving in Transitional Weather

Abstract: The rapid advancement of autonomous vehicle technology (AVT) necessitates robust scene perception and interactive decision-making, particularly under adverse weather conditions. While significant progress has been made in extreme weather scenarios like cloudy, foggy, rainy, and snowy, a critical challenge remains in transitional weather conditions, such as the shift from cloudy to rainy, foggy to sunny, etc. These dynamic environmental changes degrade the performance of conventional vision-language systems by causing unpredictable illumination changes and partial occlusions, which are inadequately represented in current AVT datasets. This lack of continuous, transitional training data compromises model robustness and ultimately affects safety and reliability. On the other hand, Vision-language Models (VLMs) enable interpretable reasoning in autonomous driving through tasks such as image captioning and visual question answering. However, current VLMs, designed for clear weather, perform poorly in transitional conditions and rely on computationally expensive LLMs. This leads to high memory usage and slow inference, which is unsuitable for real-time decision making in AVT. To address these limitations, we propose Vision-language Assistance for Autonomous Driving under Transitional Weather (VLAAD-TW), a lightweight framework with a novel cross-modal spatiotemporal reasoning architecture that robustly interprets and acts on multimodal data. The VLAAD-TW framework integrates a Feature Encoder for Transitional Weather (FETW), a lightweight backbone for robust visual feature extraction, with a Spatiotemporal Contextual Aggregator (SCA), which models dynamic weather-induced changes. It uses a Selective Attention-guided Fusion Module (SAFM) to balance visual and linguistic cues for a unified representation dynamically. Finally, a Semantic Text Generator (STG) fuses these representations to produce context-aware driving information, adapting in real time to both current and predicted weather states. Further, we introduce AIWD16-text dataset, an adverse intermediate weather driving dataset for vision language tasks, which features sixteen transitional weather states created using a Stochastic Conditional Variational Autoencoder (SC-VAE) and enriched with manual annotations of image captions and open-ended question-answer pairs. An extensive evaluation of the AIWD16-text and DriveLM datasets demonstrates VLAAD-TW's high performance in BLEU and ROUGE scores, with low memory and computational requirements, confirming its effectiveness in challenging weather conditions.

URL: https://openreview.net/forum?id=PCEDvdVJon

---

Title: Scaling Gaussian Process Regression with Full Derivative Observations

Abstract: We present a scalable Gaussian Process (GP) method that can fit and predict full derivative observations called DSoftKI. It extends SoftKI, a method that approximates a kernel via softmax interpolation from learned interpolation point locations, to the setting with derivatives. DSoftKI enhances SoftKI’s interpolation scheme to incorporate the directional orientation of interpolation points relative to the data. This enables the construction of a scalable approximate kernel, including its first and second-order derivatives, through interpolation. We evaluate DSoftKI on a synthetic function benchmark and high-dimensional molecular force field prediction (100-1000 dimensions), demonstrating that DSoftKI is accurate and can scale to larger datasets with full derivative observations than previously possible.

URL: https://openreview.net/forum?id=fbonXp38r9

---

Title: Offline Model-Based Optimization: Comprehensive Review

Abstract: Offline black-box optimization is a fundamental challenge in science and engineering, where the goal is to optimize black-box functions using only offline datasets. This setting is particularly relevant when querying the objective function is prohibitively expensive or infeasible, with applications spanning protein engineering, material discovery, neural architecture search, and beyond. The main difficulty lies in accurately estimating the objective landscape beyond the available data, where extrapolations are fraught with significant epistemic uncertainty. This uncertainty can lead to objective hacking (reward hacking)—exploiting model inaccuracies in unseen regions—or other spurious optimizations that yield misleadingly high performance estimates outside the training distribution. Recent advances in model-based optimization (MBO) have harnessed the generalization capabilities of deep neural networks to develop offline-specific surrogate and generative models. Trained with carefully designed strategies, these models are more robust against out-of-distribution issues, facilitating the discovery of improved designs. Despite its growing impact in accelerating scientific discovery, the field lacks a comprehensive review. To bridge this gap, we present the first thorough review of offline MBO. We begin by formalizing the problem for both single-objective and multi-objective settings and by reviewing recent benchmarks and evaluation metrics. We then categorize existing approaches into two key areas: surrogate modeling, which emphasizes accurate function approximation in out-of-distribution regions, and generative modeling, which explores high-dimensional design spaces to identify high-performing designs. Finally, we examine the key challenges and propose promising directions for advancement in this rapidly evolving field.

URL: https://openreview.net/forum?id=QcSZWo1TLl

---

Title: Quantum entanglement for attention models

Abstract: Attention mechanisms in deep learning establish relationships between different positions within a sequence, enabling models like Transformers to generate effective outputs by focusing on relevant input segments and their relations. The performance of Transformers is highly dependent on the chosen attention mechanism, with various approaches balancing trade-offs between computational cost, memory efficiency, and generalization ability based on the task.
Quantum machine learning models possess the potential to outperform their classical counterparts in specialized settings. This makes exploring the benefits of quantum resources within classical machine learning models a promising research direction. The role of entanglement in quantum machine learning, whether in fully quantum or as subroutines in classical-quantum hybrid models, remains poorly understood. In this work, we investigate the hypothesis of whether entanglement can be used to model nuanced correlations in classical data, analogous to its role in many-body systems. We further test whether quantum entanglement can be used as a resource to improve the performance of the attention layer in Transformers.
We introduce an entanglement entropy-based attention layer within a classical Transformer architecture and numerically evaluate it across various datasets. Our experiments on standard classification tasks in both vision and NLP domains reveal that the entanglement-based attention layer outperforms existing quantum attention frameworks and the widely used quantum kernel attention models, particularly in the presence of noise. Our work contributes toward exploring the power of quantum resources as a subroutine in the classical-quantum hybrid setting to further enhance classical models.

URL: https://openreview.net/forum?id=G4an9paVtP

---

Title: Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers

Abstract: Hallucinations are a persistent problem with Large Language Models (LLMs). As these models become increasingly used in high-stakes domains, such as healthcare and finance, the need for effective hallucination detection is crucial. To this end, we outline a versatile framework for zero-resource hallucination detection that practitioners can apply to real- world use cases. To achieve this, we adapt a variety of existing uncertainty quantification (UQ) techniques, including black-box UQ, white-box UQ, and LLM-as-a-Judge, transform- ing them as necessary into standardized response-level confidence scores ranging from 0 to 1. To enhance flexibility, we propose a tunable ensemble approach that incorporates any combination of the individual confidence scores. This approach enables practitioners to optimize the ensemble for a specific use case for improved performance. To streamline implementation, the full suite of scorers is offered in this paper’s companion Python toolkit. To evaluate the performance of the various scorers, we conduct an extensive set of experiments using several LLM question-answering benchmarks. We find that our tunable ensemble typically surpasses its individual components and outperforms existing hallucination detection methods. Our results demonstrate the benefits of customized hallucination detection strategies for improving the accuracy and reliability of LLMs.

URL: https://openreview.net/forum?id=WOFspd4lq5

---

Title: RoboRAN: A Unified Robotics Framework for Reinforcement Learning-Based Autonomous Navigation

Abstract: Autonomous robots must navigate and operate in diverse environments, from terrestrial and aquatic settings to aerial and space domains. While Reinforcement Learning (RL) has shown promise in training policies for specific autonomous robots, existing frameworks and benchmarks are often constrained to unique platforms, limiting generalization and fair comparisons across different mobility systems. In this paper, we present a multi-domain framework for training, evaluating and deploying RL-based navigation policies across diverse robotic platforms and operational environments. Our work presents four key contributions: (1) a scalable and modular framework, facilitating seamless robot-task interchangeability and reproducible training pipelines; (2) sim-to-real transfer demonstrated through real- world experiments with multiple robots, including a satellite robotic simulator, an unmanned surface vessel, and a wheeled ground vehicle; (3) the release of the first open-source API for deploying IsaacLab-trained policies to real robots, enabling lightweight inference and rapid field validation; and (4) uniform tasks and metrics for cross-medium evaluation, through a unified evaluation testbed to assess performance of navigation tasks in diverse operational conditions (aquatic, terrestrial and space). By ensuring consistency between simulation and real-world deployment, RoboRAN lowers the barrier to developing adaptable RL-based navigation strategies. Its modular design enables straightforward integration of new robots and tasks through predefined templates, fostering reproducibility and extension to diverse domains. To support the community, we release RoboRAN as open-source.

URL: https://openreview.net/forum?id=0wDbhLeMj9

---

Title: One Model for All: Multi-Objective Controllable Language Models

Abstract: Aligning large language models (LLMs) with human preferences is critical to enhancing LLMs' safety, helpfulness, humor, faithfulness, and other desirable properties. Current reinforcement learning from human feedback (RLHF) mainly focuses on a fixed reward learned from average human ratings, which may weaken the adaptability and controllability of varying preferences. However, creating personalized LLMs requires aligning LLMs with individual human preferences, which is non-trivial due to the scarce data per user and the diversity of user preferences in multi-objective trade-offs, such as prioritizing humor and empathy in one context, while seeking efficiency and precision in another. Can we train one LLM to produce personalized outputs for different user preferences on the Pareto front? In this paper, we introduce Multi-Objective Control (MOC), which trains a single LLM to directly generate responses in the preference-defined regions of the Pareto front. Our approach introduces multi-objective optimization (MOO) principles into RLHF to train an LLM as a preference-conditioned policy network. We improve the computational efficiency of MOC by applying MOO at the policy level, enabling us to fine-tune a 7B-parameter model on a single A6000 GPU. Extensive experiments demonstrate the advantages of MOC over baselines in three aspects: (i) controllability of LLM outputs with respect to user preferences on the trade-off among multiple rewards; (ii) quality and diversity of LLM outputs, measured by the hyper-volume of multiple solutions achieved; and (iii) generalization to unseen preferences. These results highlight MOC’s potential for real-world applications requiring scalable and customizable LLMs.

URL: https://openreview.net/forum?id=qAM5PmvFYY

---

Title: HyResPINNs: A Hybrid Residual Physics-Informed Neural Network Architecture Designed to Balance Expressiveness and Trainability

Abstract: Physics-informed neural networks (PINNs) have emerged as a powerful approach for solving partial differential equations (PDEs) by training neural networks with loss functions that incorporate physical constraints. In this work, we introduce HyResPINNs, a two-level convex-gated architecture designed to maximize approximation expressiveness for a fixed number of degrees of freedom (DoF). The first level involves a trainable, per-block combination of smooth basis functions with trainable sparsity, and deep neural networks; the second involves the ability to gate entire blocks (much like in ResNets or Highway Nets), allowing for expressivity along the depth dimension of the architecture. Our empirical evaluation on a diverse set of challenging PDE problems demonstrates that HyResPINNs consistently achieve superior accuracy to baseline methods while remaining competitive relative to training times. These results highlight the potential of HyResPINNs to combine desirable features from traditional scientific computing methods and modern machine learning, paving the way for more robust and expressive approaches to physics-informed modeling.

URL: https://openreview.net/forum?id=et9WkjkqAw

---

Title: Positional Encoder Graph Quantile Neural Networks for Geographic Data

Abstract: Positional Encoder Graph Neural Networks (PE-GNNs) are among the most effective models for learning from continuous spatial data. However, their predictive distributions are often poorly calibrated, limiting their utility in applications that require reliable uncertainty quantification. We propose the Positional Encoder Graph Quantile Neural Network (PE-GQNN), a novel framework that combines PE-GNNs with Quantile Neural Networks, partially monotonic neural blocks, and post-hoc recalibration techniques. The PE-GQNN enables flexible and robust conditional density estimation with minimal assumptions about the target distribution, and it extends naturally to tasks beyond spatial data. Empirical results on benchmark datasets show that the PE-GQNN outperforms existing methods in both predictive accuracy and uncertainty quantification, without incurring additional computational cost. We also identify important special cases arising from our formulation, including the PE-GNN.

URL: https://openreview.net/forum?id=5PjL8ZOPBt

---

Title: Guiding Reasoning in Small Language Models with LLM Assistance

Abstract: Small language models (SLMs) typically falter on tasks requiring deep, multi-step reasoning. This paper introduces SMART ( Small Reasons, Large Hints), a framework where large language models (LLMs) provide targeted, selective guidance to augment SLM reasoning. Drawing from cognitive scaffolding, SMART uses a score-based mechanism to identify uncertain SLM reasoning steps, triggering LLM correction only when essential. This approach, framing structured reasoning as an optimal policy search, steers SLMs towards correct solutions without exhaustive sampling. On mathematical reasoning datasets, SMART enables SLMs to achieve up to 98.9% of LLM-level performance while reducing LLM token usage by up to 90.0%. Our work paves the way for collaborative use of both SLM and LLM to tackle complex reasoning tasks that are currently unsolvable by SLMs alone.

URL: https://openreview.net/forum?id=Q2d7tSWFdQ

---

Title: Learning Task-Aware Abstract Representations for Meta-Reinforcement Learning

Abstract: A central challenge in meta-reinforcement learning (meta-RL) is enabling agents trained on a set of environments to generalize to new, related tasks without requiring full policy retraining. Existing model-free approaches often rely on context-conditioned policies learned via encoder networks. However, these context encoders are prone to overfitting on the training environments, resulting in poor out-of-sample performance on unseen tasks. To address this issue, we adopt an alternative approach that uses an abstract representation model to learn augmented, task-aware abstract states. We achieve this by introducing a novel architecture that offers more flexibility than existing recurrent network-based approaches. In addition, we optimize our model with multiple loss terms that encourage predictive, task-aware representations in the abstract state space. Our method simplifies the learning problem and provides a flexible framework that can be easily combined with any off-the-shelf reinforcement learning algorithm. We provide theoretical guarantees alongside empirical results, showing strong generalization performance across classical control and robotic meta-RL benchmarks.

URL: https://openreview.net/forum?id=3CWyTh4hJ4

---

Title: BAOSL: Benchmarking Active Optimization for Self-driving Laboratories

Abstract: Discovery of novel materials and antibiotics can be posed as an optimization problem, namely, identifying candidate formulations that maximize one or more desired properties. In practice, however, the enormous dimensionality of the design space and the high cost of each experimental evaluation make exhaustive search strategies infeasible. Active learning methods, which iteratively identify informative data points, offer a promising solution to tackle these challenges by significantly reducing the data-labeling effort and resource requirements. Integrating active learning into optimization workflows, hereafter termed active optimization, accelerates the discovery of optimal candidates while substantially cutting the number of required evaluations. Despite these advances, the absence of standardized benchmarks impedes objective comparison of methodologies, slowing progress in self-driving scientific discovery. To address this, we introduce BAOSA, a comprehensive benchmark designed to systematically evaluate active optimization in self-driving laboratories. BAOSA provides a standardized evaluation protocol and reference implementations to facilitate efficient and reproducible benchmarking. BAOSA includes both synthetic benchmarks and real-world tasks in various fields, designed to address unique challenges, particularly limited data availability, in self-driving laboratories.

URL: https://openreview.net/forum?id=UYEHUOSPmU

---

Reply all

Reply to author

Forward

0 new messages