Accepted papers
===============
Title: RIZE: Adaptive Regularization for Imitation Learning
Authors: Adib Karimi, Mohammad Mehdi Ebadzadeh
Abstract: We propose a novel Inverse Reinforcement Learning (IRL) method that mitigates the rigidity of fixed reward structures and the limited flexibility of implicit reward regularization. Building on the Maximum Entropy IRL framework, our approach incorporates a squared temporal-difference (TD) regularizer with adaptive targets that evolve dynamically during training, thereby imposing adaptive bounds on recovered rewards and promoting robust decision-making. To capture richer return information, we integrate distributional RL into the learning process. Empirically, our method achieves expert-level performance on complex MuJoCo and Adroit environments, surpassing baseline methods on the Humanoid-v2 task with limited expert demonstrations. Extensive experiments and ablation studies further validate the effectiveness of the approach and provide insights into reward dynamics in imitation learning. Our source code is available at https://github.com/adibka/RIZE.
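The squared TD regularizer with an adaptive target can be sketched in a few lines. This is a toy numpy illustration under assumptions of ours (an exponential-moving-average target `tau`, fixed reward and value estimates), not the paper's actual implementation:

```python
import numpy as np

def squared_td_penalty(r, v_s, v_next, tau, gamma=0.99):
    """Squared TD regularizer: penalizes TD terms that drift from an
    adaptive target tau (a sketch; the paper's exact form may differ)."""
    td = r + gamma * v_next - v_s
    return np.mean((td - tau) ** 2), td

# Toy rollout quantities standing in for recovered rewards and values.
rng = np.random.default_rng(0)
r = rng.normal(size=32)
v_s = rng.normal(size=32)
v_next = rng.normal(size=32)

tau = 0.0
for step in range(100):
    penalty, td = squared_td_penalty(r, v_s, v_next, tau)
    # Adaptive target: slow exponential moving average of observed TD values,
    # so the bound on recovered rewards evolves with training.
    tau = 0.9 * tau + 0.1 * td.mean()

print(round(float(tau), 3), round(float(penalty), 3))
```

As the target tracks the mean TD value, the penalty shrinks toward the TD variance rather than enforcing an arbitrary fixed bound, which is one way to read the "adaptive bounds on recovered rewards" claim.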
URL: https://openreview.net/forum?id=a6DWqXJZCZ
---
Title: Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design
Authors: Jiannan Yang, Veronika Thost, Tengfei Ma
Abstract: Self-supervised learning (SSL) plays a central role in molecular representation learning. Yet many recent innovations in masking-based pretraining are introduced as heuristics and lack principled evaluation, obscuring which design choices are genuinely effective. This work casts the entire pretrain–finetune workflow into a unified probabilistic framework, enabling transparent comparison and a deeper understanding of masking strategies. Building on this formalism, we conduct a rigorously controlled study of three core design dimensions: the masking distribution, the prediction target, and the encoder architecture. We further employ information-theoretic measures to assess the informativeness of pretraining signals and connect them to empirically benchmarked downstream performance. Our findings reveal a surprising insight: sophisticated masking distributions offer no consistent benefit over uniform sampling for common node-level prediction tasks. Instead, the choice of prediction target and its synergy with the encoder architecture are far more critical. Specifically, shifting to semantically richer targets yields substantial downstream improvements, particularly when paired with expressive Graph Transformer encoders. These insights offer practical guidance for developing more effective SSL methods for molecular graphs.
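The masking-distribution axis the study varies can be illustrated concretely. A minimal sketch, assuming a simple node-sampling setup (the weighting by degree is our illustrative stand-in for a "sophisticated" distribution, not the paper's):

```python
import numpy as np

def sample_mask(num_nodes, mask_rate, weights=None, rng=None):
    """Sample node indices to mask for masking-based pretraining.
    weights=None gives the uniform distribution the study finds sufficient;
    passing e.g. node degrees yields a biased, 'sophisticated' alternative."""
    rng = rng if rng is not None else np.random.default_rng()
    k = max(1, int(mask_rate * num_nodes))
    p = None if weights is None else weights / weights.sum()
    return rng.choice(num_nodes, size=k, replace=False, p=p)

rng = np.random.default_rng(42)
degrees = np.array([1, 1, 2, 2, 4, 4, 8, 8, 16, 16], dtype=float)

uniform_mask = sample_mask(10, 0.3, rng=rng)            # uniform sampling
biased_mask = sample_mask(10, 0.3, weights=degrees, rng=rng)  # degree-biased
print(sorted(uniform_mask.tolist()), sorted(biased_mask.tolist()))
```

Per the abstract's finding, the extra machinery of the biased variant buys little for node-level targets; the prediction target and encoder matter more.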
URL: https://openreview.net/forum?id=TE4vcYWRcc
---
New submissions
===============
Title: Robust Object Detection with Pseudo Labels from VLMs using Per-Object Co-teaching
Abstract: Foundation models, especially vision-language models (VLMs), offer compelling zero-shot object detection for applications like autonomous driving, a domain where manual labelling is prohibitively expensive. However, their detection latency and tendency to hallucinate predictions render them unsuitable for direct deployment. This work introduces a novel pipeline that addresses this challenge by leveraging VLMs to automatically generate pseudo-labels for training efficient, real-time object detectors. Our key innovation is a per-object co-teaching training strategy that mitigates the inherent noise in VLM-generated labels. The proposed approach filters individual noisy bounding boxes from training rather than discarding entire images. Specifically, two YOLO models learn collaboratively, filtering out unreliable boxes from each mini-batch based on their peer's per-object loss values. Overall, our pipeline provides an efficient, robust, and scalable approach to training high-performance object detectors for autonomous driving, significantly reducing reliance on costly human annotation. Experimental results on the KITTI dataset demonstrate that our method outperforms a baseline YOLOv5m model, achieving a significant mAP@0.5 boost ($31.12\%$ to $46.61\%$) while maintaining real-time detection latency. Furthermore, we show that supplementing our pseudo-labelled data with a small fraction ($10\%$) of ground-truth labels leads to further performance gains, reaching $57.97\%$ mAP@0.5 on KITTI. We observe similar improvements on the ACDC and BDD100k datasets.
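The per-object selection step is a box-level version of the classic small-loss co-teaching rule. A minimal sketch with made-up per-box losses (the `keep_rate` and loss values are our assumptions, not the paper's):

```python
import numpy as np

def select_boxes(peer_losses, keep_rate=0.7):
    """Small-loss selection at the box level: keep the fraction of boxes
    the *peer* model finds easiest; the rest are treated as label noise
    and dropped from this mini-batch, not the whole image."""
    k = max(1, int(keep_rate * len(peer_losses)))
    return np.argsort(peer_losses)[:k]

# Toy per-object losses for one mini-batch of VLM pseudo-labelled boxes.
loss_a = np.array([0.20, 0.10, 3.50, 0.30, 0.15])  # model A's loss per box
loss_b = np.array([0.25, 0.12, 0.30, 2.90, 0.20])  # model B's loss per box

# Each model trains only on the boxes its peer considers clean.
keep_for_a = select_boxes(loss_b)  # B picks boxes for A
keep_for_b = select_boxes(loss_a)  # A picks boxes for B
print(sorted(keep_for_a.tolist()), sorted(keep_for_b.tolist()))
```

Because each model's high-loss (likely hallucinated) boxes differ, cross-filtering discards the noisy box for each peer while the remaining boxes in the image still contribute gradient signal.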
URL: https://openreview.net/forum?id=WqiUgx90nO
---
Title: MetaSym: A Symplectic Meta-learning Framework for Physical Intelligence
Abstract: Scalable and generalizable physics-aware deep learning has long been a significant challenge, with applications across diverse domains ranging from robotics to molecular dynamics. Central to almost all physical systems are symplectic forms, the geometric backbone that underpins fundamental invariants like energy and momentum. In this work, we introduce a novel deep learning framework, MetaSym. In particular, MetaSym combines a strong symplectic inductive bias, obtained from a symplectic encoder, with an autoregressive decoder equipped with meta-attention. This principled design ensures that core physical invariants remain intact while allowing flexible, data-efficient adaptation to system heterogeneities. We benchmark MetaSym on highly varied and realistic datasets, including a high-dimensional spring-mesh system (Otness et al., 2021), an open quantum system with dissipation and measurement backaction, and robotics-inspired quadrotor dynamics. Crucially, we fine-tune and deploy MetaSym on real-world quadrotor data, demonstrating robustness to sensor noise and real-world uncertainty. Across all tasks, MetaSym achieves superior few-shot adaptation and outperforms larger state-of-the-art (SoTA) models.
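The payoff of a symplectic inductive bias is easiest to see in the classical numerical-integration setting. The sketch below is not MetaSym's neural architecture; it is the textbook leapfrog integrator, whose symplectic update map keeps energy bounded instead of drifting, which is the invariant-preserving behaviour such a bias aims to bake into a learned model:

```python
import numpy as np

def leapfrog(q, p, grad_V, dt=0.01, steps=1000):
    """Leapfrog (Stormer-Verlet) integration: a classic symplectic map.
    It preserves the symplectic form exactly, so the energy error stays
    bounded over long horizons rather than accumulating."""
    for _ in range(steps):
        p = p - 0.5 * dt * grad_V(q)  # half kick
        q = q + dt * p                # full drift
        p = p - 0.5 * dt * grad_V(q)  # half kick
    return q, p

# Harmonic oscillator: V(q) = q^2 / 2, total energy E = (p^2 + q^2) / 2.
grad_V = lambda q: q
q0, p0 = 1.0, 0.0
e0 = 0.5 * (p0 ** 2 + q0 ** 2)
q, p = leapfrog(q0, p0, grad_V)
e1 = 0.5 * (p ** 2 + q ** 2)
print(abs(e1 - e0))  # energy error remains tiny after 1000 steps
```

A non-symplectic scheme of the same order (e.g. explicit Euler) would show monotone energy growth on this system, which is why the symplectic structure, rather than raw model capacity, is the load-bearing design choice.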
URL: https://openreview.net/forum?id=MV1wfMe647
---
Title: Enhancing Deep Consistent Graph Metric with Affinity and Alignment for Incremental Social Event Detection using Cross-Layer Attention
Abstract: Existing methods for event detection from social media (e.g., X), such as KPGNN, FinEvent, and CLKD, use a triplet loss for feature separation. The triplet loss suffers from two notable discrepancies in the latent space: (i) inconsistency in intra-event and inter-event distances, and (ii) an inability to ensure the closeness of messages from the same event across different mini-batches. This paper proposes two novel loss functions to improve consistency in the latent space. The first, an affinity loss, ensures consistent intra-event and inter-event distances by increasing the affinity between intra-event points. The second, an alignment loss, enhances the cosine similarity between the feature space and the label space, thereby aligning features of the same event class across diverse mini-batches. We provide theoretical justification that the proposed losses ensure discriminative features in the latent space, like CGML, but without its costly pairwise comparisons or specialised batching. In addition to the loss functions, we introduce a new attention module designed to handle heterogeneous relations effectively without requiring a separate optimisation objective. Through comprehensive experiments on two publicly available datasets, we show average improvements of $26.59\%$, $30.49\%$ and $142.38\%$ in NMI, AMI and ARI, respectively, over supervised SOTA event detection methods. Our method also improves over SOTA unsupervised event detection methods across both datasets. These results are supported by statistical significance tests.
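One plausible reading of "cosine similarity between the feature space and the label space" can be sketched numerically. The formulation below (pairwise feature affinity matched against a label-agreement matrix) is our hypothetical reconstruction, not necessarily the paper's exact loss:

```python
import numpy as np

def alignment_loss(feats, labels):
    """Hypothetical alignment loss: cosine similarity between the pairwise
    feature-similarity matrix and the label-agreement matrix, negated so
    that minimizing it pulls same-event messages together even across
    mini-batches. (A sketch; the paper's formulation may differ.)"""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T                                             # feature space
    lab = (labels[:, None] == labels[None, :]).astype(float)  # label space
    cos = (sim * lab).sum() / (np.linalg.norm(sim) * np.linalg.norm(lab))
    return -cos

rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1])
tight = np.array([[1, 0], [1, 0.01], [0, 1], [0.01, 1.0]])  # separated events
mixed = rng.normal(size=(4, 2))                             # unstructured
print(alignment_loss(tight, labels), alignment_loss(mixed, labels))
```

Well-separated event clusters drive the loss toward its minimum of -1, while unstructured features score worse, so gradients push features of the same event class together regardless of which mini-batch they arrived in.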
URL: https://openreview.net/forum?id=vNJ7mCgDbq
---
Title: When Does Causal Regularization Help? A Systematic Study of Boundary Conditions in Spurious Correlation Learning
Abstract: We challenge the conventional wisdom that explicit causal regularization is necessary for out-of-distribution generalization. Through systematic investigation on ColoredMNIST, we discover that reconstructive architectures like autoencoders provide a powerful implicit causal bias that largely obviates the need for explicit methods like IRM or HSIC. Autoencoder baselines achieve 82-86% accuracy with 99% spurious correlation, with explicit causal losses adding only marginal (0-4pp) gains.
Using the Atlasing Pattern Space (APS) framework—a modular toolkit combining topology preservation (T), causal invariance (C), and energy shaping (E)—we establish clear boundary conditions for when explicit regularization helps. Our experiments across multiple domains reveal that: (1) explicit causal methods become critical only when architectural bias is absent or spurious correlations are pathologically strong; (2) topology preservation improves kNN fidelity in high-dimensional vision tasks but fails completely in low-dimensional synthetic settings; and (3) energy-based regularization effectively prevents overfitting while maintaining OOD accuracy.
Through controlled experiments, including a systematic study of component domain-specificity, we demonstrate that regularization components are not universally beneficial but rather require careful domain-specific validation. Our results reframe causal learning as a hierarchical process: architectural choice is primary, with explicit regularizers serving as targeted, domain-specific corrections when architectural bias proves insufficient.
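Of the explicit regularizers named above, HSIC is the most self-contained to sketch. Below is the standard biased HSIC estimator with Gaussian kernels (a generic dependence penalty of the kind the study compares against implicit architectural bias; the paper's training setup is assumed, not reproduced):

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimator with Gaussian kernels. Near zero when x and y
    are independent; larger when they are statistically dependent, so it
    can be minimized to penalize reliance on a spurious feature."""
    n = len(x)

    def gram(z):
        d2 = (z[:, None] - z[None, :]) ** 2
        return np.exp(-d2 / (2 * sigma ** 2))

    K, L = gram(x), gram(y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(1)
x = rng.normal(size=200)
indep = rng.normal(size=200)            # independent of x
dep = x + 0.1 * rng.normal(size=200)    # strongly dependent on x

print(hsic(x, indep), hsic(x, dep))  # dependence yields the larger value
```

In the spurious-correlation setting, x would be the learned representation and y the spurious attribute (e.g. color), with the HSIC term added to the task loss.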
URL: https://openreview.net/forum?id=IiIhq5JeDJ
---
Title: Hierarchical Multi-Level 3D Geometry Generation with Stress-Aware Learning
Abstract: Current approaches to LEGO 3D structural assembly are typically trained to maximize the IoU between the generated output and the target construction. We propose a new approach that builds stable structures guided by a physics-aware reward. Our method employs a two-level agent architecture in which a high-level PPO-based planner proposes a construction scheme, while a low-level Wave Function Collapse (WFC) agent handles precise brick placement with constraint satisfaction. Experimental results demonstrate that our hierarchical method consistently constructs structurally sound buildings while reducing material usage. We also show that replacing the computationally expensive FEM solver with a fast Fourier Neural Operator (FNO) achieves comparable performance, confirming the approach's scalability to large-scale problems.
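The low-level WFC step can be illustrated with a toy 1-D version. The brick vocabulary and adjacency rule below are invented for illustration; the paper's 3-D, stress-aware variant is far richer:

```python
import random

# Toy Wave Function Collapse on a 1-D row of brick slots: each slot keeps a
# set of candidate brick types; we repeatedly collapse the lowest-entropy
# slot and enforce arc consistency on the adjacency constraint.
ALLOWED = {  # which brick type may sit immediately to the right (assumed)
    "base": {"base", "mid"},
    "mid": {"mid", "top"},
    "top": {"top"},
}

def propagate(cells):
    """Prune candidate sets until every adjacent pair is consistent."""
    changed = True
    while changed:
        changed = False
        for j in range(len(cells) - 1):
            right = cells[j + 1] & {b for a in cells[j] for b in ALLOWED[a]}
            left = {a for a in cells[j] if ALLOWED[a] & cells[j + 1]}
            if right != cells[j + 1] or left != cells[j]:
                cells[j], cells[j + 1] = left, right
                changed = True

def collapse(n, seed=0):
    rng = random.Random(seed)
    cells = [set(ALLOWED) for _ in range(n)]
    while any(len(c) > 1 for c in cells):
        i = min((k for k, c in enumerate(cells) if len(c) > 1),
                key=lambda k: len(cells[k]))       # lowest-entropy slot first
        cells[i] = {rng.choice(sorted(cells[i]))}  # observe / collapse it
        propagate(cells)
    return [next(iter(c)) for c in cells]

row = collapse(5)
print(row)  # every adjacent pair satisfies the ALLOWED constraint
```

In the full system, a high-level planner would supply the scheme that seeds these constraints, and the physics-aware reward would score the resulting structure.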
URL: https://openreview.net/forum?id=kyoXKiyoA3
---
Title: α-OCC: Uncertainty-Aware Camera-based 3D Semantic Occupancy Prediction
Abstract: Comprehending 3D scenes is paramount for tasks such as planning and mapping in autonomous vehicles and robotics. Camera-based 3D Semantic Occupancy Prediction (OCC) aims to infer scene geometry and semantics from limited observations. While it has gained popularity due to affordability and rich visual cues, existing methods often neglect the inherent uncertainty in models. To address this, we propose an uncertainty-aware OCC method (α-OCC). We first introduce Depth-UP, an uncertainty propagation framework that improves geometry completion by up to 11.58% and semantic segmentation by up to 12.95% across various OCC models. For uncertainty quantification (UQ), we propose a hierarchical conformal prediction (HCP) method that effectively handles the severe class imbalance in OCC datasets. At the geometry level, a novel KL-based score function significantly improves the occupied recall of safety-critical classes (by 45%) with minimal performance overhead (a 3.4% reduction). For UQ, our HCP achieves smaller prediction-set sizes while maintaining the defined coverage guarantee: compared with baselines, it reduces set sizes by up to 90%, with a further 18% reduction when integrated with Depth-UP. Our contributions advance OCC accuracy and robustness, marking a noteworthy step forward for autonomous perception systems.
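For readers unfamiliar with conformal prediction sets, here is the standard split conformal procedure for classification, the building block that HCP extends hierarchically (this is the textbook method on synthetic data, not the paper's HCP or its KL-based score):

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction for classification. Calibrates a score
    threshold so prediction sets cover the true class with probability
    >= 1 - alpha on average; sets grow where the model is uncertain."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]  # nonconformity
    level = np.ceil((n + 1) * (1 - alpha)) / n          # finite-sample adj.
    q = np.quantile(scores, level, method="higher")
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

# Synthetic 4-class problem where the model is usually (not always) right.
rng = np.random.default_rng(0)
n, k = 500, 4
logits = rng.normal(size=(n, k))
labels = rng.integers(0, k, size=n)
logits[np.arange(n), labels] += 2.0
probs = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)

sets = conformal_sets(probs[:400], labels[:400], probs[400:])
covered = np.mean([labels[400 + i] in s for i, s in enumerate(sets)])
print(round(float(covered), 2))  # empirical coverage near 1 - alpha
```

The paper's contribution sits on top of this guarantee: by handling the class hierarchy and imbalance explicitly, HCP keeps the coverage while shrinking the sets, which plain split conformal cannot do for rare safety-critical classes.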
URL: https://openreview.net/forum?id=bUv25gBLlV
---