Accepted papers
===============
Title: Watermarking Degrades Alignment in Language Models: Analysis and Mitigation
Authors: Apurv Verma, Hai Phan, Shubhendu Trivedi
Abstract: Watermarking has become a practical tool for tracing language model outputs, but it modifies, at inference time, the very token probabilities that alignment training carefully tuned. This creates a tension: how do watermark-induced shifts interact with the procedures intended to make models safe and useful? Experiments on several contemporary models and two representative watermarking schemes reveal that watermarking induces a nontrivial, patterned yet model-specific shift in alignment. We observe two failure modes: guard attenuation, where models become more helpful but less safe, and guard amplification, where refusals become overly conservative. These effects persist even after controlling for perplexity degradation, pointing to alignment-specific distortions rather than mere quality loss. We address this with Alignment Resampling (AR), a procedure that samples multiple watermarked outputs and selects the most aligned response according to an external reward model. Using standard results on the expected maximum of Gaussian random variables, we derive a theoretical lower bound showing that alignment gains grow sublogarithmically with sample size. In practice, sampling as few as two to four candidates largely restores unwatermarked alignment performance in truthfulness, safety, and helpfulness, without hurting watermark detection. This is the first empirical study of watermarking-alignment interactions; it shows that a simple inference-time fix can recover alignment.
URL: https://openreview.net/forum?id=w2ATKQcfWx
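As a rough illustration of the Alignment Resampling idea, the abstract describes a best-of-n selection; the function name and the toy reward model below are invented for this sketch, not taken from the paper:

```python
def alignment_resampling(candidates, reward):
    # Best-of-n selection: among n watermarked generations, return the
    # response that an external reward model scores highest.
    return max(candidates, key=reward)

# Toy stand-ins for watermarked generations and a reward model.
candidates = ["borderline reply", "helpful and safe reply", "blanket refusal"]
scores = {"borderline reply": -1.0,
          "helpful and safe reply": 2.0,
          "blanket refusal": 0.5}

best = alignment_resampling(candidates, scores.get)
print(best)  # helpful and safe reply
```

The theoretical bound follows from a standard fact: the expected maximum of n i.i.d. Gaussian scores with standard deviation sigma grows on the order of sigma * sqrt(2 ln n), so most of the gain is already realized at n around 2 to 4, consistent with the abstract's empirical finding.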
---
Title: Policy Learning with a Language Bottleneck
Authors: Megha Srivastava, Cédric Colas, Dorsa Sadigh, Jacob Andreas
Abstract: Modern AI systems such as self-driving cars and game-playing agents achieve superhuman performance, but they often lack human-like generalization, interpretability, and interoperability with human users. This paper introduces *Policy Learning with a Language Bottleneck* (PLLB), a framework enabling AI agents to generate linguistic rules that capture the high-level strategies underlying rewarding behaviors. PLLB alternates between a *rule generation* step guided by language models and an *update* step where agents learn new policies guided by the rules. Crucially, PLLB enables this kind of language-guided learning even when a natural language rule is insufficient to completely describe the target policy. Across five diverse tasks, including a two-player signaling game, maze navigation, image reconstruction, and robot grasp planning, we show that PLLB learns more interpretable and generalizable behaviors than standard policy learning methods. In three additional human subject studies, we show that the learned rules significantly improve human task performance, enabling more effective human-AI coordination.
URL: https://openreview.net/forum?id=sK8uEqzQPv
---
Title: Denoising Hamiltonian Network for Physical Reasoning
Authors: Congyue Deng, Brandon Y. Feng, Cecilia Garraffo, Alan Garbarz, Robin Walters, William T. Freeman, Leonidas Guibas, Kaiming He
Abstract: Machine learning frameworks for physical problems must capture and enforce physical constraints that preserve the structure of dynamical systems. Many existing approaches achieve this by integrating physical operators into neural networks. While these methods offer theoretical guarantees, they face two key limitations: (i) they primarily model local relations between adjacent time steps, overlooking longer-range or higher-level physical interactions, and (ii) they focus on forward simulation while neglecting broader physical reasoning tasks. We propose the Denoising Hamiltonian Network (DHN), a novel framework that generalizes Hamiltonian mechanics operators into more flexible neural operators. DHN captures non-local temporal relationships and mitigates numerical integration errors through a denoising mechanism. DHN also supports multi-system modeling with a global conditioning mechanism. We demonstrate its effectiveness and flexibility across three diverse physical reasoning tasks with distinct inputs and outputs.
URL: https://openreview.net/forum?id=KublEgx7Hv
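For reference, the structure that Hamiltonian-constrained networks typically hard-code, and which DHN relaxes into more flexible neural operators, is Hamilton's equations (standard notation, not taken from the paper):

```latex
\frac{\mathrm{d}q}{\mathrm{d}t} = \frac{\partial H}{\partial p},
\qquad
\frac{\mathrm{d}p}{\mathrm{d}t} = -\frac{\partial H}{\partial q}
```

Here $q$ and $p$ are generalized positions and momenta and $H(q, p)$ is the Hamiltonian; enforcing these relations between adjacent time steps is the "local" constraint the abstract contrasts with DHN's non-local, block-wise treatment.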
---
Title: Amortized Bayesian Workflow
Authors: Chengkun LI, Aki Vehtari, Paul-Christian Bürkner, Stefan T. Radev, Luigi Acerbi, Marvin Schmitt
Abstract: Bayesian inference often faces a trade-off between computational speed and sampling accuracy. We propose an adaptive workflow that integrates rapid amortized inference with gold-standard MCMC techniques to achieve a favorable combination of both speed and accuracy when performing inference on many observed datasets. Our approach uses principled diagnostics to guide the choice of inference method for each dataset, moving along the Pareto front from fast amortized sampling via generative neural networks to slower but guaranteed-accurate MCMC when needed. By reusing computations across steps, our workflow synergizes amortized and MCMC-based inference. We demonstrate the effectiveness of this integrated approach on several synthetic and real-world problems with tens of thousands of datasets, showing efficiency gains while maintaining high posterior quality.
URL: https://openreview.net/forum?id=osV7adJlKD
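A minimal sketch of the diagnostic-gated workflow the abstract describes. All function names and the threshold are placeholders (a real diagnostic would be something like a Pareto-k check), and the amortized draws are reused as MCMC initialization, mirroring the abstract's computation reuse:

```python
def adaptive_inference(datasets, amortized_sample, diagnostic, mcmc_sample,
                       threshold=0.7):
    """Per-dataset method selection: keep the fast amortized draws when a
    diagnostic passes, otherwise fall back to MCMC, reusing the amortized
    draws as initialization (placeholder API)."""
    results = {}
    for name, data in datasets.items():
        draws = amortized_sample(data)
        if diagnostic(draws, data) <= threshold:
            results[name] = ("amortized", draws)
        else:
            results[name] = ("mcmc", mcmc_sample(data, init=draws))
    return results

# Stub samplers: one "easy" dataset passes the diagnostic, one "hard" fails.
out = adaptive_inference(
    {"easy": [1.0], "hard": [99.0]},
    amortized_sample=lambda d: [d[0] + 0.1],
    diagnostic=lambda draws, d: 0.1 if d[0] < 10 else 0.9,
    mcmc_sample=lambda d, init: [d[0]],
)
print({k: v[0] for k, v in out.items()})  # {'easy': 'amortized', 'hard': 'mcmc'}
```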
---
Title: Layer Collapse Can be Induced by Unstructured Pruning
Authors: Zhu LIAO, Victor Quétu, Van-Tam Nguyen, Enzo Tartaglione
Abstract: Unstructured pruning is a popular compression method for efficiently reducing model parameters. However, while it effectively decreases the number of parameters, it is commonly believed that unstructured pruning cannot shorten the computational critical path, i.e., the maximum number of layers traversed during forward propagation.
In this paper, we study when and how unstructured pruning can yield structural effects. For rectifier-activated networks, we introduce the notion of neuron entropy, which quantifies the degree of nonlinearity utilization. We show that magnitude-based pruning naturally lowers this entropy, sometimes down to zero-entropy layers that become linearizable and can thus be removed. Building on this insight, we propose a method that leverages "unstructured" pruning to favor sparsity in low-entropy layers, enabling their complete removal. We validate the phenomenon across CNNs, Vision Transformers, and NLP models: unstructured pruning can induce effective layer removal with little or no performance degradation in over-parameterized networks. Our code is available at https://github.com/ZhuLIAO001/NEPENTHE.git.
URL: https://openreview.net/forum?id=rfDYZNZIZT
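One plausible way to make the neuron-entropy idea concrete: treat a rectifier unit's on/off state over a dataset as a Bernoulli variable and measure its binary entropy. This formalization is an assumption for illustration; the paper's exact definition may differ:

```python
import math

def neuron_entropy(pre_activations):
    """Binary entropy (in bits) of a ReLU unit's on/off state over a dataset.
    Zero entropy means the unit is always active (locally linear) or always
    off (prunable); either way, its nonlinearity is unused, so a layer of
    zero-entropy units can be linearized and folded away."""
    n = len(pre_activations)
    p_on = sum(1 for a in pre_activations if a > 0) / n
    if p_on in (0.0, 1.0):
        return 0.0
    return -(p_on * math.log2(p_on) + (1 - p_on) * math.log2(1 - p_on))

print(neuron_entropy([0.5, 1.2, 3.0]))  # 0.0 -- always on, no nonlinearity used
print(neuron_entropy([-1.0, 2.0]))      # 1.0 -- maximal nonlinearity usage
```

Under this reading, magnitude pruning that drives a layer's units toward zero entropy makes the layer removable, which is the structural effect the abstract reports.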
---
Title: The Cost of Replicability in Active Learning
Authors: Rupkatha Hira, Dominik Kau, Jessica Sorrell
Abstract: Active learning aims to reduce the number of labeled data points required by machine learning algorithms by selectively querying labels from initially unlabeled data. Ensuring replicability, where an algorithm produces consistent outcomes across different runs, is essential for the reliability of machine learning models but often increases sample complexity. This report investigates the cost of replicability in active learning using two classical disagreement-based methods: the CAL and A² algorithms. Leveraging random thresholding techniques, we propose two replicable active learning algorithms: one for realizable learning of finite hypothesis classes, and another for agnostic learning. Our theoretical analysis shows that while enforcing replicability increases label complexity, CAL and A² still achieve substantial label savings under this constraint. These findings provide key insights into balancing efficiency and stability in active learning.
URL: https://openreview.net/forum?id=ZsqJu9eITd
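The random-thresholding trick behind such replicability results can be sketched in a few lines. The function and parameters are illustrative, not the paper's algorithms: instead of comparing an empirical estimate to a fixed cutoff (where two runs straddling the cutoff disagree), the cutoff is drawn with shared randomness, so disagreement requires the cutoff to land in the small gap between the two runs' estimates:

```python
import random

def replicable_threshold_test(estimate, low, high, shared_seed):
    """Compare an empirical estimate against a cutoff drawn uniformly from
    [low, high] using shared randomness. Two runs whose estimates differ
    only slightly then return the same decision with high probability."""
    rng = random.Random(shared_seed)
    cutoff = rng.uniform(low, high)
    return estimate >= cutoff

# Estimates clearly outside [low, high] give the same answer on every run.
print(replicable_threshold_test(0.9, 0.3, 0.7, shared_seed=7))  # True
print(replicable_threshold_test(0.1, 0.3, 0.7, shared_seed=7))  # False
```

Two runs with estimates 0.52 and 0.53 and the same shared seed disagree only if the random cutoff happens to fall in (0.52, 0.53], which occurs with probability 0.01/0.4 here.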
---
New submissions
===============
Title: DRESS: Disentangled Representation-based Self-Supervised Meta-Learning for Diverse Tasks
Abstract: Meta-learning represents a strong class of approaches for solving few-shot learning tasks. Nonetheless, recent research suggests that simply pre-training a generic encoder can potentially surpass meta-learning algorithms. In this paper, we hypothesize that the reason meta-learning fails to stand out in popular few-shot learning benchmarks is the lack of diversity among the few-shot learning tasks. We propose DRESS, a task-agnostic Disentangled REpresentation-based Self-Supervised meta-learning approach that enables fast model adaptation on highly diversified few-shot learning tasks. Specifically, DRESS utilizes disentangled representation learning to create self-supervised tasks that can fuel the meta-training process. We validate the effectiveness of DRESS through experiments on datasets with multiple factors of variation and varying complexity. The results suggest that DRESS is able to outperform competing methods on the majority of the datasets and task setups. Through this paper, we advocate for a re-examination of how task adaptation studies are conducted, and aim to reignite interest in the potential of meta-learning for solving few-shot learning tasks via disentangled representations.
URL: https://openreview.net/forum?id=TSjDJYKLmu
---
Title: Inducing Disagreement in Multi-Agent LLM Executive Teams: Only the Devil’s Advocate Works
Abstract: Multi-agent large language model (LLM) systems for strategic decision-making suffer from premature convergence, limiting the benefits of multiple perspectives. While several techniques for inducing disagreement have been proposed, no systematic comparison exists, particularly for strategic decisions without objectively correct answers. We compare five prompting techniques across 20 business scenarios with four-agent executive teams (CEO, CFO, CMO, COO), analyzing 480 team decisions and 1,920 individual agent responses. Our key finding is stark: Devil's Advocate assignment achieves a 99.2% disagreement rate, while baseline conditions show only 48.3% disagreement. Critically, "soft" techniques, namely Strong Role Framing (61.7%), Explicit Dissent Instructions (55.0%), and their combination (63.3%), are statistically indistinguishable from baseline. Only Devil's Advocate produces a significant improvement. We also discover consistent coalition patterns: 80.3% of 2-2 splits follow a CEO+CMO versus CFO+COO alignment, suggesting functional perspective differentiation. Analysis of confidence allocations reveals that soft techniques create "nuanced agreement", where agents express lower conviction but reach the same conclusions, while Devil's Advocate produces "inauthentic dissent", where 4.9% of agents recommend options they privately rate lower. These findings demonstrate that explicit behavioral assignment ("you must oppose") succeeds where implicit instructions ("think critically") fail, with implications for practitioners designing multi-agent deliberation systems.
URL: https://openreview.net/forum?id=mxBmj5LYU2
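The headline metrics are simple to state in code; a sketch of the disagreement-rate and 2-2-split bookkeeping, on toy data with invented option labels:

```python
from collections import Counter

def disagreement_rate(team_decisions):
    # Fraction of teams whose agents did not unanimously pick one option.
    return sum(1 for team in team_decisions
               if len(set(team)) > 1) / len(team_decisions)

def is_two_two_split(team):
    # A 2-2 split: exactly two options, each backed by two of the four agents.
    return sorted(Counter(team).values()) == [2, 2]

# One vote per agent (CEO, CFO, CMO, COO) for each of four toy scenarios.
teams = [["A", "A", "A", "A"],
         ["A", "B", "A", "A"],
         ["A", "A", "B", "B"],
         ["B", "B", "B", "B"]]
print(disagreement_rate(teams))                # 0.5
print(is_two_two_split(["A", "A", "B", "B"]))  # True
```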
---
Title: Primus: Enforcing Attention Usage for 3D Medical Image Segmentation
Abstract: Transformers have achieved remarkable success across multiple fields, yet their impact on 3D medical image segmentation remains limited, with convolutional networks still dominating major benchmarks. In this work, (A) we analyze current Transformer-based segmentation models and identify critical shortcomings, particularly their over-reliance on convolutional blocks. Further, we demonstrate that in some architectures, performance is unaffected by the absence of the Transformer, underscoring its limited contribution. To address these challenges, we move away from hybrid architectures and (B) introduce Transformer-centric segmentation architectures, termed Primus and PrimusV2. Primus combines high-resolution tokens with advances in positional embeddings and block design to make full use of its Transformer blocks, while PrimusV2 expands on this through an iterative patch embedding. Through these adaptations, Primus surpasses current Transformer-based methods and competes with a default nnU-Net, while PrimusV2 exceeds it and is on par with state-of-the-art CNNs such as the ResEnc-L and MedNeXt architectures across nine public datasets. In doing so, we introduce the first competitive Transformer-centric model, making Transformers state-of-the-art in 3D medical segmentation. Our code will be published.
URL: https://openreview.net/forum?id=x4vZE4PDEu
---
Title: Beyond the Linear Separability Ceiling: Aligning Representations in VLMs
Abstract: A challenge in advancing Visual-Language Models (VLMs) is determining whether their failures on abstract reasoning tasks, such as Bongard problems, stem from flawed perception or faulty top-down reasoning. To disentangle these factors, we introduce a diagnostic framework centered on the Linear Separability Ceiling (LSC), the performance achievable by a linear classifier on a VLM's raw visual embeddings. Applying this framework to state-of-the-art VLMs, we uncover a pervasive "alignment gap", where most models fail to generatively outperform the linear separability of their representations. We find that the few models surpassing this ceiling do so via two mechanisms: by further refining visual representations into a more linearly separable format or by executing non-linear decision logic. We demonstrate that this bottleneck is not a fundamental limitation but a solvable visual alignment issue. Our method augments standard next-token prediction with a contrastive objective to restructure the visual manifold into a more one-dimensionally linear geometry, improving image-to-image comparison and enabling models to significantly surpass the LSC on abstract binary classification tasks.
URL: https://openreview.net/forum?id=3uX4p80bN0
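A one-dimensional toy version of the Linear Separability Ceiling helps fix the concept: the real LSC fits a linear probe on full embedding vectors, whereas this threshold scan over scalar "embeddings" is an illustrative simplification:

```python
def linear_separability_ceiling_1d(embeddings, labels):
    """Best accuracy of any threshold classifier on scalar embeddings:
    a 1-D analogue of the LSC. Scans every candidate cutoff and also
    considers the sign-flipped classifier (1 - acc)."""
    best = 0.0
    cuts = sorted(set(embeddings)) + [float("inf")]
    for cut in cuts:
        preds = [1 if e >= cut else 0 for e in embeddings]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        best = max(best, acc, 1 - acc)
    return best

# Cleanly separated embeddings hit a ceiling of 1.0; noisy ones fall short.
print(linear_separability_ceiling_1d([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))  # 1.0
print(linear_separability_ceiling_1d([0.1, 0.9, 0.2, 0.8], [0, 0, 1, 1]))  # 0.75
```

A model whose generative answers score below this ceiling is leaving linearly decodable information on the table, which is the "alignment gap" the abstract describes.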
---
Title: MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
Abstract: The advent of large Vision-Language Models (VLMs) has significantly advanced multimodal understanding, enabling more sophisticated and accurate integration of visual and textual information across various tasks, including image and video captioning, visual question answering, and cross-modal retrieval. Despite VLMs' superior capabilities, researchers lack a comprehensive understanding of their compositionality – the ability to understand and produce novel combinations of known visual and textual components. Prior benchmarks provide only a relatively rough compositionality evaluation from the perspectives of objects, relations, and attributes while neglecting deeper reasoning about object interactions, counting, and complex compositions. However, compositionality is a critical ability that facilitates coherent reasoning and understanding across modalities for VLMs. To address this limitation, we propose MMCOMPOSITION, a novel human-annotated benchmark for comprehensively and accurately evaluating VLMs' compositionality. With MMCOMPOSITION, we can quantify and explore the compositionality of the mainstream VLMs. Surprisingly, we find GPT-4o's compositionality inferior to that of the best open-source model, and we analyze the underlying reasons. Our experimental analysis reveals the limitations of VLMs in fine-grained compositional perception and reasoning, and points to areas for improvement in VLM design and training.
URL: https://openreview.net/forum?id=aWO15tpSH8
---