J2C Certification: FusionProt: Fusing Sequence and Structural Information for Unified Protein Representation Learning
Dan Kalifa, Uriel Singer, Kira Radinsky
https://openreview.net/forum?id=imcinaOHod
---
Accepted papers
===============
Title: CREW-Wildfire: Benchmarking Agentic Multi-Agent Collaborations at Scale
Authors: Jonathan Hyun, Nicholas R Waytowich, Boyuan Chen
Abstract: Despite rapid progress in large language model (LLM)-based multi-agent systems, current benchmarks fall short in evaluating their scalability, robustness, and coordination capabilities in complex, dynamic, real-world tasks. Existing environments typically focus on small-scale, fully observable, or low-complexity domains, limiting their utility for developing and assessing next-generation multi-agent Agentic AI frameworks. We introduce CREW-Wildfire, an open-source benchmark designed to close this gap. Built atop the human-AI teaming CREW simulation platform, CREW-Wildfire offers procedurally generated wildfire response scenarios featuring large maps, heterogeneous agents, partial observability, stochastic dynamics, and long-horizon planning objectives. The environment supports both low-level control and high-level natural language interactions through modular Perception and Execution modules. We implement and evaluate several state-of-the-art LLM-based multi-agent Agentic AI frameworks, uncovering significant performance gaps that highlight the unsolved challenges in large-scale coordination, communication, spatial reasoning, and long-horizon planning under uncertainty. By providing more realistic complexity, scalable architecture, and behavioral evaluation metrics, CREW-Wildfire establishes a critical foundation for advancing research in scalable multi-agent Agentic intelligence. All code, environments, data, and baselines will be released to support future research in this emerging domain.
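As a rough illustration of the modular Perception and Execution interface described above, the sketch below wires an LLM policy into an observe-describe-act loop. It is not the CREW-Wildfire API; every class, method, and command name here is a hypothetical stand-in for the pattern of turning partial observations into text and parsing the reply back into low-level control.

```python
# Hypothetical sketch of a perception -> LLM policy -> execution loop.
# None of these names come from CREW-Wildfire; they only illustrate the
# modular pattern described in the abstract.
from dataclasses import dataclass

@dataclass
class Observation:
    position: tuple
    visible_fires: list      # partial observability: only nearby cells are seen

class PerceptionModule:
    def describe(self, obs: Observation) -> str:
        return (f"You are at {obs.position}. "
                f"Visible fire fronts: {obs.visible_fires or 'none'}.")

class ExecutionModule:
    def parse(self, llm_reply: str) -> str:
        # Map a free-form natural-language plan onto a low-level command.
        for command in ("move_north", "move_south", "deploy_water", "wait"):
            if command in llm_reply:
                return command
        return "wait"

def agent_step(obs, llm, perception, execution):
    prompt = perception.describe(obs)
    reply = llm(prompt)              # any text-in / text-out policy
    return execution.parse(reply)
```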
URL: https://openreview.net/forum?id=8mr27qFzKR
---
Title: Gaussian mixture layers for neural networks
Authors: Sinho Chewi, Philippe Rigollet, Yuling Yan
Abstract: The mean-field theory for two-layer neural networks considers infinitely wide networks that are linearly parameterized by a probability measure over the parameter space. This nonparametric perspective has significantly advanced both the theoretical and conceptual understanding of neural networks, with substantial efforts made to validate its applicability to networks of moderate width. In this work, we explore the opposite direction, investigating whether dynamics can be directly implemented over probability measures. Specifically, we employ Gaussian mixture models as a flexible and expressive parametric family of distributions together with the theory of Wasserstein gradient flows to derive training dynamics for such measures. Our approach introduces a new type of layer—the Gaussian mixture (GM) layer—that can be integrated into neural network architectures. As a proof of concept, we validate our proposal through experiments on simple classification tasks, where a GM layer achieves test performance comparable to that of a two-layer fully connected network. Furthermore, we examine the behavior of these dynamics and demonstrate numerically that GM layers exhibit markedly different behavior compared to classical fully connected layers, even when the latter are large enough to be considered in the mean-field regime.
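A minimal sketch of the parameterization behind a GM layer, assuming the simplest reading of the abstract: the layer's output is a Monte Carlo estimate of an expectation over parameters drawn from a learnable Gaussian mixture. This covers only the forward pass, not the Wasserstein-gradient-flow training dynamics; the component count, sample size, ReLU activation, and scalar output are choices made for brevity.

```python
import torch
import torch.nn as nn

class GaussianMixtureLayer(nn.Module):
    """Toy mean-field-style layer: f(x) = E_{theta ~ GM}[relu(theta . x)],
    estimated by sampling parameters from a learnable Gaussian mixture."""
    def __init__(self, d_in, n_components=8, n_samples=64):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_components))
        self.means = nn.Parameter(torch.randn(n_components, d_in) / d_in**0.5)
        self.log_std = nn.Parameter(torch.full((n_components, d_in), -2.0))
        self.n_samples = n_samples

    def forward(self, x):                                      # x: (batch, d_in)
        weights = torch.softmax(self.logits, dim=0)            # (K,)
        # Reparameterized samples from each component.
        eps = torch.randn(self.n_samples, *self.means.shape)   # (S, K, d_in)
        theta = self.means + eps * self.log_std.exp()          # (S, K, d_in)
        pre = torch.einsum("skd,bd->bsk", theta, x)            # (batch, S, K)
        act = torch.relu(pre)
        # Average over samples, weight over mixture components.
        return (act.mean(dim=1) * weights).sum(dim=-1, keepdim=True)  # (batch, 1)
```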
URL: https://openreview.net/forum?id=sAptI2o5cP
---
Title: Superposition as Lossy Compression — Measure with Sparse Autoencoders and Connect to Adversarial Vulnerability
Authors: Leonard Bereska, Zoe Tzifa-Kratira, Reza Samavi, Stratis Gavves
Abstract: Neural networks achieve remarkable performance through superposition: encoding multiple features as overlapping directions in activation space rather than dedicating individual neurons to each feature. This phenomenon challenges interpretability; when neurons respond to multiple unrelated concepts, understanding network behavior becomes difficult. Yet despite its importance, we lack principled methods to measure superposition.
We present an information-theoretic framework measuring a neural representation's effective degrees of freedom. We apply Shannon entropy to sparse autoencoder activations to compute the number of effective features as the minimum number of neurons needed for interference-free encoding. Equivalently, this measures how many "virtual neurons" the network simulates through superposition. When networks encode more effective features than they have actual neurons, they must accept interference as the price of compression.
Our metric strongly correlates with ground truth in toy models, detects minimal superposition in algorithmic tasks (effective features approximately equal neurons), and reveals systematic reduction under dropout. Layer-wise patterns of effective features mirror studies of intrinsic dimensionality on Pythia-70M. The metric also captures developmental dynamics, detecting sharp feature consolidation during the grokking phase transition.
Surprisingly, adversarial training can increase effective features while improving robustness, contradicting the hypothesis that superposition causes vulnerability. Instead, the effect of adversarial training on superposition depends on task complexity and network capacity; simple tasks with ample capacity allow feature expansion (abundance regime), while complex tasks or limited capacity force feature reduction (scarcity regime).
By defining superposition as lossy compression, this work enables principled, practical measurement of how neural networks organize information under computational constraints, in particular, connecting superposition to adversarial robustness.
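A minimal sketch of the entropy-based count, assuming effective features are computed as the exponential of the Shannon entropy of each SAE feature's share of total activation mass; the exact normalization used in the paper may differ.

```python
import numpy as np

def effective_features(sae_activations: np.ndarray) -> float:
    """sae_activations: (n_samples, n_features) nonnegative SAE codes.
    Returns exp(H) of the per-feature activation mass, a simple
    'effective number of features' in the spirit of the abstract."""
    mass = sae_activations.clip(min=0).sum(axis=0)   # total activation mass per feature
    p = mass / mass.sum()
    p = p[p > 0]
    entropy = -(p * np.log(p)).sum()
    return float(np.exp(entropy))

# Example: 3 nominal features, but one carries almost all the activation mass.
acts = np.array([[5.0, 0.1, 0.0],
                 [4.0, 0.0, 0.2]])
print(effective_features(acts))   # close to 1, far below the 3 nominal features
```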
URL: https://openreview.net/forum?id=qaNP6o5qvJ
---
Title: Generative Proto-Sequence: Sequence-Level Decision Making for Long-Horizon Reinforcement Learning
Authors: Netanel Fried, Liad Giladi, Gilad Katz
Abstract: Deep reinforcement learning (DRL) methods often face challenges in environments characterized by large state spaces, long action horizons, and sparse rewards, where effective exploration and credit assignment are critical. We introduce Generative Proto-Sequence (GPS), a novel generative DRL approach that produces variable-length discrete action sequences. By generating entire action sequences in a single decision rather than selecting individual actions at each timestep, GPS reduces the temporal decision bottleneck that impedes learning in long-horizon tasks. This sequence-level abstraction provides three key advantages: (1) it facilitates more effective credit assignment by directly connecting state observations with the outcomes of complete behavioral patterns; (2) by committing to coherent multi-step strategies, our approach facilitates better exploration of the state space; and (3) it promotes better generalization by learning macro-behaviors that transfer across similar situations rather than memorizing state-specific responses. Evaluations across diverse maze navigation tasks of varying sizes and complexities demonstrate that GPS outperforms leading action repetition and temporal methods in the large majority of tested configurations, where it converges faster and achieves higher success rates.
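One way to picture sequence-level decision making is a policy head that autoregressively emits discrete actions until a stop token, so a single decision commits to a variable-length macro-behavior. The GRU decoder below is an illustrative assumption, not the GPS architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProtoSequenceHead(nn.Module):
    """Toy sequence-level policy head: emits a variable-length run of discrete
    actions terminated by a stop token. Illustrative only, not the GPS model."""
    def __init__(self, state_dim, n_actions, hidden=128, max_len=32):
        super().__init__()
        self.stop = n_actions                          # extra index = stop token
        self.encode = nn.Linear(state_dim, hidden)
        self.cell = nn.GRUCell(n_actions + 1, hidden)
        self.head = nn.Linear(hidden, n_actions + 1)
        self.max_len = max_len

    @torch.no_grad()
    def generate(self, state):                         # state: (1, state_dim)
        h = torch.tanh(self.encode(state))             # (1, hidden)
        prev = torch.zeros(1, self.stop + 1)
        actions = []
        for _ in range(self.max_len):
            h = self.cell(prev, h)
            a = torch.distributions.Categorical(logits=self.head(h)).sample()
            if a.item() == self.stop:
                break
            actions.append(a.item())
            prev = F.one_hot(a, self.stop + 1).float()
        return actions                                 # executed as one macro-decision
```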
URL: https://openreview.net/forum?id=fSG2DRHtOg
---
Title: IPA: An Information-Reconstructive Input Projection Framework for Efficient Foundation Model Adaptation
Authors: Yuan Yin, Shashanka Venkataramanan, Tuan-Hung Vu, Andrei Bursuc, Matthieu Cord
Abstract: Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, reduce adaptation cost by injecting low-rank updates into pretrained weights. However, LoRA’s down-projection is randomly initialized and data-agnostic, discarding potentially useful information. Prior analyses show that this projection changes little during training, while the up-projection carries most of the adaptation, making the random input compression a performance bottleneck. We propose IPA, a feature-aware projection framework that explicitly aims to reconstruct the original input within a reduced hidden space. In the linear case, we instantiate IPA with algorithms approximating top principal components, enabling efficient projector pretraining with negligible inference overhead. Across language and vision benchmarks, IPA consistently improves over LoRA and DoRA, achieving on average 1.5 points higher accuracy on commonsense reasoning and 2.3 points on VTAB-1k, while matching full LoRA performance with roughly half the trainable parameters when the projection is frozen. Code available at https://github.com/valeoai/peft-ipa.
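A minimal sketch of a feature-aware, reconstruction-oriented down-projection in the linear case: initialize the adapter's down-projection from the top-r principal directions of the layer's inputs on a calibration batch, freeze it, and train only the up-projection. The initialization and freezing details below are assumptions in the spirit of the abstract, not the released IPA code.

```python
import torch
import torch.nn as nn

class PCAInitAdapter(nn.Module):
    """LoRA-style adapter whose down-projection is set from the top-r principal
    directions of calibration inputs (and frozen), so the reduced hidden space
    approximately reconstructs the input. Sketch only."""
    def __init__(self, base_linear: nn.Linear, calib_inputs: torch.Tensor, r=8):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad_(False)
        x = calib_inputs - calib_inputs.mean(dim=0)              # (N, d_in), centered
        # Top-r right singular vectors = principal directions of the inputs.
        _, _, vh = torch.linalg.svd(x, full_matrices=False)
        self.down = nn.Parameter(vh[:r], requires_grad=False)    # (r, d_in), frozen
        self.up = nn.Parameter(torch.zeros(base_linear.out_features, r))

    def forward(self, x):
        return self.base(x) + (x @ self.down.T) @ self.up.T
```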
URL: https://openreview.net/forum?id=aLmQeZx2pR
---
Title: FusionProt: Fusing Sequence and Structural Information for Unified Protein Representation Learning
Authors: Dan Kalifa, Uriel Singer, Kira Radinsky
Abstract: Accurate protein representations that integrate sequence and three-dimensional (3D) structure are critical to many biological and biomedical tasks. Most existing models either ignore structure or combine it with sequence through a single, static fusion step. Here we present FusionProt, a unified model that learns representations via iterative, bidirectional fusion between a protein language model and a structure encoder. A single learnable token serves as a carrier, alternating between sequence attention and spatial message passing across layers. FusionProt is evaluated on Enzyme Commission (EC), Gene Ontology (GO), and mutation stability prediction tasks. It improves $F_{\max}$ by a median of $+1.3$ points (up to $+2.0$) across EC and GO benchmarks, and boosts AUROC by $+3.6$ points over the strongest baseline on mutation stability. Inference cost remains practical, with only $\sim2\text{--}5\%$ runtime overhead.
Beyond state-of-the-art performance, we further demonstrate FusionProt’s practical relevance through representative biological case studies, suggesting that the model captures biologically relevant features.
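A minimal reading of the alternating fusion mechanism: at each depth a single carrier token joins the residue sequence for one attention block, then serves as a virtual node during one round of message passing over the 3D-neighbor graph. The generic transformer layer and mean-pooling aggregation below are stand-ins, not the FusionProt architecture.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """One sequence <-> structure fusion step carried by a single token. Sketch only."""
    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.seq_layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.struct_msg = nn.Linear(dim, dim)

    def forward(self, residue_emb, fusion_tok, neighbor_index):
        # residue_emb: (L, dim), fusion_tok: (1, dim), neighbor_index: (L, k) 3D neighbors
        seq = torch.cat([fusion_tok, residue_emb], dim=0).unsqueeze(0)
        seq = self.seq_layer(seq).squeeze(0)              # sequence attention incl. token
        fusion_tok, residue_emb = seq[:1], seq[1:]
        # Spatial message passing: each residue averages its 3D neighbors,
        # and the fusion token aggregates the structure view for the next layer.
        neigh = residue_emb[neighbor_index]               # (L, k, dim)
        residue_emb = residue_emb + self.struct_msg(neigh.mean(dim=1))
        fusion_tok = fusion_tok + residue_emb.mean(dim=0, keepdim=True)
        return residue_emb, fusion_tok
```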
URL: https://openreview.net/forum?id=imcinaOHod
---
Title: FedHERO: A Federated Learning Approach for Node Classification Task on Heterophilic Graphs
Authors: Zihan Chen, Xingbo Fu, Yushun Dong, Jundong Li, Cong Shen
Abstract: Graph neural networks (GNNs) have shown significant success in modeling graph data, and Federated Graph Learning (FGL) empowers clients to collaboratively train GNNs in a distributed manner while preserving data privacy. However, FGL faces unique challenges when the general neighbor distribution pattern of nodes varies significantly across clients. Specifically, FGL methods usually require that the graph data owned by all clients is homophilic to ensure similar neighbor distribution patterns of nodes. Such an assumption ensures that the learned knowledge is consistent across the local models from all clients. Therefore, these local models can be properly aggregated as a global model without undermining the overall performance. Nevertheless, when the neighbor distribution patterns of nodes vary across different clients (e.g., when clients hold graphs with different levels of heterophily), their local models may gain different and even conflicting knowledge from their node-level predictive tasks. Consequently, aggregating these local models usually leads to catastrophic performance deterioration on the global model. To address this challenge, we propose FedHERO, an FGL framework designed to harness and share insights from heterophilic graphs effectively. At the heart of FedHERO is a dual-channel GNN equipped with a structure learner, engineered to discern the structural knowledge encoded in the local graphs. With this specialized component, FedHERO enables the local model for each client to identify and learn patterns that are universally applicable across graphs with different patterns of node neighbor distributions. FedHERO not only enhances the performance of individual client models by leveraging both local and shared structural insights but also sets a new precedent in this field for effectively handling graph data with various node neighbor distribution patterns. We conduct extensive experiments to validate the superior performance of FedHERO against existing alternatives.
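A very small sketch of the federated side, assuming plain FedAvg over only the parameters intended to carry client-agnostic structural knowledge (e.g., the structure learner), while the rest stays local; which parameters are shared is an assumption here.

```python
import copy
import torch

def fedavg_shared(shared_states, weights):
    """Weighted average of the client-agnostic ('shared') parameters, e.g. a
    structure-learner channel; client-specific parts stay local. Sketch only."""
    avg = copy.deepcopy(shared_states[0])
    total = float(sum(weights))
    for key in avg:
        avg[key] = sum(w * s[key] for s, w in zip(shared_states, weights)) / total
    return avg

# Toy usage: two clients upload the state of their shared structure learner.
client_a = {"structure_learner.weight": torch.ones(4, 4)}
client_b = {"structure_learner.weight": torch.zeros(4, 4)}
global_shared = fedavg_shared([client_a, client_b], weights=[300, 100])
print(global_shared["structure_learner.weight"][0, 0])   # 0.75
```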
URL: https://openreview.net/forum?id=pHii7cWco7
---
New submissions
===============
Title: ContraDiff: Unifying Training Process of Generative and Discriminative Vision Tasks in One Diffusion Model
Abstract: Besides unprecedented ability in image generation, text-to-image diffusion models are also able to provide powerful intermediate representations that support various discriminative vision tasks. However, efficiently adapting these models to handle both generative and discriminative tasks remains largely unexplored. While some unified frameworks have been proposed to reduce the overhead of training pipelines, they often rely on computationally expensive pretraining processes and lack flexibility in adaptation. In this paper, we propose ContraDiff, a novel framework to efficiently leverage a pretrained diffusion model for both generative and discriminative tasks. Our approach focuses on unified training and parameter-efficient optimization. Our framework combines a reconstruction loss and a contrastive loss on images with varying noise levels to effectively balance generative and contrastive training. Additionally, we apply LoRA to a pre-trained Stable Diffusion model, significantly reducing training time without compromising performance. Our experiments show that ContraDiff excels in both generative and discriminative vision tasks. Our model achieves 80.1\% accuracy on ImageNet-1K classification and an FID of 5.56 for ImageNet 256$\times$256 unconditional image generation, all while requiring significantly fewer trainable parameters. This efficiency offers advantages in computational resources and enhances the model's adaptability across a range of vision tasks. The code will be released publicly upon acceptance.
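The joint objective described above can be sketched as a denoising reconstruction term plus an InfoNCE-style contrastive term over features of matched noisy views; the feature source, temperature, and loss weighting below are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def joint_diffusion_contrastive_loss(pred_noise, true_noise, feats_a, feats_b,
                                     temperature=0.1, contrastive_weight=0.5):
    """Sketch of a joint objective: diffusion reconstruction (noise MSE) plus
    InfoNCE between two feature views of the same noisy images.
    feats_a, feats_b: (batch, dim) features from matched positive pairs."""
    recon = F.mse_loss(pred_noise, true_noise)

    a = F.normalize(feats_a, dim=-1)
    b = F.normalize(feats_b, dim=-1)
    logits = a @ b.T / temperature                   # (batch, batch) similarities
    targets = torch.arange(a.size(0), device=a.device)
    contrastive = F.cross_entropy(logits, targets)   # diagonal entries are positives

    return recon + contrastive_weight * contrastive
```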
URL: https://openreview.net/forum?id=Lv4VvxtzeJ
---
Title: Let Me Explain, Again: Multiplicity in Local Sufficient Explanations
Abstract: When asked to explain their decisions, humans can produce multiple complementary justifications. In contrast, several feature attribution methods for machine learning produce only one such attribution, despite the existence of multiple equally strong and succinct explanations. The explanations found by these methods thus offer an incomplete picture of model behavior. In this paper, we study the problem of explaining a machine learning model's prediction on a given input from the perspective of minimal feature subsets that are sufficient for the model's prediction, focusing on their non-uniqueness. We give a tour of perspectives on this non-uniqueness, in terms of Boolean logic, conditional independence, approximate sufficiency, and degenerate conditional feature distributions. To cope with the multiplicity of these explanations, we propose a wrapper methodology that can adapt and extend methods that find a single explanation into methods for finding multiple explanations of similar quality. Our experiments benchmark the proposed meta-algorithm, which we call Let Me Explain Again (LMEA), against two multi-explanation method baselines on synthetic and real-world multiple-instance learning problems for image classification and demonstrate the ability of LMEA to augment two single-explanation methods.
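The abstract does not spell out the wrapper itself, so the following is only one plausible instantiation of the idea: repeatedly run a base single-explanation method while excluding features already used, and keep each new subset only if it remains sufficient for the prediction.

```python
def multiple_sufficient_explanations(explain_once, is_sufficient, n_features,
                                     max_explanations=5):
    """Hypothetical wrapper (not necessarily LMEA itself): re-run a base
    single-explanation method with previously used features excluded, keeping
    each returned subset only if it is still sufficient for the prediction.

    explain_once(excluded: set) -> set   base method, avoids excluded features
    is_sufficient(subset: set) -> bool   e.g. prediction unchanged when only
                                         this subset is retained
    """
    explanations, excluded = [], set()
    while len(explanations) < max_explanations and len(excluded) < n_features:
        subset = explain_once(excluded)
        if not subset or not is_sufficient(subset):
            break
        explanations.append(subset)
        excluded |= subset
    return explanations
```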
URL: https://openreview.net/forum?id=d6FMg4hozX
---
Title: End-to-end Deep Reinforcement Learning for Stochastic Multi-objective Optimization in C-VRPTW
Abstract: In this work, we consider learning-based applications in routing to solve a Vehicle Routing variant characterized by stochasticity and multiple objectives. Such problems are representative of practical settings where decision-makers have to deal with uncertainty in the operational environment as well as multiple conflicting objectives due to different stakeholders. We specifically consider travel time uncertainty. We also consider two objectives, total travel time and route makespan, that jointly target operational efficiency and labor regulations on shift length, although more/different objectives could be incorporated. Learning-based methods offer earnest computational advantages as they can repeatedly solve problems with limited interference from the decision-maker. We specifically focus on end-to-end deep learning models that leverage the attention mechanism and multiple solution trajectories. These models have seen several successful applications in routing problems. However, since travel times are not a direct input to these models due to the large dimensions of the travel time matrix, accounting for uncertainty is a challenge, especially in the presence of multiple objectives. In turn, we propose a model that simultaneously addresses stochasticity and multi-objectivity and provide a refined training mechanism for this model through scenario clustering to reduce training time. Our results show that our model is capable of constructing a Pareto Front of good quality within acceptable run times compared to three baselines. We also provide two ablation studies to assess our model’s suitability in different settings.
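A small illustration of the scenario-clustering step mentioned above: sample stochastic travel-time matrices, cluster them, and train against a few weighted representatives instead of every scenario. The lognormal perturbation model and k-means choice are assumptions, not the paper's procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def representative_scenarios(n_customers=20, n_scenarios=500, n_clusters=8, seed=0):
    """Sample stochastic travel-time matrices and reduce them to a few cluster
    representatives used during training. Sketch only: lognormal perturbations
    of a base matrix and k-means on flattened matrices are assumptions."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(1.0, 10.0, size=(n_customers, n_customers))
    base = (base + base.T) / 2                        # symmetric base travel times
    noise = rng.lognormal(mean=0.0, sigma=0.3, size=(n_scenarios, *base.shape))
    scenarios = base * noise                          # (S, n, n) stochastic scenarios

    flat = scenarios.reshape(n_scenarios, -1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(flat)
    reps = km.cluster_centers_.reshape(n_clusters, n_customers, n_customers)
    weights = np.bincount(km.labels_, minlength=n_clusters) / n_scenarios
    return reps, weights                              # train against weighted representatives
```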
URL: https://openreview.net/forum?id=Wwtb1tYnp5
---
Title: The Self-Consistent Theory of Neural Network Moments
Abstract: This paper establishes a rigorous mathematical foundation for the statistical behavior of neural network parameter and gradient moments through self-consistent equations. We prove that the logarithmic moments exhibit a universal asymptotic decomposition governed by extremal statistics. This framework is extended to construct a joint partition function that unifies parameter and gradient statistics, revealing a topological phase distinction between states of correlated and uncorrelated extrema. The theory provides exact microscopic guarantees for finite networks while capturing emergent scaling behavior in large-scale systems.
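A quick numerical illustration of the link between high-order (logarithmic) moments and extremal statistics: for an empirical sample, (1/k) log E|X|^k climbs toward log max|X| as k grows, so large moments are governed by the extreme entries. The toy weight sample below is an arbitrary choice, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(100_000)          # stand-in for network parameters

for k in (2, 8, 32, 128):
    log_moment = np.log(np.mean(np.abs(weights) ** k)) / k
    print(f"k={k:4d}   (1/k) log E|w|^k = {log_moment:.3f}")

print(f"log max|w|            = {np.log(np.abs(weights).max()):.3f}")
# As k grows, the normalized log-moment approaches log max|w|:
# high-order moments are dominated by the extreme entries.
```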
URL: https://openreview.net/forum?id=qdka8SmN7j
---