Weekly TMLR digest for Apr 20, 2025

TMLR

Apr 20, 2025, 12:00:10 AM
to tmlr-annou...@googlegroups.com


New certifications
==================

Survey Certification: Adaptive Physics-informed Neural Networks: A Survey

Edgar Torres, Mathias Niepert

https://openreview.net/forum?id=vz5P1Kbt6t

---


Reproducibility Certification: Privacy Awareness for Information-Sharing Assistants: A Case-study on Form-filling with Contextual Integrity

Sahra Ghalebikesabi, Eugene Bagdasarian, Ren Yi, Itay Yona, Ilia Shumailov, Aneesh Pappu, Chongyang Shi, Laura Weidinger, Robert Stanforth, Leonard Berrada, Pushmeet Kohli, Po-Sen Huang, Borja Balle

https://openreview.net/forum?id=l9rATNBB8Y

---


Survey Certification: Open Problems in Technical AI Governance

Anka Reuel, Benjamin Bucknall, Stephen Casper, Timothy Fist, Lisa Soder, Onni Aarne, Lewis Hammond, Lujain Ibrahim, Alan Chan, Peter Wills, Markus Anderljung, Ben Garfinkel, Lennart Heim, Andrew Trask, Gabriel Mukobi, Rylan Schaeffer, Mauricio Baker, Sara Hooker, Irene Solaiman, Sasha Luccioni, Nitarshan Rajkumar, Nicolas Moës, Jeffrey Ladish, David Bau, Paul Bricman, Neel Guha, Jessica Newman, Yoshua Bengio, Tobin South, Alex Pentland, Sanmi Koyejo, Mykel Kochenderfer, Robert Trager

https://openreview.net/forum?id=1nO4qFMiS0

---


Accepted papers
===============


Title: Fairness-Aware Dense Subgraph Discovery

Authors: Emmanouil Kariotakis, Nicholas D Sidiropoulos, Aritra Konar

Abstract: Dense subgraph discovery (DSD) is a key graph mining primitive with myriad applications including finding densely connected communities which are diverse in their vertex composition. In such a context, it is desirable to extract a dense subgraph that provides fair representation of the diverse subgroups that constitute the vertex set while incurring a small loss in terms of subgraph density. Existing methods for promoting fairness in DSD have important limitations - the associated formulations are NP-hard in the worst case and they do not provide flexible notions of fairness, making it non-trivial to analyze the inherent trade-off between density and fairness. In this paper, we introduce two tractable formulations for fair DSD, each offering a different notion of fairness. Our methods provide a structured and flexible approach to incorporate fairness, accommodating varying fairness levels. We introduce the fairness-induced relative loss in subgraph density as a price of fairness measure to quantify the associated trade-off. We are the first to study such a notion in the context of detecting fair dense subgraphs. Extensive experiments on real-world datasets demonstrate that our methods not only match but frequently outperform existing solutions, sometimes incurring even less than half the subgraph density loss compared to prior art, while achieving the target fairness levels. Importantly, they excel in scenarios that previous methods fail to adequately handle, i.e., those with extreme subgroup imbalances, highlighting their effectiveness in extracting fair and dense solutions.

URL: https://openreview.net/forum?id=7rqV7Cb67L

---

Title: LLM-Guided Self-Supervised Tabular Learning With Task-Specific Pre-text Tasks

Authors: Sungwon Han, Seungeon Lee, Meeyoung Cha, Sercan O Arik, Jinsung Yoon

Abstract: One of the most common approaches for self-supervised representation learning is defining pre-text tasks to learn data representations.
Existing works determine pre-text tasks in a "task-agnostic" way, without considering the forthcoming downstream tasks. This offers the advantage of broad applicability across tasks, but can also lead to a mismatch between the pre-text and downstream task objectives, potentially degrading downstream performance. In this paper, we introduce TST-LLM, a framework that effectively reduces this mismatch when a natural language description of the downstream task is given, without requiring any ground-truth labels. TST-LLM instructs the LLM to use the downstream task's description and the data's meta-information to discover features relevant to the target task. These discovered features are then treated as ground-truth labels to define "target-specific" pre-text tasks. TST-LLM consistently outperforms contemporary baselines, such as STUNT and LFR, with win ratios of 95% and 81%, when applied to 22 benchmark tabular datasets spanning binary classification, multi-class classification, and regression tasks.

URL: https://openreview.net/forum?id=jXcx2oAIbw

---

Title: FragFormer: A Fragment-based Representation Learning Framework for Molecular Property Prediction

Authors: Jiaxi Wang, Yaosen Min, Miao Li, Ji Wu

Abstract: Molecular representation learning is central to molecular property prediction, which is a vital component of drug discovery. Existing methods, which mainly focus on atom-level molecular graphs, often find it challenging to directly model the relation between fragments (substructures) and the function of molecules, largely due to insufficient fragment priors. In this work, we propose a molecular self-supervised learning framework, \textbf{FragFormer}, which aims to learn the representation of fragments and their contextual relationships. Given the prior that an atom can be part of multiple functional groups, we develop $k$-\textbf{D}egree \textbf{Ove}rlapping fragmentation (\textbf{DOVE}), which generates an overlapping fragment graph by employing the iterative line graph construction. Unlike non-overlapping fragmentation, DOVE also preserves connection information during the fragmentation phase. In the pre-training stage, we design a \textit{nested masked fragment prediction} objective to capture the hierarchical nature of fragments, namely that larger fragments can encompass multiple smaller ones. Based on FragFormer, we introduce \textbf{FragCAM}, a simple yet efficient \textit{fragment-level} interpretation method that explains molecular property prediction results with greater accuracy. Moreover, thanks to the fragment modeling, our model is better able to process large molecules, such as peptides, and to capture long-range interactions inside molecules. Our approach achieves state-of-the-art (SOTA) performance on eight out of eleven molecular property prediction datasets in PharmaBench. On a long-range biological benchmark with peptide data, FragFormer beats strong baselines by a clear margin, which shows the model's potential to generalize to larger molecules. Finally, we demonstrate that our model can effectively identify decisive fragments for prediction results on a real-world dataset\footnote{Our code is available at \url{https://github.com/wjxts/FragFormer/}}.

URL: https://openreview.net/forum?id=9aiuB3kIjd

---

Title: When resampling/reweighting improves feature learning in imbalanced classification? A toy-model study

Authors: Tomoyuki Obuchi, Toshiyuki Tanaka

Abstract: A toy model of binary classification is studied with the aim of clarifying the effect of class-wise resampling/reweighting on feature learning performance in the presence of class imbalance. In the analysis, a high-dimensional limit of the input space is taken while keeping the ratio of the dataset size to the input dimension finite, and the non-rigorous replica method from statistical mechanics is employed. The result shows that there exists a case in which no resampling/reweighting gives the best feature learning performance, irrespective of the choice of losses or classifiers, supporting recent findings in~\citet{kang2019decoupling,cao2019learning}. It is also revealed that the key to this result is the symmetry of the loss and the problem setting. Inspired by this, we propose a further simplified model exhibiting the same property in the multiclass setting. These results clarify when class-wise resampling/reweighting becomes effective in imbalanced classification.
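
The abstract's question can be probed empirically with off-the-shelf tools. Below is a minimal scikit-learn sketch contrasting no reweighting with class-balanced reweighting on a synthetic imbalanced task; the dataset, classifier, and weighting scheme are illustrative choices and have nothing to do with the paper's replica-method analysis.

```python
# Minimal, illustrative comparison of no reweighting vs. class-balanced
# reweighting on a synthetic imbalanced binary task. This only mirrors the
# practical question the paper studies; it is not the replica-method setup.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for cw in (None, "balanced"):
    clf = LogisticRegression(class_weight=cw, max_iter=1000).fit(X_tr, y_tr)
    acc = balanced_accuracy_score(y_te, clf.predict(X_te))
    print(f"class_weight={cw}: balanced accuracy = {acc:.3f}")
```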

URL: https://openreview.net/forum?id=spqbyeGyLR

---

Title: HyperMagNet: A Magnetic Laplacian based Hypergraph Neural Network

Authors: Tatyana Benko, Martin Buck, Ilya Amburg, Stephen J. Young, Sinan Guven Aksoy

Abstract: In data science, hypergraphs are natural models for data exhibiting multi-way or group relationships in contrast to graphs which only model pairwise relationships. Nonetheless, many proposed hypergraph neural networks effectively reduce hypergraphs to undirected graphs via symmetrized matrix representations, potentially losing important multi-way or group information. We propose an alternative approach to hypergraph neural networks in which the hypergraph is represented as a non-reversible Markov chain. We use this Markov chain to construct a complex Hermitian Laplacian matrix — the magnetic Laplacian — which serves as the input to our proposed hypergraph neural network. We study $\textit{HyperMagNet}$ for the task of node classification, and demonstrate its effectiveness over graph-reduction based hypergraph neural networks.
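
For readers unfamiliar with the magnetic Laplacian, the sketch below builds the standard construction from an arbitrary asymmetric weight matrix with numpy. The paper derives that matrix from a hypergraph-induced non-reversible Markov chain, which is not reproduced here, and the charge parameter q is a free choice.

```python
# Minimal sketch of the magnetic Laplacian for an asymmetric weight matrix A
# (e.g., a non-reversible random-walk matrix). The paper builds A from a
# hypergraph; here A is just a generic directed example.
import numpy as np

def magnetic_laplacian(A, q=0.25):
    A_sym = 0.5 * (A + A.T)              # symmetrized weights
    theta = 2.0 * np.pi * q * (A - A.T)  # phases encode edge direction
    H = A_sym * np.exp(1j * theta)       # Hermitian "magnetic" adjacency
    D = np.diag(A_sym.sum(axis=1))       # degree matrix of the symmetrized graph
    return D - H                         # Hermitian magnetic Laplacian

A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [1.0, 0.0, 0.0]])
L = magnetic_laplacian(A, q=0.25)
print(np.allclose(L, L.conj().T))  # True: L is Hermitian
print(np.linalg.eigvalsh(L))       # real, non-negative spectrum (up to precision)
```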

URL: https://openreview.net/forum?id=Gdf4P7sEzE

---

Title: ODEStream: A Buffer-Free Online Learning Framework with ODE-based Adaptor for Streaming Time Series Forecasting

Authors: Futoon M. Abushaqra, Hao Xue, Yongli Ren, Flora D. Salim

Abstract: Addressing the challenges of irregularity and concept drift in streaming time series is crucial for real-world predictive modelling. Previous studies in time series continual learning often propose models that require buffering long sequences, potentially restricting the responsiveness of the inference system. Moreover, these models are typically designed for regularly sampled data, an unrealistic assumption in real-world scenarios. This paper introduces ODEStream, a novel buffer-free continual learning framework that incorporates a temporal isolation layer to capture temporal dependencies within the data. Simultaneously, it leverages the capability of neural ordinary differential equations to process irregular sequences and generate a continuous data representation, enabling seamless adaptation to changing dynamics in a data streaming scenario. Our approach focuses on learning how the dynamics and distribution of historical data change over time, facilitating direct processing of streaming sequences. Evaluations on benchmark real-world datasets demonstrate that ODEStream outperforms the state-of-the-art online learning and streaming analysis baseline models, providing accurate predictions over extended periods while minimising performance degradation over time by learning how the sequence dynamics change. The implementation of ODEStream is available at: \url{https://github.com/FtoonAbushaqra/ODEStream.git}.

URL: https://openreview.net/forum?id=TWOTKhwU5n

---

Title: Amphibian: A Meta-Learning Framework for Rehearsal-Free, Fast Online Continual Learning

Authors: Gobinda Saha, Kaushik Roy

Abstract: Online continual learning is challenging as it requires fast adaptation over a stream of data in a non-stationary environment without forgetting the knowledge acquired in the past. To address this challenge, in this paper we introduce Amphibian, a gradient-based meta-learner that learns to scale the direction of gradient descent to achieve the desired balance between fast learning and continual learning. For this purpose, using only the current batch of data, Amphibian minimizes a meta-objective that encourages alignment of gradients among the given data samples along selected basis directions in the gradient space. From this objective, it learns a diagonal scale matrix in each layer that accumulates the history of such gradient alignments. Using these scale matrices, Amphibian updates the model online only in the directions with positive cumulative gradient alignment among the data observed so far. With evaluations on standard continual image classification benchmarks, we show that such meta-learned scaled gradient descent in Amphibian achieves better accuracy in online continual learning than relevant baselines, while enabling fast learning with less data and few-shot knowledge transfer to new tasks. We also introduce Amphibian-$\beta$, a unified and principled framework for analyzing and understanding the fast learning and continual learning dynamics. Additionally, with loss landscape visualizations, we show that such gradient updates incur minimal loss on the old tasks, enabling fast continual learning in Amphibian.

URL: https://openreview.net/forum?id=n4AaKOBWbB

---

Title: Sample-efficient decoding of visual stimuli from fMRI through inter-individual functional alignment

Authors: Alexis Thual, Yohann Benchetrit, Felix Geilert, Jérémy Rapin, Iurii Makarov, Stanislas Dehaene, Bertrand Thirion, Hubert Banville, Jean-Remi King

Abstract: Deep learning is leading to major advances in the realm of brain decoding from functional Magnetic Resonance Imaging (fMRI). However, the large inter-individual variability in brain characteristics has constrained most studies to train models on one participant at a time. This limitation hampers the training of deep learning models, which typically requires very large datasets. Here, we propose to boost brain decoding of videos and static images across participants by aligning brain responses of training and left-out participants. Evaluated on a retrieval task, compared to the anatomically-aligned baseline, our method halves the median rank in out-of-subject setups. It also outperforms classical within-subject approaches when fewer than 100 minutes of data is available for the tested participant. Furthermore, we show that our alignment framework handles multiple subjects, which improves accuracy upon classical single-subject approaches. Finally, we show that this method aligns neural representations in accordance with brain anatomy. Overall, this study lays the foundations for leveraging extensive neuroimaging datasets and enhancing the decoding of individual brains when a limited amount of brain-imaging data is available.

URL: https://openreview.net/forum?id=qvJraN50DT

---

Title: LLM-Select: Feature Selection with Large Language Models

Authors: Daniel P Jeong, Zachary Chase Lipton, Pradeep Kumar Ravikumar

Abstract: In this paper, we demonstrate a surprising capability of large language models (LLMs): given only input feature names and a description of a prediction task, they are capable of selecting the most predictive features, with performance rivaling the standard tools of data science. Remarkably, these models exhibit this capacity across various query mechanisms. For example, we zero-shot prompt an LLM to output a numerical importance score for a feature (e.g., ``blood pressure'') in predicting an outcome of interest (e.g., ``heart failure''), with no additional context. In particular, we find that the latest models, such as GPT-4, can consistently identify the most predictive features regardless of the query mechanism and across various prompting strategies. We illustrate these findings through extensive experiments on real-world data, where we show that LLM-based feature selection consistently achieves strong performance competitive with data-driven methods such as the LASSO, despite never having looked at the downstream training data. Our findings suggest that LLMs may be useful not only for selecting the best features for training \textit{but also for deciding which features to collect in the first place}. This could potentially benefit practitioners in domains like healthcare and the social sciences, where collecting high-quality data comes at a high cost.
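
The query mechanism described above is simple enough to sketch. In the snippet below, query_llm is a placeholder for whatever LLM client is used, and the prompt wording and 0-to-1 score format are assumptions rather than the paper's exact prompts.

```python
# Illustrative zero-shot LLM feature scoring in the spirit of LLM-Select.
# `query_llm` is a stand-in for a real LLM call; prompts and score format
# are assumptions for illustration only.
from typing import Callable, Dict, List

def llm_feature_scores(features: List[str], outcome: str,
                       query_llm: Callable[[str], str]) -> Dict[str, float]:
    scores = {}
    for feat in features:
        prompt = (f"On a scale from 0 to 1, how important is the feature "
                  f"'{feat}' for predicting '{outcome}'? "
                  f"Answer with a single number.")
        scores[feat] = float(query_llm(prompt).strip())
    return scores

# Example with a stub in place of a real model call:
stub = lambda prompt: "0.8" if "blood pressure" in prompt else "0.2"
print(llm_feature_scores(["blood pressure", "favorite color"],
                         "heart failure", stub))
```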

URL: https://openreview.net/forum?id=16f7ea1N3p

---

Title: Reproducibility Study of "Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation"

Authors: Jose L. Garcia, Karolina Hajkova, Maria Marchenko, Carlos Miguel Patiño

Abstract: This paper presents a reproducibility study and extension of "Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation." We validate the original findings using a range of open-weight models (1.5B-70B parameters), GPT-4, and GPT-4o Mini while introducing several novel contributions. We analyze the Pareto front of the games, propose a communication-free baseline to test whether successful negotiations are possible without agent interaction, evaluate recent small language models' performance, analyze structural information leakage in model responses, and implement an inequality metric to assess negotiation fairness. Our results demonstrate that smaller models (<10B parameters) struggle with format adherence and coherent responses, but larger open-weight models can approach proprietary model performance. Additionally, in many scenarios, single-agent approaches can achieve comparable results to multi-agent negotiations, challenging assumptions about the necessity of agent communication to perform well on the benchmark. This work also provides insights into accessibility, fairness, environmental impact, and privacy considerations of LLM-based negotiation systems.
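
The abstract mentions an inequality metric for negotiation fairness without naming it; the Gini coefficient below is one common choice, shown purely as an illustration of how per-agent payoffs from a finished negotiation could be scored.

```python
# Gini coefficient over per-agent payoffs: 0 means a perfectly equal split,
# values near 1 mean one agent captured almost everything. This is an
# illustrative choice of inequality metric, not necessarily the paper's.
import numpy as np

def gini(payoffs):
    x = np.sort(np.asarray(payoffs, dtype=float))
    n = x.size
    # G = (2 * sum_i i * x_(i)) / (n * sum_i x_i) - (n + 1) / n
    return (2.0 * np.sum(np.arange(1, n + 1) * x)) / (n * x.sum()) - (n + 1.0) / n

print(gini([10, 10, 10, 10]))  # 0.0  -> equal split
print(gini([40, 0, 0, 0]))     # 0.75 -> highly unequal split
```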

URL: https://openreview.net/forum?id=MTrhFmkC45

---

Title: Change Point Detection in Dynamic Graphs with Decoder-only Latent Space Model

Authors: Yik Lun Kei, Jialiang Li, Hangjian Li, Yanzhen Chen, OSCAR HERNAN MADRID PADILLA

Abstract: This manuscript studies the unsupervised change point detection problem in time series of graphs using a decoder-only latent space model. The proposed framework consists of learnable prior distributions for low-dimensional graph representations and of a decoder that bridges the observed graphs and latent representations. The prior distributions of the latent spaces are learned from the observed data as empirical Bayes to assist change point detection. Specifically, the model parameters are estimated via maximum approximate likelihood, with a Group Fused Lasso regularization imposed on the prior parameters. The augmented Lagrangian is solved via Alternating Direction Method of Multipliers, and Langevin Dynamics are recruited for posterior inference. Simulation studies show good performance of the latent space model in supporting change point detection and real data experiments yield change points that align with significant events.

URL: https://openreview.net/forum?id=DVeFqV56Iz

---

Title: Adaptive Physics-informed Neural Networks: A Survey

Authors: Edgar Torres, Mathias Niepert

Abstract: Physics-informed neural networks (PINNs) have emerged as a promising approach for solving partial differential equations (PDEs) using neural networks, particularly in data-scarce scenarios due to their unsupervised training capability. However, a key limitation is the need for re-optimization with each change in PDE parameters, similar to the challenge in traditional numerical methods where each system of equations corresponds to a specific PDE instance. This characteristic poses a barrier to the widespread adoption of PINNs across scientific and engineering applications. This survey explores research addressing this limitation through transfer learning and meta-learning, synthesizing insights to establish a foundation for efficient data generation strategies tailored to PINNs. These methods can potentially improve PINNs' training efficiency, enabling quicker adaptation to new PDEs with fewer data and computational demands. While numerical methods directly solve systems of equations to derive solutions, neural networks implicitly learn solutions by adjusting their parameters. One notable advantage of neural networks lies in their capacity to abstract away from specific problem domains, enabling them to retain, discard, or adapt learned representations to efficiently address similar problems. By understanding how these techniques can be applied to PINNs, this survey seeks to identify promising directions for future research to enable the widespread adoption of PINNs across a wide range of scientific and engineering applications.

URL: https://openreview.net/forum?id=vz5P1Kbt6t

---

Title: Design Editing for Offline Model-based Optimization

Authors: Ye Yuan, Youyuan Zhang, Can Chen, Haolun Wu, Melody Zixuan Li, Jianmo Li, James J. Clark, Xue Liu

Abstract: Offline model-based optimization (MBO) aims to maximize a black-box objective function using only an offline dataset of designs and scores. These tasks span various domains, such as robotics, material design, and protein and molecular engineering. A common approach involves training a surrogate model using existing designs and their corresponding scores, and then generating new designs through gradient-based updates with respect to the surrogate model. This method suffers from the out-of-distribution issue, where the surrogate model may erroneously predict high scores for unseen designs. To address this challenge, we introduce a novel method, Design Editing for Offline Model-based Optimization (DEMO), which leverages a diffusion prior to calibrate overly optimized designs. DEMO first generates pseudo design candidates by performing gradient ascent with respect to a surrogate model. While these pseudo design candidates contain information beyond the offline dataset, they might be invalid or have erroneously high predicted scores. Therefore, to address this challenge while utilizing the information provided by pseudo design candidates, we propose an editing process to refine these pseudo design candidates. We introduce noise to the pseudo design candidates and subsequently denoise them with a diffusion prior trained on the offline dataset, ensuring they align with the distribution of valid designs. Empirical evaluations on seven offline MBO tasks show that, with properly tuned hyperparameters, DEMO's score is competitive with the best previously reported scores in the literature.

URL: https://openreview.net/forum?id=OPFnpl7KiF

---

Title: Referential communication in heterogeneous communities of pre-trained visual deep networks

Authors: Matéo Mahaut, Roberto Dessi, Francesca Franzon, Marco Baroni

Abstract: As large pre-trained image-processing neural networks are being embedded in autonomous agents such as self-driving cars or robots, the question arises of how such systems can communicate with each other about the surrounding world, despite their different architectures and training regimes. As a first step in this direction, we systematically explore the task of referential communication in a community of heterogeneous state-of-the-art pre-trained visual networks, showing that they can develop, in a self-supervised way, a shared protocol to refer to a target object among a set of candidates. This shared protocol can also be used, to some extent, to communicate about previously unseen object categories of different granularity. Moreover, a visual network that was not initially part of an existing community can learn the community's protocol with remarkable ease. Finally, we study, both qualitatively and quantitatively, the properties of the emergent protocol, providing some evidence that it is capturing high-level semantic features of objects.

URL: https://openreview.net/forum?id=8L3khbpUJL

---

Title: MemLLM: Finetuning LLMs to Use Explicit Read-Write Memory

Authors: Ali Modarressi, Abdullatif Köksal, Ayyoob Imani, Mohsen Fayyaz, Hinrich Schuetze

Abstract: While current large language models (LLMs) perform well on many knowledge-related tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with memorizing rare events and with updating their memory as facts change over time. In addition, the uninterpretable nature of parametric memory makes it challenging to prevent hallucination. Model editing and augmenting LLMs with parameters specialized for memory are only partial solutions. In this paper, we introduce MemLLM, a novel method of enhancing LLMs by integrating a structured and explicit read-and-write memory module. MemLLM tackles the aforementioned challenges by enabling dynamic interaction with the memory and improving the LLM's capabilities in using stored knowledge. Our experiments indicate that MemLLM enhances the LLM's performance and interpretability, in language modeling in general and knowledge-intensive tasks in particular. We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation. The project repository is publicly available at: https://github.com/amodaresi/MemLLM

URL: https://openreview.net/forum?id=dghM7sOudh

---

Title: Accelerating Non-Conjugate Gaussian Processes By Trading Off Computation For Uncertainty

Authors: Lukas Tatzel, Jonathan Wenger, Frank Schneider, Philipp Hennig

Abstract: Non-conjugate Gaussian processes (NCGPs) define a flexible probabilistic framework to model categorical, ordinal and continuous data, and are widely used in practice. However, exact inference in NCGPs is prohibitively expensive for large datasets, thus requiring approximations in practice. The approximation error adversely impacts the reliability of the model and is not accounted for in the uncertainty of the prediction. We introduce a family of iterative methods that explicitly model this error. They are uniquely suited to parallel modern computing hardware, efficiently recycle computations, and compress information to reduce both the time and memory requirements for NCGPs. As we demonstrate on large-scale classification problems, our method significantly accelerates posterior inference compared to competitive baselines by trading off reduced computation for increased uncertainty.

URL: https://openreview.net/forum?id=UdcF3JbSKb

---

Title: Optimal Embedding Guided Negative Sample Generation for Knowledge Graph Link Prediction

Authors: Makoto Takamoto, Daniel Onoro Rubio, Wiem Ben Rim, Takashi Maruyama, Bhushan Kotnis

Abstract: Knowledge graph embedding (KGE) models encode the structural information of knowledge graphs to predict new links. Effective training of these models requires distinguishing between positive and negative samples with high precision. Although prior research has shown that improving the quality of negative samples can significantly enhance model accuracy, identifying high-quality negative samples remains a challenging problem. This paper theoretically investigates the condition under which negative samples lead to optimal KG embedding and identifies a sufficient condition for an effective negative sample distribution. Based on this theoretical foundation, we propose \textbf{E}mbedding \textbf{MU}tation (\textsc{EMU}), a novel framework that \emph{generates} negative samples satisfying this condition, in contrast to conventional methods that focus on \emph{identifying} challenging negative samples within the training data. Importantly, the simplicity of \textsc{EMU} ensures seamless integration with existing KGE models and negative sampling methods. To evaluate its efficacy, we conducted comprehensive experiments across multiple datasets. The results consistently demonstrate significant improvements in link prediction performance across various KGE models and negative sampling methods. Notably, \textsc{EMU} enables performance improvements comparable to those achieved by models with an embedding dimension five times larger. An implementation of the method and experiments is available at \url{https://github.com/nec-research/EMU-KG}.

URL: https://openreview.net/forum?id=B4SyciDyIh

---

Title: SE3Set: Harnessing Equivariant Hypergraph Neural Networks for Molecular Representation Learning

Authors: Hongfei Wu, Lijun Wu, Guoqing Liu, Zhirong Liu, Bin Shao, Zun Wang

Abstract: In this paper, we develop SE3Set, an SE(3) equivariant hypergraph neural network architecture tailored for advanced molecular representation learning. Hypergraphs are not merely an extension of traditional graphs; they are pivotal for modeling high-order relationships, a capability that conventional equivariant graph-based methods lack due to their inherent limitations in representing intricate many-body interactions. To achieve this, we first construct hypergraphs by proposing a new fragmentation method that considers both chemical and three-dimensional spatial information of the molecular system. We then design SE3Set, which incorporates equivariance into the hypergraph neural network. This ensures that the learned molecular representations are invariant to spatial transformations, thereby providing robustness essential for the accurate prediction of molecular properties. SE3Set has shown performance on par with state-of-the-art (SOTA) models for small molecule datasets like QM9 and MD17. It demonstrates outstanding performance on the MD22 dataset, achieving a remarkable ~20\% improvement in accuracy across all molecules. Furthermore, on the OE62 dataset, SE3Set outperforms all short-range models. We also conducted a detailed analysis of OE62, highlighting the prevalence of complex many-body interactions in large molecules. This exceptional performance of SE3Set across diverse molecular structures underscores its transformative potential in computational chemistry, offering a route to more accurate and physically nuanced modeling. The code of this work is available at https://github.com/Navantock/SE3Set.

URL: https://openreview.net/forum?id=muWEt1TOyo

---

Title: Oblique Bayesian Additive Regression Trees

Authors: Paul-Hieu V. Nguyen, Ryan Yee, Sameer Deshpande

Abstract: Current implementations of Bayesian Additive Regression Trees (BART) are based on axis-aligned decision rules that recursively partition the feature space using a single feature at a time. Several authors have demonstrated that oblique trees, whose decision rules are based on linear combinations of features, can sometimes yield better predictions than axis-aligned trees and exhibit excellent theoretical properties. We develop an oblique version of BART that leverages a data-adaptive decision rule prior that recursively partitions the feature space along random hyperplanes. Using several synthetic and real-world benchmark datasets, we systematically compared our oblique BART implementation to axis-aligned BART and other tree ensemble methods, finding that oblique BART was competitive with --- and sometimes much better than --- those methods.
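
To make the distinction concrete, the tiny numpy snippet below contrasts an axis-aligned decision rule with an oblique one (a threshold on a random linear combination of features), which is the building block this paper places inside BART; the data and split parameters are arbitrary.

```python
# Axis-aligned vs. oblique decision rules on toy data. Oblique BART replaces
# single-feature splits with thresholds on random linear combinations; the
# numbers here are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))      # 8 samples, 3 features

axis_aligned = X[:, 1] <= 0.2    # split on feature 1 only
w = rng.normal(size=3)           # random hyperplane direction
oblique = X @ w <= 0.2           # split on a linear combination of all features

print("axis-aligned left-child mask:", axis_aligned.astype(int))
print("oblique left-child mask:     ", oblique.astype(int))
```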

URL: https://openreview.net/forum?id=l4Qnj4tHBx

---

Title: Leveraging Gradients for Unsupervised Accuracy Estimation under Distribution Shift

Authors: RENCHUNZI XIE, Ambroise Odonnat, Vasilii Feofanov, Ievgen Redko, Jianfeng Zhang, Bo An

Abstract: Estimating the test performance of a model, possibly under distribution shift, without having access to the ground-truth labels is a challenging, yet very important problem for the safe deployment of machine learning algorithms in the wild. Existing works mostly rely on information from either the outputs or the extracted features of neural networks to estimate a score that correlates with the ground-truth test accuracy. In this paper, we investigate -- both empirically and theoretically -- how the information provided by the gradients can be predictive of the ground-truth test accuracy even under distribution shifts. More specifically, we use the norm of classification-layer gradients, backpropagated from the cross-entropy loss after only one gradient step over test data. Our intuition is that these gradients should be of higher magnitude when the model generalizes poorly. We provide the theoretical insights behind our approach and the key ingredients that ensure its empirical success. Extensive experiments conducted with various architectures on diverse distribution shifts demonstrate that our method significantly outperforms current state-of-the-art approaches. The code is available at \url{https://github.com/Renchunzi-Xie/GdScore}.
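
A minimal PyTorch sketch of the gradient-norm idea follows: run one backward pass of a cross-entropy loss on a test batch and take the norm of the classification-layer gradients. Using the model's own argmax predictions as targets is an illustrative assumption, since the abstract does not spell out the loss target for unlabeled data.

```python
# Sketch of a gradient-norm score for unsupervised accuracy estimation:
# one backward pass on test data, then the norm of the final classification
# layer's gradients (intuition: larger norm <-> poorer generalization).
# The pseudo-label choice below is an assumption, not the paper's exact recipe.
import torch
import torch.nn as nn

def gradient_norm_score(model, classifier, x_test):
    model.zero_grad()
    logits = model(x_test)
    pseudo_labels = logits.argmax(dim=1).detach()
    loss = nn.functional.cross_entropy(logits, pseudo_labels)
    loss.backward()
    g = torch.cat([p.grad.flatten() for p in classifier.parameters()])
    return g.norm().item()

# Toy usage: a linear "network" whose last layer is also the classifier.
classifier = nn.Linear(16, 5)
model = nn.Sequential(nn.Flatten(), classifier)
print(gradient_norm_score(model, classifier, torch.randn(32, 16)))
```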

URL: https://openreview.net/forum?id=FIWHRSuoos

---

Title: Scaling Laws for Predicting Downstream Performance in LLMs

Authors: Yangyi Chen, Binxuan Huang, Yifan Gao, Zhengyang Wang, Jingfeng Yang, Heng Ji

Abstract: Precise estimation of downstream performance in large language models (LLMs) prior to training is essential for guiding their development process. Scaling laws analysis utilizes the statistics of a series of significantly smaller sampling language models (LMs) to predict the performance of the target LLM. For downstream performance prediction, the critical challenge lies in the emergent abilities in LLMs that occur beyond task-specific computational thresholds. In this work, we focus on the pre-training loss as a more computation-efficient metric for performance estimation. Our two-stage approach FLP consists of first estimating a function that maps computational resources (e.g., FLOPs) to the pre-training Loss using a series of sampling models, followed by mapping the pre-training loss to downstream task Performance after the critical "emergent phase". In our experiments, this FLP solution accurately predicts the performance of LLMs with 7B and 13B parameters using a series of sampling LMs up to 3B, achieving error margins of 5% and 10%, respectively, and significantly outperforming the FLOPs-to-Performance approach. Further, we present FLP-M, a fundamental approach for performance prediction that addresses the practical need to integrate datasets from multiple sources during pre-training, specifically blending general corpus with code data to accurately represent the common necessity. FLP-M extends the power law analytical function to predict domain-specific pre-training loss based on FLOPs across data sources, and employs a two-layer neural network to model the non-linear relationship between multiple domain-specific loss and downstream performance. By utilizing a 3B LLM trained on a specific ratio and a series of smaller sampling LMs, FLP-M can effectively forecast the performance of 3B and 7B LLMs across various data mixtures for most benchmarks within 10% error margins.
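
The two-stage recipe is easy to picture with a toy fit: first a power law from compute to loss, then a simple monotone map from loss to accuracy. The numbers below are made up and the linear second stage is a stand-in for the paper's choices; only the overall shape of the procedure is illustrated.

```python
# Toy two-stage fit in the spirit of FLP: (1) compute -> pre-training loss via
# a power law fitted on small models, (2) loss -> downstream accuracy via a
# simple monotone map (linear here as the simplest stand-in). All numbers are
# invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

flops = np.array([1e18, 3e18, 1e19, 3e19, 1e20])
loss = np.array([3.10, 2.85, 2.62, 2.44, 2.30])    # toy values
acc = np.array([0.28, 0.33, 0.41, 0.48, 0.55])     # toy values

C = flops / 1e18                                   # normalize compute for a stable fit
power_law = lambda c, a, b, d: a * np.power(c, -b) + d
params, _ = curve_fit(power_law, C, loss, p0=(1.0, 0.2, 2.0), maxfev=20000)

slope, intercept = np.polyfit(loss, acc, deg=1)    # stage 2: loss -> accuracy

pred_loss = power_law(1e21 / 1e18, *params)        # extrapolate to a larger budget
pred_acc = slope * pred_loss + intercept
print(f"predicted loss = {pred_loss:.2f}, predicted accuracy = {pred_acc:.2f}")
```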

URL: https://openreview.net/forum?id=PJUbMDkQVY

---

Title: Privacy Awareness for Information-Sharing Assistants: A Case-study on Form-filling with Contextual Integrity

Authors: Sahra Ghalebikesabi, Eugene Bagdasarian, Ren Yi, Itay Yona, Ilia Shumailov, Aneesh Pappu, Chongyang Shi, Laura Weidinger, Robert Stanforth, Leonard Berrada, Pushmeet Kohli, Po-Sen Huang, Borja Balle

Abstract: Advanced AI assistants combine frontier LLMs and tool access to autonomously perform complex tasks on behalf of users. While the helpfulness of such assistants can increase dramatically with access to user information including emails and documents, this raises privacy concerns about assistants sharing inappropriate information with third parties without user supervision. To steer information-sharing assistants to behave in accordance with privacy expectations, we propose to operationalize the design of privacy-conscious assistants that conform with *contextual integrity* (CI), a framework that equates privacy with the appropriate flow of information in a given context. In particular, we design and evaluate a number of strategies to steer assistants' information-sharing actions to be CI compliant. Our evaluation is based on a novel form filling benchmark composed of human annotations of common webform applications, and it reveals that prompting frontier LLMs to perform CI-based reasoning yields strong results.

URL: https://openreview.net/forum?id=l9rATNBB8Y

---

Title: EDM-TTS: Efficient Dual-Stage Masked Modeling for Alignment-Free Text-to-Speech Synthesis

Authors: Nabarun Goswami, Hanqin Wang, Tatsuya Harada

Abstract: Tokenized speech modeling has significantly advanced zero-shot text-to-speech (TTS) capabilities. The de facto approach involves a dual-stage process: text-to-semantic (T2S) generation followed by semantic-to-acoustic (S2A) generation. Several auto-regressive (AR) and non-autoregressive (NAR) methods have been explored in the literature for both stages. While AR models achieve state-of-the-art performance, their token-by-token generation causes inference inefficiencies; NAR methods, while more efficient, require explicit alignment for upsampling intermediate representations, which constrains the model's capacity for natural prosody. To overcome these issues, we propose an **E**fficient **D**ual-stage **M**asked **TTS** (EDM-TTS) model that employs an alignment-free masked generative approach for the T2S stage, removing the constraints of an explicit aligner while retaining the efficiency of NAR methods. For the S2A stage, we introduce an innovative NAR approach using a novel Injection Conformer architecture that effectively models the conditional dependence among different acoustic quantization levels, optimized by a masked language modeling objective, enabling zero-shot speech generation. Our evaluations demonstrate not only the superior inference efficiency of EDM-TTS, but also its state-of-the-art zero-shot speech quality, naturalness, and speaker similarity.

URL: https://openreview.net/forum?id=c7vkDg558Z

---

Title: Formal Verification of Graph Convolutional Networks with Uncertain Node Features and Uncertain Graph Structure

Authors: Tobias Ladner, Michael Eichelbeck, Matthias Althoff

Abstract: Graph neural networks are becoming increasingly popular in the field of machine learning due to their unique ability to process data structured in graphs. They have also been applied in safety-critical environments where perturbations inherently occur. However, these perturbations require us to formally verify neural networks before their deployment in safety-critical environments, as neural networks are prone to adversarial attacks. While there exists research on the formal verification of neural networks, there is no work verifying the robustness of generic graph convolutional network architectures with uncertainty in the node features and in the graph structure over multiple message-passing steps. This work addresses this research gap by explicitly preserving the non-convex dependencies of all elements in the underlying computations through reachability analysis with (matrix) polynomial zonotopes. We demonstrate our approach on three popular benchmark datasets.

URL: https://openreview.net/forum?id=B6y12Ot0cP

---

Title: An Adversarial Perspective on Machine Unlearning for AI Safety

Authors: Jakub Łucki, Boyi Wei, Yangsibo Huang, Peter Henderson, Florian Tramèr, Javier Rando

Abstract: Large language models are finetuned to refuse questions about hazardous knowledge, but these protections can often be bypassed. Unlearning methods aim to completely remove hazardous capabilities from models and make them inaccessible to adversaries. This work challenges the fundamental differences between unlearning and traditional safety post-training from an adversarial perspective. We demonstrate that existing jailbreak methods, previously reported as ineffective against unlearning, can be successful when applied carefully. Furthermore, we develop a variety of adaptive methods that recover most supposedly unlearned capabilities. For instance, we show that finetuning on 10 unrelated examples or removing specific directions in the activation space can recover most hazardous capabilities for models edited with RMU, a state-of-the-art unlearning method. Our findings challenge the robustness of current unlearning approaches and question their advantages over safety training.

URL: https://openreview.net/forum?id=J5IRyTKZ9s

---

Title: Bayesian Transferability Assessment for Spiking Neural Networks

Authors: Haiqing Hao, Wenhui Wang

Abstract: Brain-inspired spiking neural networks (SNNs) attract broad interest in neuromorphic computing but suffer from being difficult to optimize. Concurrently, pre-trained models (PTMs) have become a foundation for developing and applying artificial intelligence. It is therefore expected that pre-trained SNNs can alleviate the optimization difficulty of training from scratch. However, with many PTMs available in model hubs, effectively selecting the most appropriate PTM for a given task remains a significant challenge, often necessitating exhaustive fine-tuning and grid-searching. While several solutions to this challenge have been proposed for mainstream artificial neural networks (ANNs), aimed at developing efficient methods to assess the transferability of PTMs on target tasks, the realm of SNNs remains unexplored. The most widely used transferability assessment method for ANNs predicts transferability from a Bayesian perspective: feature maps extracted by the PTM backbone on the target task are used to calculate the maximum model evidence as the indicator of transferability. However, ANNs and SNNs differ in architecture, rendering the existing Bayesian method incompatible with SNNs. To solve this problem, this paper introduces a novel approach that uses feature maps averaged over the time domain to calculate the maximum evidence. Our proposed $\textbf{M}$aximum $\textbf{E}$vidence method with $\textbf{A}$veraged $\textbf{F}$eatures (MEAF) demonstrates effectiveness for SNNs. Additionally, the current algorithm calculates the maximum evidence in an iterative way; to accelerate the selection of PTMs, an approximation method is proposed that avoids this iteration, significantly reducing time consumption. Experiments show that the proposed MEAF method is effective for the transferability assessment of SNNs. MEAF outperforms information theory-based assessment methods such as LEEP and NCE, which can directly adapt to SNNs on neuromorphic datasets, underscoring its potential to streamline PTM selection and application in the realm of SNNs.

URL: https://openreview.net/forum?id=GaUtrgXMHe

---

Title: Relative Phase Equivariant Deep Neural Systems for Physical Layer Communications

Authors: Arwin Gansekoele, Sandjai Bhulai, Mark Hoogendoorn, Rob van der Mei

Abstract: In the era of telecommunications, the increasing demand for complex and specialized communication systems has led to a focus on improving physical layer communications. Artificial intelligence (AI) has emerged as a promising solution avenue for doing so. Deep neural receivers have already shown significant promise in improving the performance of communications systems. However, a major challenge lies in developing deep neural receivers that match the energy efficiency and speed of traditional receivers. This work investigates the incorporation of inductive biases in the physical layer using group-equivariant deep learning to improve the parameter efficiency of deep neural receivers. We do so by constructing a deep neural receiver that is equivariant with respect to the phase of arrival. We show that the inclusion of relative phase equivariance significantly reduces the error rate of deep neural receivers at similar model sizes. Thus, we show the potential of group-equivariant deep learning in the domain of physical layer communications.

URL: https://openreview.net/forum?id=vttqWoSJIW

---

Title: Multi-Bellman operator for convergence of $Q$-learning with linear function approximation

Authors: Diogo S. Carvalho, Pedro A. Santos, Francisco S. Melo

Abstract: We investigate the convergence of $Q$-learning with linear function approximation and introduce the multi-Bellman operator, an extension of the traditional Bellman operator. By analyzing the properties of this operator, we identify conditions under which the projected multi-Bellman operator becomes a contraction, yielding stronger fixed-point guarantees compared to the original Bellman operator. Building on these insights, we propose the multi-$Q$-learning algorithm, which achieves convergence and approximates the optimal solution with arbitrary precision. This contrasts with traditional $Q$-learning, which lacks such convergence guarantees. Finally, we empirically validate our theoretical results.
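
A small numerical illustration of the operator on a random MDP is below: apply the Bellman optimality operator m times and then project onto a linear feature span, and print the empirical contraction factor for a few values of m. This is a didactic sketch, not the multi-Q-learning algorithm itself, and the MDP, features, and norms are arbitrary choices.

```python
# Didactic check of the projected multi-Bellman operator Pi H^m on a random
# MDP with linear features: print empirical contraction factors for several m.
# The MDP, features, and norm are arbitrary; this is not multi-Q-learning.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, d, gamma = 6, 2, 4, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # transition kernel P[s, a, s']
R = rng.uniform(size=(nS, nA))                    # rewards R[s, a]
Phi = rng.normal(size=(nS * nA, d))               # linear features over (s, a) pairs
Proj = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)  # orthogonal projection onto span(Phi)

def bellman(Q):
    return R + gamma * P @ Q.max(axis=1)          # (HQ)(s,a) = r(s,a) + gamma E[max_a' Q(s',a')]

def projected_multi_bellman(Q, m):
    T = Q
    for _ in range(m):
        T = bellman(T)
    return (Proj @ T.reshape(-1)).reshape(nS, nA)

Q1, Q2 = rng.normal(size=(2, nS, nA))
for m in (1, 2, 4, 8):
    num = np.linalg.norm(projected_multi_bellman(Q1, m) - projected_multi_bellman(Q2, m))
    den = np.linalg.norm(Q1 - Q2)
    print(f"m={m}: empirical contraction factor = {num / den:.3f}")
```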

URL: https://openreview.net/forum?id=D2PjEPGXgh

---

Title: SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

Authors: Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas

Abstract: Despite efforts to align large language models (LLMs) with human intentions, widely-used LLMs such as GPT, Llama, and Claude are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content. To address this vulnerability, we propose SmoothLLM, an algorithm designed to mitigate jailbreaking attacks. Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs. Across a range of popular LLMs, SmoothLLM offers improved robustness against the GCG, PAIR, RandomSearch, and AmpleGCG jailbreaks. SmoothLLM is also resistant against adaptive GCG attacks, exhibits a small, though non-negligible trade-off between robustness and nominal performance, and is compatible with any LLM.
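
The defense is simple enough to sketch end to end. In the snippet below, generate and is_jailbroken are placeholders for a real LLM call and a real jailbreak/refusal checker, and the perturbation rate and number of copies are illustrative values rather than the paper's settings.

```python
# Sketch of the SmoothLLM recipe: perturb several copies of the prompt at the
# character level, query the model on each, and aggregate by majority vote.
# `generate` and `is_jailbroken` are stand-ins; all constants are assumptions.
import random
import string

def perturb(prompt, rate=0.1):
    chars = list(prompt)
    for i in range(len(chars)):
        if random.random() < rate:
            chars[i] = random.choice(string.printable)
    return "".join(chars)

def smooth_llm(prompt, generate, is_jailbroken, n_copies=10):
    responses = [generate(perturb(prompt)) for _ in range(n_copies)]
    votes = [is_jailbroken(r) for r in responses]
    if sum(votes) > n_copies / 2:          # majority says the prompt is an attack
        return "[refused]"
    safe = [r for r, v in zip(responses, votes) if not v]
    return random.choice(safe)

# Stub usage (replace the lambdas with a real model and a real checker):
print(smooth_llm("adversarial suffix here ...",
                 generate=lambda p: "I cannot help with that.",
                 is_jailbroken=lambda r: "cannot" not in r))
```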

URL: https://openreview.net/forum?id=laPAh2hRFC

---

Title: Open Problems in Technical AI Governance

Authors: Anka Reuel, Benjamin Bucknall, Stephen Casper, Timothy Fist, Lisa Soder, Onni Aarne, Lewis Hammond, Lujain Ibrahim, Alan Chan, Peter Wills, Markus Anderljung, Ben Garfinkel, Lennart Heim, Andrew Trask, Gabriel Mukobi, Rylan Schaeffer, Mauricio Baker, Sara Hooker, Irene Solaiman, Sasha Luccioni, Nitarshan Rajkumar, Nicolas Moës, Jeffrey Ladish, David Bau, Paul Bricman, Neel Guha, Jessica Newman, Yoshua Bengio, Tobin South, Alex Pentland, Sanmi Koyejo, Mykel Kochenderfer, Robert Trager

Abstract: AI progress is creating a growing range of risks and opportunities, but it is often unclear how they should be navigated. In many cases, the barriers and uncertainties faced are at least partly technical. Technical AI governance, referring to technical analysis and tools for supporting the effective governance of AI, seeks to address such challenges. It can help to (a) identify areas where intervention is needed, (b) assess the efficacy of potential governance actions, and (c) enhance governance options by designing mechanisms for enforcement, incentivization, or compliance. In this paper, we explain what technical AI governance is, outline why it is important, and present a taxonomy and incomplete catalog of its open problems. This paper is intended as a resource for technical researchers or research funders looking to contribute to AI governance.

URL: https://openreview.net/forum?id=1nO4qFMiS0

---

Title: Reward Distance Comparisons Under Transition Sparsity

Authors: Clement Nyanhongo, Bruno Miranda Henrique, Eugene Santos

Abstract: Reward comparisons are vital for evaluating differences in agent behaviors induced by a set of reward functions. Most conventional techniques utilize the input reward functions to learn optimized policies, which are then used to compare agent behaviors. However, learning these policies can be computationally expensive and can also raise safety concerns. Direct reward comparison techniques obviate policy learning but suffer from transition sparsity, where only a small subset of transitions are sampled due to data collection challenges and feasibility constraints. Existing state-of-the-art direct reward comparison methods are ill-suited for these sparse conditions since they require high transition coverage, where the majority of transitions from a given coverage distribution are sampled. When this requirement is not satisfied, a distribution mismatch between sampled and expected transitions can occur, leading to significant errors. This paper introduces the Sparsity Resilient Reward Distance (SRRD) pseudometric, designed to eliminate the need for high transition coverage by accommodating diverse sample distributions, which are common under transition sparsity. We provide theoretical justification for SRRD's robustness and conduct experiments to demonstrate its practical efficacy across multiple domains.

URL: https://openreview.net/forum?id=haP586YomL

---

Title: Reinforcement Learning for Causal Discovery without Acyclicity Constraints

Authors: Bao Duong, Hung Le, Biwei Huang, Thin Nguyen

Abstract: Recently, reinforcement learning (RL) has proved a promising alternative to conventional local heuristics in score-based approaches to learning directed acyclic causal graphs (DAGs) from observational data. However, the intricate acyclicity constraint still challenges the efficient exploration of the vast space of DAGs in existing methods. In this study, we introduce ALIAS (reinforced dAg Learning wIthout Acyclicity conStraints), a novel approach to causal discovery powered by the RL machinery. Our method features an efficient policy for generating DAGs in just a single step with an optimal quadratic complexity, fueled by a novel parametrization of DAGs that directly translates a continuous space to the space of all DAGs, bypassing the need for explicitly enforcing acyclicity constraints. This approach enables us to navigate the search space more effectively by utilizing policy gradient methods and established scoring functions. In addition, we provide compelling empirical evidence for the strong performance of ALIAS in comparison with the state of the art in causal discovery under increasingly difficult experimental conditions on both synthetic and real datasets. Our implementation is provided at https://github.com/baosws/ALIAS.

URL: https://openreview.net/forum?id=sNzBi8rZTy

---

Title: Federated Spectral Graph Transformers Meet Neural Ordinary Differential Equations for Non-IID Graphs

Authors: Kishan Gurumurthy, Himanshu Pal, Charu Sharma

Abstract: Graph Neural Network (GNN) research is rapidly advancing due to GNNs’ capacity to learn distributed representations from graph-structured data. However, centralizing large volumes of real-world graph data for GNN training is often impractical due to privacy concerns, regulatory restrictions, and commercial competition. Federated learning (FL), a distributed learning paradigm, offers a solution by preserving data privacy with collaborative model training. Despite progress in training huge vision and language models, federated learning for GNNs remains underexplored. To address this challenge, we present a novel method for federated learning on GNNs based on spectral GNNs equipped with neural ordinary differential equations (ODE) for better information capture, showing promising results across both homophilic and heterophilic graphs. Our approach effectively handles non-Independent and Identically Distributed (non-IID) data, while also achieving performance comparable to existing methods that only operate on IID data. It is designed to be privacy-preserving and bandwidth-optimized, making it suitable for real-world applications such as social network analysis, recommendation systems, and fraud detection, which often involve complex, non-IID, and heterophilic graph structures. Our results in the area of federated learning on non-IID heterophilic graphs demonstrate significant improvements, while also achieving better performance on homophilic graphs. This work highlights the potential of federated learning in diverse and challenging graph settings.

URL: https://openreview.net/forum?id=TR6iUG8i6Z

---


New submissions
===============


Title: Learning to reconstruct from saturated data: audio declipping and high-dynamic range imaging

Abstract: Learning-based methods are now ubiquitous for solving inverse problems, but their deployment in real-world applications is often hindered by the lack of ground-truth references for training. Recent self-supervised learning strategies offer a promising alternative, avoiding the need for ground truth. However, most existing methods are limited to linear inverse problems. This work extends self-supervised learning to the non-linear problem of recovering audio and images from clipped measurements, by assuming that the signal distribution is approximately invariant to changes in amplitude. We provide sufficient conditions for learning to reconstruct from saturated signals alone, along with a self-supervised loss that can be used to train reconstruction networks. Experiments on both audio and image data show that the proposed approach performs on par with fully supervised approaches, despite relying solely on clipped measurements for training.

URL: https://openreview.net/forum?id=gwDNM3b353

---

Title: G2D2: Gradient-Guided Discrete Diffusion for Inverse Problem Solving

Abstract: Recent literature has effectively leveraged diffusion models trained on continuous variables as priors for solving inverse problems. Notably, discrete diffusion models with discrete latent codes have shown strong performance, particularly in modalities suited for discrete compressed representations, such as image and motion generation. However, their discrete and non-differentiable nature has limited their application to inverse problems formulated in continuous spaces. This paper presents a novel method for addressing linear inverse problems by leveraging generative models based on discrete diffusion as priors. We overcome these limitations by approximating the true posterior distribution with a variational distribution constructed from categorical distributions and continuous relaxation techniques. Furthermore, we employ a star-shaped noise process to mitigate the drawbacks of traditional discrete diffusion models with absorbing states, demonstrating that our method performs comparably to continuous diffusion techniques with less GPU memory consumption.

URL: https://openreview.net/forum?id=fj23qnVifX

---

Title: On the Low-Rank Parametrization of Reward Models for Controlled Language Generation

Abstract: Language models trained on large amounts of data are known to produce inappropriate content in some cases and require careful tuning to be used in the real world. We revisit an effective and modular approach for the controllability of language models, in which an external expert model guides the decoding. In particular, we zoom in on the parametrization choice for the external expert, highlighting the difference between low-rank and higher-rank parametrizations. Higher-rank experts are designed to support high flexibility when representing the rewards, leading to higher computational costs during decoding. However, we demonstrate that they might not use their full flexibility. By analyzing the recently proposed reward-augmented decoding approach (RAD), which uses a higher-rank expert model, we introduce a simpler but more efficient low-rank parametrization of the expert model, enabling fast and effective guided decoding. We empirically show that the low-rank RAD performs on par with the more flexible RAD on a detoxification and a sentiment control task, while requiring only a single reward model call per generated token.
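
The efficiency argument is easy to see in code: with a low-rank expert, the reward for every candidate next token is a dot product between a context embedding and a token embedding table, so one call scores the whole vocabulary. The shapes, the random values, and the combination rule (logits plus beta times reward) below are illustrative assumptions, not the paper's exact parametrization.

```python
# Sketch of reward-guided decoding with a low-rank expert: one matrix-vector
# product scores all candidate next tokens. Shapes, values, and the
# logits + beta * reward combination are illustrative assumptions.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
V, d, beta = 1000, 64, 2.0                        # vocab size, reward rank, guidance strength

lm_logits = rng.normal(size=V)                    # base LM logits for the next token
h_context = rng.normal(size=d)                    # expert's embedding of the prefix
E_tokens = rng.normal(size=(V, d)) / np.sqrt(d)   # low-rank per-token reward embeddings

rewards = E_tokens @ h_context                    # scores every token in one product
guided = softmax(lm_logits + beta * rewards)      # reward-tilted next-token distribution
print(guided.argmax(), round(guided.max(), 4))
```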

URL: https://openreview.net/forum?id=cjRsEGLT8B

---

Title: GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models

Abstract: In this work, we propose GLOV, which enables Large Language Models (LLMs) to act as implicit optimizers for Vision-Language Models (VLMs) to enhance downstream vision tasks. GLOV prompts an LLM with the downstream task description, querying it for suitable VLM prompts (e.g., for zero-shot classification with CLIP). These prompts are ranked according to their fitness for the downstream vision task. In each respective optimization step, the ranked prompts are fed as in-context examples (with their accuracies) to equip the LLM with the knowledge of the type of prompts preferred by the downstream VLM. Furthermore, we explicitly guide the LLM's generation at each optimization step by adding an offset vector -- calculated from the embedding differences between previous \textit{positive} and \textit{negative} solutions -- to the intermediate layer of the network for the next generation. This offset vector biases the LLM generation toward the type of language the downstream VLM prefers, resulting in enhanced performance on the downstream vision tasks. We comprehensively evaluate our GLOV on two tasks: object recognition and the critical task of enhancing VLM safety. Our GLOV shows performance improvement by up to $15.0\%$ and $57.5\%$ for dual-encoder (e.g., CLIP) and encoder-decoder (e.g., LLaVA) models for object recognition and reduces the attack success rate (ASR) on state-of-the-art VLMs by up to $60.7\%$.

URL: https://openreview.net/forum?id=kZLANTp6Vw

---

Title: Random Erasing vs. Model Inversion: A Promising Defense or a False Hope?

Abstract: Model Inversion (MI) attacks pose a significant privacy threat by reconstructing private training data from machine learning models.
While existing defenses primarily concentrate on model-centric approaches, the impact of data on MI robustness remains largely unexplored.
In this work, we explore Random Erasing (RE)—a technique traditionally used for improving model generalization under occlusion—and uncover its surprising effectiveness as a defense against MI attacks.

Specifically, our novel feature-space analysis shows that a model trained with RE images introduces a significant discrepancy between the features of MI-reconstructed images and those of the private data. At the same time, features of private images remain distinct from other classes and well separated from different classification regions. These effects collectively degrade MI reconstruction quality and attack accuracy while maintaining reasonable natural accuracy. Furthermore, we explore two critical properties of RE: Partial Erasure and Random Location. First, Partial Erasure prevents the model from observing entire objects during training, and we find that this has a significant impact on MI, which aims to reconstruct entire objects. Second, the Random Location of erasure plays a crucial role in achieving a strong privacy-utility trade-off. Our findings highlight RE as a simple yet effective defense mechanism that can be easily integrated with existing privacy-preserving techniques. Extensive experiments across 37 setups demonstrate that our method achieves SOTA performance in the privacy-utility trade-off. The results consistently demonstrate the superiority of our defense over existing defenses across different MI attacks, network architectures, and attack configurations. For the first time, we achieve a significant degradation in attack accuracy without a decrease in utility for some configurations. Our code and additional results are included in the Supplementary Material.

URL: https://openreview.net/forum?id=S9CwKnPHaO

---

Title: Interpretable LLM-based Table Question Answering

Abstract: Interpretability in Table Question Answering (Table QA) is critical, especially in high-stakes domains like finance and healthcare. While recent Table QA approaches based on Large Language Models (LLMs) achieve high accuracy, they often produce ambiguous explanations of how answers are derived. We propose Plan-of-SQLs (POS), a new Table QA method that makes the model's decision-making process interpretable. POS decomposes a question into a sequence of atomic steps, each directly translated into an executable SQL command on the table, thereby ensuring that every intermediate result is transparent. Through extensive experiments, we show that: First, POS generates the highest-quality explanations among compared methods, which markedly improves the users' ability to simulate and verify the model’s decisions. Second, when evaluated on standard Table QA benchmarks (TabFact, WikiTQ, and FeTaQA), POS achieves QA accuracy that is competitive with existing methods, while also offering greater efficiency—requiring significantly fewer LLM calls and table database queries (up to 25x fewer)—and more robust performance on large tables. Finally, we observe high agreement (up to 90.59% in forward simulation) between LLMs and human users when making decisions based on the same explanations, suggesting that LLMs could serve as an effective proxy for humans in evaluating Table QA explanations.
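To make the decompose-into-atomic-SQL idea concrete, here is a tiny, self-contained sketch in Python with sqlite3; the table, question, and step strings are invented for illustration and are not taken from the paper.

    import sqlite3

    # Toy table standing in for a Table QA input.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
    conn.executemany("INSERT INTO players VALUES (?, ?, ?)",
                     [("Ann", "Red", 31), ("Bo", "Red", 12), ("Cy", "Blue", 25)])

    # Question: "Did the Red team score more than 40 points in total?"
    # A POS-style plan decomposes this into atomic, individually executable steps.
    plan = [
        ("keep only Red-team rows", "SELECT * FROM players WHERE team = 'Red'"),
        ("sum their points",        "SELECT SUM(points) FROM players WHERE team = 'Red'"),
        ("compare the sum to 40",   "SELECT SUM(points) > 40 FROM players WHERE team = 'Red'"),
    ]
    for description, sql in plan:
        print(description, "->", conn.execute(sql).fetchall())  # every intermediate result is inspectable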

URL: https://openreview.net/forum?id=2eTsZBoU2W

---

Title: L2G: Repurposing Language Models for Genomics Tasks

Abstract: Pre-trained language models have transformed the field of natural language processing (NLP), and their success has inspired efforts in genomics to develop domain-specific foundation models (FMs). However, creating high-quality genomic FMs from scratch is resource-intensive, requiring significant computational power and high-quality pre-training data. The success of large language models (LLMs) in NLP has largely been driven by industrial-scale efforts leveraging vast, diverse corpora and massive computing infrastructure. In this work, we aim to bypass the data and computational bottlenecks of creating genomic FMs from scratch and instead propose repurposing existing LLMs for genomics tasks. Inspired by the recently observed 'cross-modal transfer' phenomenon -- where transformers pre-trained on natural language can generalize to other modalities -- we introduce L2G, which adapts a pre-trained LLM architecture for genomics using neural architecture search and a novel three-stage training procedure. Remarkably, without requiring extensive pre-training on DNA sequence data, L2G achieves superior performance to fine-tuned genomic FMs and task-specific models on more than half of tasks across multiple genomics benchmarks. In an enhancer activity prediction task, L2G further demonstrates its capacity to identify significant transcription factor motifs. Our work not only highlights the generalizability and efficacy of language models in out-of-domain tasks such as genomics, but also opens new avenues for more efficient and less resource-intensive methodologies in genomic research.

URL: https://openreview.net/forum?id=5NM4guc90N

---

Title: Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training

Abstract: Following AI scaling trends, frontier models continue to grow in size and continue to be trained on larger datasets. Training these models requires huge investments in exascale computational resources, which has in turn driven the development of distributed deep learning methods. Data parallelism is an essential approach to speed up training, but it requires frequent global communication between workers, which can bottleneck training at the largest scales. In this work, we propose a method called Pseudo-Asynchronous Local SGD (PALSGD) to improve the efficiency of data-parallel training. PALSGD is an extension of Local SGD (Stich, 2018) and DiLoCo (Douillard et al., 2023), designed to further reduce communication frequency by introducing a pseudo-synchronization mechanism. PALSGD allows the use of longer synchronization intervals compared to standard Local SGD. Despite the reduced communication frequency, the pseudo-synchronization approach ensures that model consistency is maintained, leading to performance results comparable to those achieved with more frequent synchronization. Furthermore, we provide a theoretical analysis of PALSGD, establishing its convergence and deriving its convergence rate. This analysis offers insights into the algorithm's behavior and performance guarantees. We evaluated PALSGD on image classification and language modeling tasks. Our results show that PALSGD achieves better performance in less time compared to existing methods like Distributed Data Parallel (DDP) and DiLoCo. Notably, PALSGD trains 18.4% faster than DDP on ImageNet-1K with ResNet-50, 24.4% faster than DDP on TinyStories with GPT-Neo-125M, and 21.1% faster than DDP on TinyStories with GPT-Neo-8M.
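For readers unfamiliar with the Local SGD family that PALSGD extends, the sketch below shows one communication round of plain Local SGD (workers take several local steps, then parameters are averaged); it deliberately does not reproduce PALSGD's pseudo-synchronization mechanism, and all names and hyperparameters are illustrative.

    import copy
    import torch
    import torch.nn.functional as F

    def local_sgd_round(global_model, worker_loaders, local_steps=8, lr=0.1):
        # One round of plain Local SGD: each worker trains locally from the same
        # starting point, then the worker parameters are averaged back together.
        worker_states = []
        for loader in worker_loaders:
            model = copy.deepcopy(global_model)
            opt = torch.optim.SGD(model.parameters(), lr=lr)
            batches = iter(loader)
            for _ in range(local_steps):
                x, y = next(batches)
                opt.zero_grad()
                F.cross_entropy(model(x), y).backward()
                opt.step()
            worker_states.append(model.state_dict())
        # Synchronization step (the communication PALSGD aims to make less frequent).
        averaged = {k: torch.stack([s[k].float() for s in worker_states]).mean(0)
                    for k in worker_states[0]}
        global_model.load_state_dict(averaged)
        return global_model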

URL: https://openreview.net/forum?id=8VTrvS5vN7

---

Title: A noise-corrected Langevin algorithm and sampling by half-denoising

Abstract: The Langevin algorithm is a classic method for sampling from a given pdf in a real space. In its basic version, it only requires knowledge of the gradient of the log-density, also called the score function. However, in deep learning, it is often easier to learn the so-called "noisy-data score function", i.e. the gradient of the log-density of noisy data, more precisely when Gaussian noise is added to the data. Such an estimate is biased and complicates the use of the Langevin method. Here, we propose a noise-corrected version of the Langevin algorithm, where the bias due to noisy data is removed, at least regarding first-order terms. Unlike diffusion models, our algorithm needs to know the noisy-data score function for only a single noise level. We further propose a simple special case with an intuitive interpretation: it iteratively adds noise to the data and then attempts to remove half of that noise.
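The special case mentioned at the end admits a short sketch. Below is a hedged toy implementation of the add-noise-then-remove-half-of-it iteration using a Tweedie-style correction; the exact correction terms of the proposed algorithm are not reproduced, and an analytically known noisy-data score is used only to keep the example self-contained.

    import torch

    def half_denoising_step(x, noisy_score, sigma):
        # Add Gaussian noise of scale sigma, then use the noisy-data score
        # (Tweedie-style) to remove roughly half of that noise.
        x_noisy = x + sigma * torch.randn_like(x)
        return x_noisy + 0.5 * sigma**2 * noisy_score(x_noisy)

    # Toy target: data ~ N(0, 1). Then data + N(0, sigma^2) ~ N(0, 1 + sigma^2),
    # so the noisy-data score is -x / (1 + sigma^2).
    sigma = 0.3
    noisy_score = lambda x: -x / (1.0 + sigma**2)
    x = torch.randn(10_000)
    for _ in range(500):
        x = half_denoising_step(x, noisy_score, sigma)
    print(float(x.var()))  # stays close to 1.0; the bias correction is only first-order in sigma^2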

URL: https://openreview.net/forum?id=QGtXn5GtfK

---

Title: Soft Prompt-tuning for Short Text Classification via Internal Knowledge Expansion

Abstract: The last decades have witnessed a vast amount of interest and research on short texts. Limited contextual information, feature sparsity, and semantic ambiguity are the main challenges of short text classification. Recently, pre-trained language models (PLMs) have achieved tremendous success in various downstream Natural Language Processing (NLP) tasks, including short text classification. However, most existing methods rely on external expansion from an open knowledge base to address the inherent limitations of short texts, which not only incurs a time-consuming query process and introduces omissions, noise, and biases, but also depends on a high-quality open knowledge base and cannot be applied in some real-world offline scenarios. In this paper, we propose a novel Soft Prompt-tuning method for short text classification via Internal Knowledge Expansion (SPIE). Our method stems from the recent success of prompt-tuning and extracts knowledge from the training dataset itself. We apply hierarchical clustering and optimization strategies to refine the obtained expansion words for the verbalizer in prompt-tuning. Furthermore, we employ soft prompt-tuning to avoid the bias introduced by hand-crafted templates and to improve the overall performance of the model. Although the expanded knowledge is purely internal, experimental results demonstrate that our method even outperforms methods that introduce external knowledge, with much less computational time, on four well-known benchmarks.

URL: https://openreview.net/forum?id=nnlYWMcDGZ

---

Title: Reachability Weighted Offline Goal-conditioned Resampling

Abstract: Offline goal-conditioned reinforcement learning (RL) relies on fixed datasets where many potential goals share the same state and action spaces. However, these potential goals are not explicitly represented in the collected trajectories. To learn a generalizable goal-conditioned policy, it is common to sample goals and state–action pairs uniformly using dynamic programming methods such as Q-learning. Uniform sampling, however, requires an intractably large dataset to cover all possible combinations and creates many unreachable state–goal–action pairs that degrade policy performance. Our key insight is that sampling should favor transitions that enable goal achievement. To this end, we propose Reachability Weighted Sampling (RWS). RWS uses a reachability classifier trained via positive–unlabeled (PU) learning on goal-conditioned state–action values. The classifier maps these values to a reachability score, which is then used as a sampling priority. RWS is a plug-and-play module that integrates seamlessly with standard offline RL algorithms. Experiments on six complex simulated robotic manipulation tasks, including those with a robot arm and a dexterous hand, show that RWS significantly improves performance. In one notable case, performance on the HandBlock-Z task improved by nearly 50% relative to the baseline. These results indicate the effectiveness of reachability-weighted sampling.
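A small numpy sketch of the general recipe, i.e., turning goal-conditioned values into sampling priorities, is given below; the sigmoid mapping is a stand-in for the PU-learned reachability classifier, and all names are illustrative rather than the paper's exact design.

    import numpy as np

    rng = np.random.default_rng(0)

    def reachability_weighted_indices(q_values, batch_size, temperature=1.0):
        # Map goal-conditioned values to a (0, 1) reachability-like score and sample
        # replay indices proportionally, so plausibly reachable goals are replayed more.
        scores = 1.0 / (1.0 + np.exp(-q_values / temperature))
        probs = scores / scores.sum()
        return rng.choice(len(q_values), size=batch_size, p=probs)

    # Toy usage: six stored (state, action, goal) transitions with estimated values.
    q_values = np.array([-3.0, -0.5, 2.0, 4.0, -6.0, 1.0])
    print(reachability_weighted_indices(q_values, batch_size=4))  # high-value transitions dominate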

URL: https://openreview.net/forum?id=xKROcfbXFW

---

Title: Multiple Noises in Diffusion Model for Semi-Supervised Multi-Domain Translation

Abstract: In this work, we address the challenge of multi-domain translation, where the objective is to learn mappings between arbitrary configurations of domains within a defined set (such as $(D_1, D_2)\rightarrow{}D_3$, $D_2\rightarrow{}(D_1, D_3)$, $D_3\rightarrow{}D_1$, etc. for three domains) without the need for separate models for each specific translation configuration, enabling more efficient and flexible domain translation.
We introduce Multi-Domain Diffusion (MDD), a method with dual purposes: i) reconstructing any missing views for new data objects, and
ii) enabling learning in semi-supervised contexts with arbitrary supervision configurations. MDD achieves these objectives by exploiting the noise formulation of diffusion models, specifically modeling one noise level per domain.
Similar to existing domain translation approaches, MDD learns the translation between any combination of domains. However, unlike prior work, our formulation inherently handles semi-supervised learning without modification by representing missing views as noise in the diffusion process.
We evaluate our approach through domain translation experiments on BL3NDT, a multi-domain synthetic dataset designed for challenging semantic domain inversion, the BraTS2020 dataset, and the CelebAMask-HQ dataset.

URL: https://openreview.net/forum?id=vYdT26kDYM

---

Title: Min-Max Optimisation for Nonconvex-Nonconcave Functions Using a Random Zeroth-Order Extragradient Algorithm

Abstract: This study explores the performance of the random Gaussian smoothing Zeroth-Order ExtraGradient (ZO-EG) scheme considering min-max optimisation problems with possibly NonConvex-NonConcave (NC-NC) objective functions. We consider both unconstrained and constrained, differentiable and non-differentiable settings. We discuss the min-max problem from the point of view of variational inequalities. For the unconstrained problem, we establish the convergence of the ZO-EG algorithm to the neighbourhood of an $\epsilon$-stationary point of the NC-NC objective function, whose radius can be controlled under a variance reduction scheme, along with its complexity. For the constrained problem, we introduce the new notion of proximal variational inequalities and give examples of functions satisfying this property. Moreover, we prove analogous results to the unconstrained case for the constrained problem. For the non-differentiable case, we prove the convergence of the ZO-EG algorithm to a neighbourhood of an $\epsilon$-stationary point of the smoothed version of the objective function, where the radius of the neighbourhood can be controlled, which can be related to the ($\delta,\epsilon$)-Goldstein stationary point of the original objective function.
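For orientation, the sketch below shows a single-sample Gaussian-smoothing gradient estimator plugged into an extragradient update on a toy bilinear saddle problem; the step size, smoothing radius, and one-sample estimator are illustrative simplifications of the scheme analyzed in the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def zo_grad(f, z, mu=1e-3):
        # Single-sample Gaussian-smoothing (zeroth-order) gradient estimate of f at z.
        u = rng.standard_normal(z.shape)
        return (f(z + mu * u) - f(z)) / mu * u

    # Toy saddle problem: f(x, y) = x * y, minimized over x and maximized over y.
    f = lambda z: z[0] * z[1]
    z = np.array([1.0, -1.0])
    eta = 0.05
    for _ in range(2000):
        g = zo_grad(f, z)
        z_half = z + eta * np.array([-g[0], g[1]])       # extrapolation: descend in x, ascend in y
        g_half = zo_grad(f, z_half)
        z = z + eta * np.array([-g_half[0], g_half[1]])  # update with the extrapolated gradient
    print(z)  # ends up in a small neighbourhood of the saddle point (0, 0)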

URL: https://openreview.net/forum?id=1bxY1uAXyr

---

Title: Loss Landscape Degeneracy Drives Stagewise Development in Transformers

Abstract: Deep learning involves navigating a high-dimensional loss landscape over the neural network parameter space. Over the course of training, complex computational structures form and re-form inside the neural network, leading to shifts in input/output behavior. It is a priority for the science of deep learning to uncover principles governing the development of neural network structure and behavior. Drawing on the framework of singular learning theory, we propose that model development is deeply linked to degeneracy in the local geometry of the loss landscape. We investigate this link by monitoring loss landscape degeneracy throughout training, as quantified by the local learning coefficient, for a transformer language model and an in-context linear regression transformer. We show that training can be divided into distinct periods of change in loss landscape degeneracy, and that these changes in degeneracy coincide with significant changes in the internal computational structure and the input/output behavior of the transformers. This finding underscores the potential of a degeneracy-based perspective for understanding modern deep learning.

URL: https://openreview.net/forum?id=45qJyBG8Oj

---

Title: Learning from Heterophilic Graphs: A Spectral Theory Perspective on the Impact of Self-Loops and Parallel Edges

Abstract: Graph heterophily poses a formidable challenge to the performance of Message-passing Graph Neural Networks (MP-GNNs). Familiar low-pass filters like Graph Convolutional Networks (GCNs) face performance degradation, which can be attributed to the blending of messages from dissimilar neighboring nodes. The performance of the low-pass filters on heterophilic graphs still requires an in-depth analysis. In this context, we update the heterophilic graphs by adding a number of self-loops and parallel edges. We observe that the eigenvalues of the graph Laplacian decrease and increase, respectively, as the number of self-loops and parallel edges increases. We conduct several studies of the performance of GCN on various benchmark heterophilic networks when either self-loops or parallel edges are added. The studies reveal that GCN exhibits either increasing or decreasing performance trends upon adding self-loops or parallel edges. In light of these studies, we establish connections between the graph spectra and the performance trends of the low-pass filters on heterophilic graphs. The graph spectra characterize the essential intrinsic properties of the input graph like the presence of connected components, sparsity, average degree, cluster structures, etc. Our work enables seamless evaluation of the graph spectrum and properties by observing the performance trends of the low-pass filters without pursuing costly eigenvalue decomposition. The theoretical foundations are also discussed to validate the impact of adding self-loops and parallel edges on the graph spectrum.
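The spectral quantities under discussion are cheap to inspect on small graphs. The numpy sketch below computes the symmetric normalized Laplacian spectrum of a toy cycle graph before and after adding self-loops or duplicating an edge; the graph and weights are illustrative, not the paper's benchmarks.

    import numpy as np

    def normalized_laplacian_eigs(A):
        # Eigenvalues of L = I - D^{-1/2} A D^{-1/2} for a weighted adjacency matrix A.
        d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
        L = np.eye(len(A)) - d_inv_sqrt @ A @ d_inv_sqrt
        return np.round(np.sort(np.linalg.eigvalsh(L)), 4)

    # Toy 4-node cycle graph; parallel edges are modelled as integer edge weights > 1.
    A = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
    A_selfloops = A + np.eye(4)          # one self-loop per node
    A_parallel = A.copy()
    A_parallel[0, 1] += 1                # duplicate a single edge
    A_parallel[1, 0] += 1

    print(normalized_laplacian_eigs(A))            # original spectrum
    print(normalized_laplacian_eigs(A_selfloops))  # spectrum after adding self-loops
    print(normalized_laplacian_eigs(A_parallel))   # spectrum after adding a parallel edge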

URL: https://openreview.net/forum?id=f2xeXT2X5h

---

Title: Graph Personalized Federated Learning via Client Network Learning

Abstract: Graph classification is a widely studied problem for applications such as molecule/protein function prediction and drug discovery. Powerful graph neural networks (GNNs) have demonstrated state-of-the-art performance for the classification of complex graphs, but training such models can require significant amounts of high-quality labeled graphs that are expensive to collect. When individual institutes do not possess sufficient graph data, federated learning (FL) becomes a handy solution for them to collaboratively obtain powerful graph models without directly sharing their own graph data. However, existing FL frameworks for graph data do not consider the realistic setting of personalized FL with heterogeneous data, where each client aims to leverage the data of certain other clients to boost its own model performance. In this work, inspired by graph structure learning, we propose to learn a dynamic client network that tracks the graph data similarity across clients to guide model sharing along FL. Specifically, we rely on the marginal parameters of local GNNs to dynamically learn the client network, and refer to a set of fundamental graph properties to guide its learning. Extensive experiments on three real-world graph datasets demonstrate the consistent effectiveness of our two major proposed modules, which also mutually verify the effectiveness of each other.

URL: https://openreview.net/forum?id=pyTTR4pxkU

---

Title: SortBench: Benchmarking LLMs based on their ability to sort lists

Abstract: Sorting is a tedious but simple task for human intelligence and can be solved fairly easily algorithmically. However, for Large Language Models (LLMs) this task is surprisingly hard, as some properties of sorting are among known weaknesses of LLMs: being faithful to the input data, logical comparisons between values, and strictly differentiating between syntax (used for sorting) and semantics (typically learned by embeddings). In this paper, we describe the new SortBench benchmark for LLMs, which comes with different difficulties and can be easily scaled in terms of difficulty. We apply this benchmark to seven state-of-the-art LLMs, including current test-time reasoning models. Our results show that while the o3-mini model is very capable at sorting in general, even it can be fooled if the strings mix syntactic and semantic aspects, e.g., by asking it to sort numbers written out as words. Furthermore, all models have problems with faithfulness to the input for long lists, i.e., they drop items and add new ones. Our results also show that test-time reasoning has a tendency to overthink problems, which leads to performance degradation. Finally, models without test-time reasoning like GPT-4o are not much worse than reasoning models.
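The syntax-versus-semantics confound mentioned above is easy to illustrate: for numbers written out as words, lexicographic and numeric orderings disagree. The word list and mapping below are made up for illustration.

    # Numbers written out as words: the "correct" order depends on whether the task
    # means lexicographic (syntactic) or numeric (semantic) sorting.
    items = ["twelve", "three", "seven", "twenty"]
    value = {"three": 3, "seven": 7, "twelve": 12, "twenty": 20}

    print(sorted(items))                 # syntactic: ['seven', 'three', 'twelve', 'twenty']
    print(sorted(items, key=value.get))  # semantic:  ['three', 'seven', 'twelve', 'twenty']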

URL: https://openreview.net/forum?id=W2rrgTIZN4

---

Title: Efficient Hardware Scaling and Diminishing Returns in Large-Scale Training of Language Models

Abstract: To train the exceedingly large neural networks required in modern applications, such as large language models (LLMs), model training is distributed across tens of thousands of hardware accelerators (e.g. GPUs), requiring orchestration of computation and communication across large computing clusters. In this work, we demonstrate that careful consideration of hardware configuration and parallelization strategy is critical for effective (i.e. compute- and cost-efficient) scaling of model training. We conduct an extensive empirical study of the performance of large-scale LLM training workloads across model size, hardware configurations, and distributed parallelization strategies with current best practices. In experiments with model sizes up to 70B parameters and utilizing up to 2048 H100 GPUs, we demonstrate that: (1) naive scale-out with Fully Sharded Data Parallelism (FSDP) incurs communication overhead that causes parallelization strategies previously thought to be sub-optimal to in fact become preferable; and (2) scaling the total number of accelerators for training quickly yields diminishing returns even when hardware and parallelization strategies are properly optimized, implying poor marginal performance per additional unit of power or GPU-hour.

URL: https://openreview.net/forum?id=p7jQEf3wlh

---

Title: If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

Abstract: The prominent large language models (LLMs) of today differ from past language models not only in size, but also in the fact that they are trained on a combination of natural language and formal language (code). As a medium between humans and computers, code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity. In this survey, we present an overview of the various benefits of integrating code into LLMs' training data. Specifically, beyond enhancing LLMs in code generation, we observe that these unique properties of code help (i) unlock the reasoning ability of LLMs, enabling their applications to a range of more complex natural language tasks; (ii) steer LLMs to produce structured and precise intermediate steps, which can then be connected to external execution ends through function calls; and (iii) take advantage of code compilation and execution environment, which also provides diverse feedback for model improvement. In addition, we trace how these profound capabilities of LLMs, brought by code, have led to their emergence as intelligent agents (IAs) in situations where the ability to understand instructions, decompose goals, plan and execute actions, and refine from feedback are crucial to their success on downstream tasks. Finally, we present several key challenges and future directions of empowering LLMs with code.

URL: https://openreview.net/forum?id=kAsbbCvQdv

---

Title: Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler

Abstract: Learning rate scheduling is essential in transformer training, where the final annealing plays a crucial role in getting the best performance. However, the mechanisms behind this cooldown phase, with its characteristic drop in loss, remain poorly understood. To address this, we provide a comprehensive analysis focusing solely on the cooldown phase in the Warmup-Stable-Decay (WSD) learning rate scheduler. Our analysis shows that different cooldown shapes reveal a fundamental bias-variance trade-off in the resulting models, with shapes that balance exploration and exploitation consistently outperforming alternatives. Similarly, we find substantial performance variations — comparable to those from cooldown shape selection — when tuning AdamW hyperparameters. Notably, we observe consistent improvements with higher values of $\beta_2$ during cooldown. From a loss landscape perspective, we provide visualizations of the landscape during cooldown, supporting the river valley loss perspective empirically. These findings offer practical recommendations for configuring the WSD scheduler in transformer training, emphasizing the importance of optimizing the cooldown phase alongside traditional hyperparameter tuning.
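To make the phases concrete, here is a minimal sketch of a Warmup-Stable-Decay schedule with a configurable cooldown shape; the fractions, peak learning rate, and shape names are illustrative, not the configurations studied in the paper.

    def wsd_lr(step, total_steps, peak_lr=3e-4, warmup_frac=0.05,
               cooldown_frac=0.2, cooldown_shape="linear"):
        # Warmup-Stable-Decay: linear warmup, constant plateau, then a cooldown to zero.
        warmup_steps = int(warmup_frac * total_steps)
        cooldown_steps = int(cooldown_frac * total_steps)
        cooldown_start = total_steps - cooldown_steps
        if step < warmup_steps:
            return peak_lr * step / max(warmup_steps, 1)
        if step < cooldown_start:
            return peak_lr
        t = (step - cooldown_start) / max(cooldown_steps, 1)  # 0 -> 1 over the cooldown
        if cooldown_shape == "linear":
            return peak_lr * (1.0 - t)
        if cooldown_shape == "1-sqrt":
            return peak_lr * (1.0 - t ** 0.5)
        raise ValueError(cooldown_shape)

    # Inspect the schedule at a few points of a 10k-step run.
    for s in [0, 250, 500, 5000, 8500, 9500, 9999]:
        print(s, round(wsd_lr(s, 10_000), 6))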

URL: https://openreview.net/forum?id=ZnSYEcZod3

---

Title: Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors

Abstract: In this work, we introduce a generative approach for pose-free (without camera parameters) reconstruction of 360 scenes from a sparse set of 2D images. Scene reconstruction from incomplete, pose-free observations is usually regularized with depth estimation or 3D foundational priors. While recent advances have enabled sparse-view reconstruction of large complex scenes (with a high degree of foreground and background detail) with known camera poses using view-conditioned generative priors, these methods cannot be directly adapted to the pose-free setting when ground-truth poses are not available during evaluation. To address this, we propose an image-to-image generative model designed to inpaint missing details and remove artifacts in novel view renders and depth maps of a 3D scene. We introduce context and geometry conditioning using Feature-wise Linear Modulation (FiLM) layers as a lightweight alternative to cross-attention and also propose a novel confidence measure for 3D Gaussian splat representations to allow for better detection of these artifacts. By progressively integrating these novel views in a Gaussian-SLAM-inspired process, we achieve a multi-view-consistent 3D representation. Evaluations on the MipNeRF360 and DL3DV-10K benchmark datasets demonstrate that our method surpasses existing pose-free techniques and performs competitively with state-of-the-art posed (precomputed camera parameters are given) reconstruction methods in complex 360 scenes. Our code and datasets will be open-sourced upon acceptance.

URL: https://openreview.net/forum?id=yp1CYo6R0r

---

Title: Streaming Heteroscedastic Probabilistic PCA with Missing Data

Abstract: Streaming principal component analysis (PCA) is an integral tool in large-scale machine learning for rapidly estimating low-dimensional subspaces from very high-dimensional data arriving at a high rate. However, modern datasets increasingly combine data from a variety of sources, and thus may exhibit heterogeneous quality across samples. Standard streaming PCA algorithms do not account for non-uniform noise, so their subspace estimates can quickly degrade. While the recently proposed Heteroscedastic Probabilistic PCA Technique (HePPCAT) addresses this heterogeneity, it was not designed to handle streaming data, which may exhibit non-stationary behavior. Moreover, HePPCAT does not allow for missing entries in the data, which can be common in streaming data. This paper proposes the Streaming HeteroscedASTic Algorithm for PCA (SHASTA-PCA) to bridge this divide. SHASTA-PCA employs a stochastic alternating expectation maximization approach that jointly learns the low-rank latent factors and the unknown noise variances from streaming data that may have missing entries and heteroscedastic noise, all while maintaining a low memory and computational footprint. Numerical experiments demonstrate the superior subspace estimation of our method compared to state-of-the-art streaming PCA algorithms in the heteroscedastic setting. Finally, we illustrate SHASTA-PCA applied to highly heterogeneous real data from astronomy.

URL: https://openreview.net/forum?id=lb2rPLuP9X

---

Title: AIDA: Action Inquiry DAgger for Interactive Imitation Learning

Abstract: Human teaching effort is a significant bottleneck for the broader applicability of interactive imitation learning. To reduce the number of required queries, existing methods employ active learning to query the human teacher only in uncertain, risky, or novel situations. However, during these queries, the novice's planned actions are not utilized despite containing valuable information, such as the novice's capabilities, as well as corresponding uncertainty levels. To this end, we allow the novice to say: "I plan to do this, but I am uncertain." We introduce the Action Inquiry DAgger (AIDA) framework, which leverages teacher feedback on the novice plan in three key ways: (1) Sensitivity-Aware Gating (SAG), which adjusts the query threshold to track a desired sensitivity level; (2) Foresight Interactive Experience Replay (FIER), which recasts valid and relabeled novice action plans into demonstrations; and (3) Prioritized Interactive Experience Replay (PIER), which prioritizes replay based on uncertainty, novice success, and demonstration age. Together, these components balance query frequency with failure incidence, reduce the number of required demonstration annotations, improve generalization, and speed up adaptation to changing domains. We validate the effectiveness of AIDA through language-conditioned manipulation tasks in both simulation and real-world environments. Code, data, and videos are available at https://aida-paper.github.io.

URL: https://openreview.net/forum?id=987Az9f8fT

---

Title: Low-rank Momentum Factorization for Memory Efficient Training

Abstract: Fine-tuning large foundation models presents significant memory challenges due to stateful optimizers like AdamW, often requiring several times more GPU memory than inference. While memory-efficient methods like parameter-efficient fine-tuning (e.g., LoRA) and optimizer state compression exist, recent approaches like GaLore bridge these by using low-rank gradient projections and subspace moment accumulation. However, such methods may struggle with fixed subspaces or computationally costly offline resampling (e.g., requiring full-matrix SVDs). In this work, we propose MoFaSGD, which maintains a dynamically updated low-rank SVD representation of the first-order momentum, closely approximating its full-rank counterpart throughout training. This factorization enables a memory-efficient fine-tuning method that adaptively updates the optimization subspace at each iteration. Crucially, MoFaSGD leverages the computed low-rank momentum factors to perform efficient spectrally normalized updates, offering an alternative to subspace moment accumulation without additional computational overhead. Theoretically, we establish convergence guarantees for MoFaSGD, demonstrating an optimal rate for non-convex stochastic optimization under standard assumptions. Empirically, we demonstrate MoFaSGD's effectiveness on large language model alignment benchmarks, achieving a competitive trade-off between memory reduction (comparable to LoRA) and performance compared to state-of-the-art low-rank optimization methods. Our implementation is available at https://github.com/AnonCode1/MFSGD.git.
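As a rough picture of what keeping the momentum in factored form means, the numpy sketch below maintains a truncated-SVD representation of the first moment and re-truncates after each update; it forms the dense matrix for clarity, which a genuinely memory-efficient method such as MoFaSGD would avoid, and none of the spectral normalization or convergence machinery is shown.

    import numpy as np

    def update_lowrank_momentum(U, S, Vt, grad, beta=0.9, rank=4):
        # m <- beta * m + (1 - beta) * grad, stored as a rank-`rank` SVD factorization.
        m_full = beta * (U * S) @ Vt + (1.0 - beta) * grad  # densified only for this sketch
        U_new, S_new, Vt_new = np.linalg.svd(m_full, full_matrices=False)
        return U_new[:, :rank], S_new[:rank], Vt_new[:rank]

    # Toy usage on a stream of 64x32 "weight matrix" gradients.
    rng = np.random.default_rng(0)
    rank, shape = 4, (64, 32)
    U, S, Vt = np.zeros((shape[0], rank)), np.zeros(rank), np.zeros((rank, shape[1]))
    for _ in range(50):
        grad = rng.standard_normal(shape) @ np.diag(np.linspace(1.0, 0.01, shape[1]))
        U, S, Vt = update_lowrank_momentum(U, S, Vt, grad, rank=rank)
    update_direction = (U * S) @ Vt  # rank-4 approximation of the momentum
    print(update_direction.shape, np.round(S, 3))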

URL: https://openreview.net/forum?id=W3D3TVo9a3

---

Title: Hallucination Detection on a Budget: Efficient Bayesian Estimation of Semantic Entropy

Abstract: Detecting whether an LLM hallucinates is an important research challenge. One promising way of doing so is to estimate the semantic entropy (Farquhar et al., 2024) of the distribution of generated sequences. We propose a new algorithm for doing that, with two main advantages. First, because we take a Bayesian approach, we achieve much better semantic entropy estimates for a given budget of samples from the LLM. Second, we are able to tune the number of samples adaptively so that `harder' contexts receive more samples. We demonstrate empirically that our approach systematically beats the baselines, requiring only $59\%$ of the samples used by Farquhar et al. (2024) to achieve the same quality of hallucination detection as measured by AUROC. Moreover, quite counterintuitively, our estimator is useful even with just one sample from the LLM.
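For context, the quantity being estimated can be written in a few lines: generations are grouped into meaning clusters and the entropy of the cluster distribution is computed. The clustering below is a trivial string-normalization stand-in, not the entailment-based clustering of Farquhar et al. (2024), and the sample answers are invented.

    import math
    from collections import Counter

    def semantic_entropy(generations, cluster_fn=lambda s: s.strip().lower()):
        # Entropy (in nats) of the distribution over meaning clusters of sampled answers.
        counts = Counter(cluster_fn(g) for g in generations)
        n = sum(counts.values())
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Toy usage: five sampled answers to the same question.
    samples = ["Paris", "paris", "Paris.", "Lyon", "Paris"]
    print(semantic_entropy(samples, cluster_fn=lambda s: s.strip(". ").lower()))
    # Low entropy: the answers mostly agree; high entropy is a hallucination warning sign.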

URL: https://openreview.net/forum?id=j2N2RuNdbC

---

Title: Walking on the Fiber: A Simple Geometric Approximation for Bayesian Neural Networks

Abstract: Bayesian Neural Networks provide a principled framework for uncertainty quantification by modeling the posterior distribution of network parameters. However, exact posterior inference is computationally intractable, and widely used approximations like the Laplace method struggle with scalability and posterior accuracy in modern deep networks. In this work, we revisit sampling techniques for posterior exploration, proposing a simple variation tailored to efficiently sample from the posterior in over-parameterized networks by leveraging the low-dimensional structure of loss minima. Building on this, we introduce a model that learns a deformation of the parameter space, enabling rapid posterior sampling without requiring iterative methods. Empirical results demonstrate that our approach achieves competitive posterior approximations with improved scalability compared to recent refinement techniques. These contributions provide a practical alternative for Bayesian inference in deep learning.

URL: https://openreview.net/forum?id=NsuPykrjOd

---

Title: Customizing Spider Silk: Generative Models with Mechanical Property Conditioning for Protein Engineering

Abstract: The remarkable mechanical properties of spider silk, including its tensile strength and extensibility, are primarily governed by the repetitive regions of the proteins that constitute the fiber, the major ampullate spidroins (MaSps). However, establishing correlations between mechanical characteristics and repeat sequences is challenging due to the intricate sequence-structure-function relationships of MaSps and the limited availability of annotated datasets. In this study, we present a novel computational framework for designing MaSp repeat sequences with customizable mechanical properties. To achieve this, we developed a lightweight GPT-based generative model by distilling the pre-trained ProtGPT2 protein language model. The distilled model was subjected to multilevel fine-tuning using curated subsets of the Spider Silkome dataset. Specifically, we adapt the model for MaSp repeat generation using 6,000 MaSp repeat sequences and further refine it with 572 repeats associated with experimentally determined fiber-level mechanical properties. Our model generates biologically plausible MaSp repeat regions tailored to specific mechanical properties while also predicting those properties for given sequences. Validation includes sequence-level analysis, assessing physicochemical attributes and expected distribution of key motifs as well as secondary structure compositions. A correlation study using BLAST on the Spider Silkome dataset and a test set of MaSp repeats with known mechanical properties further confirmed the predictive accuracy of the model. This framework advances the rational design of spider silk-inspired biomaterials, offering a versatile tool for engineering protein sequences with tailored mechanical attributes.

URL: https://openreview.net/forum?id=37YSapXDK6

---

Title: Disentangled and Self-Explainable Node Representation Learning

Abstract: Node embeddings are low-dimensional vectors that capture node properties, typically learned through unsupervised structural similarity objectives or supervised tasks. While recent efforts have focused on post-hoc explanations for graph models, intrinsic interpretability in unsupervised node embeddings remains largely underexplored. To bridge this gap, we introduce DiSeNE (Disentangled and Self-Explainable Node Embedding), a framework that learns self-explainable node representations in an unsupervised fashion. By leveraging disentangled representation learning, DiSeNE ensures that each embedding dimension corresponds to a distinct topological substructure of the graph, thus offering clear, dimension-wise interpretability. We introduce new objective functions grounded in principled desiderata, jointly optimizing for structural fidelity, disentanglement, and human interpretability. Additionally, we propose several new metrics to evaluate representation quality and human interpretability. Extensive experiments on multiple benchmark datasets demonstrate that DiSeNE not only preserves the underlying graph structure but also provides transparent, human-understandable explanations for each embedding dimension.

URL: https://openreview.net/forum?id=s51TQ8Eg1e

---

Title: EEG-EyeTrack: A Benchmark for Time Series and Functional Data Analysis with Open Challenges and Baselines

Abstract: A new benchmark dataset for functional data analysis (FDA) is presented, focusing on the reconstruction of eye movements from EEG data. The contribution is twofold: first, open challenges and evaluation metrics tailored to FDA applications are proposed. Second, functional neural networks are used to establish baseline results for the primary regression task of reconstructing eye movements from EEG signals. Baseline results are reported for the new dataset, based on consumer-grade hardware, and the EEGEyeNet dataset, based on research-grade hardware.

URL: https://openreview.net/forum?id=bHeT8VMTgW

---

Title: Variational Online Mirror Descent for Robust Learning in Schrödinger Bridge

Abstract: The Schrödinger bridge (SB) has evolved into a universal class of probabilistic generative models. In practice, however, estimated learning signals are often uncertain, and the reliability promised by existing methods is often based on speculative optimal-case scenarios. Recent studies regarding the Sinkhorn algorithm through mirror descent (MD) have gained attention, revealing geometric insights into solution acquisition of the SB problems. In this paper, we propose a variational online MD (OMD) framework for the SB problems, which provides further stability to SB solvers. We formally prove convergence and a regret bound for the novel OMD formulation of SB acquisition. As a result, we propose a simulation-free SB algorithm called Variational Mirrored Schrödinger Bridge (VMSB) by utilizing the Wasserstein-Fisher-Rao geometry of the Gaussian mixture parameterization for Schrödinger potentials. Based on the Wasserstein gradient flow theory, the algorithm offers tractable learning dynamics that precisely approximate each OMD step. In experiments, we validate the performance of the proposed VMSB algorithm across an extensive suite of benchmarks. VMSB consistently outperforms contemporary SB solvers on a range of SB problems, demonstrating the robustness predicted by our theory.

URL: https://openreview.net/forum?id=xGsg2L3ppf

---

Title: Label Smoothing is a Pragmatic Information Bottleneck

Abstract: This study revisits label smoothing via the information bottleneck. Under the assumption of sufficient model flexibility and no conflicting labels for the same input, we theoretically and experimentally demonstrate that the model output obtained through label smoothing explores the optimal solution of the information bottleneck. Based on this, label smoothing can be interpreted as a practical approach to the information bottleneck, enabling simple implementation. As an information bottleneck method, we experimentally show that label smoothing also exhibits the property of being insensitive to factors that do not contain information about the target, or to factors that provide no additional information about it when conditioned on another variable.
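For reference, the operation whose information-bottleneck reading the paper develops is just the standard soft-target construction below; the smoothing strength is illustrative.

    import torch
    import torch.nn.functional as F

    def smooth_labels(targets, num_classes, epsilon=0.1):
        # Replace one-hot targets with (1 - epsilon) * one_hot + epsilon / num_classes.
        one_hot = F.one_hot(targets, num_classes).float()
        return (1.0 - epsilon) * one_hot + epsilon / num_classes

    targets = torch.tensor([2, 0])
    soft = smooth_labels(targets, num_classes=4)
    print(soft)  # each row sums to 1, e.g. [0.025, 0.025, 0.925, 0.025]
    # Training then minimizes cross-entropy between model outputs and these soft targets.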

URL: https://openreview.net/forum?id=Q0QEDhpbAK

---

Title: Learned Systems Security

Abstract: A learned system uses machine learning (ML) internally to improve performance. We can expect such systems to be vulnerable to some adversarial-ML attacks. Often, the learned component is shared between mutually-distrusting users or processes, much like microarchitectural resources such as caches, potentially giving rise to highly-realistic attacker models. However, compared to attacks on other ML-based systems, attackers face a level of indirection as they cannot interact directly with the learned model. Additionally, the difference between the attack surface of learned and non-learned versions of the same system is often subtle. These factors obfuscate the de-facto risks that the incorporation of ML carries. We analyze the root causes of potentially-increased attack surface in learned systems and develop a framework for identifying vulnerabilities that stem from the use of ML. We apply our framework to a broad set of learned systems under active development. To empirically validate the many vulnerabilities surfaced by our framework, we choose 3 of them and implement and evaluate exploits against prominent instances of learned systems. We show that the use of ML causes leakage of past queries in a database, enables a poisoning attack that causes exponential memory blowup in an index structure and crashes it in seconds, and enables index users to snoop on each other's key distributions by timing queries over their own keys. We find that adversarial ML is a universal threat against learned systems, point to open research gaps in our understanding of learned-systems security, and conclude by discussing mitigations, while noting that data leakage is inherent in systems whose learned component is shared between multiple parties.

URL: https://openreview.net/forum?id=XNVBSbtcKB

---

Title: Defending Against Unforeseen Failure Modes with Latent Adversarial Training

Abstract: Despite extensive diagnostics and debugging by developers, AI systems sometimes exhibit harmful unintended behaviors. Finding and fixing these is challenging because the attack surface is so large -- it is not tractable to exhaustively search for inputs that may elicit harmful behaviors. Red-teaming and adversarial training (AT) are commonly used to improve robustness; however, they empirically struggle to fix failure modes that differ from the attacks used during training. In this work, we utilize latent adversarial training (LAT) to defend against vulnerabilities without leveraging knowledge of what they are or using inputs that elicit them. LAT makes use of the compressed, abstract, and structured latent representations of concepts that the network actually uses for prediction. Here, we use it to defend against failure modes without examples that elicit them. Specifically, we use LAT to remove trojans and defend against held-out classes of adversarial attacks. We show in image classification, text classification, and text generation tasks that LAT usually improves both robustness to novel attacks and performance on clean data relative to AT. This suggests that LAT can be a promising tool for defending against failure modes that are not explicitly identified by developers.

URL: https://openreview.net/forum?id=mVPPhQ8cAd

---

Title: Disobeying Directions: Switching Random Walk Filters for Unsupervised Node Embedding Learning on Directed Graphs

Abstract: Unsupervised learning of node embeddings for directed graphs (digraphs) requires careful handling to ensure unbiased modelling. This paper addresses two key challenges: (1) the obstruction of information propagation in random walk and message-passing methods due to local sinks, and (2) the representation of multiple multi-step directed neighbourhoods, arising from the distinction between in- and out-neighbours. These challenges are interconnected—local sinks can be mitigated by treating the graph as undirected, but this comes at the cost of discarding all directional information. We make two main contributions to unsupervised embedding learning for digraphs. First, we introduce ReachNEs (Reachability Node Embeddings), a general framework for analysing embedding models and diagnosing local sink behaviour on digraphs. ReachNEs defines the reachability filter, a matrix polynomial over normalized adjacency matrices that captures multi-step, direction-sensitive proximity. It unifies the analysis of message-passing and random walk models, making its insights applicable across a wide range of embedding methods. Second, we propose DirSwitch, a novel embedding model that resolves both local sink bias and neighbourhood multiplicity via switching random walks. These walks use directed edges for local steps, preserving directional structure, then switch to undirected edges for long-range transitions, enabling escape from local sinks and improving information dispersal. Empirical results on node classification benchmarks demonstrate that DirSwitch consistently outperforms state-of-the-art unsupervised digraph proximity embedding methods, and also serves as a flexible digraph extension for self-supervised graph neural networks. Our source code is publicly available at https://anonymous.4open.science/r/dirswitch-experiments-tmlr2025-C5F2.

URL: https://openreview.net/forum?id=yngjRgVA5A

---

Title: Can you trust your experiments? Generalizability of Experimental Studies

Abstract: Experimental studies are a cornerstone of Machine Learning (ML) research. A common and often implicit assumption is that the study’s results will generalize beyond the study itself, e.g., to new data. That is, repeating the same study under different conditions will likely yield similar results. Existing frameworks to measure generalizability, borrowed from the causal inference literature, cannot capture the complexity of the results and the goals of an ML study. The problem of measuring generalizability in the more general ML setting is thus still open, also due to the lack of a mathematical formalization of experimental studies. In this paper, we propose such a formalization, use it to develop a framework to quantify generalizability, and propose an instantiation based on rankings and the Maximum Mean Discrepancy. We show how the latter offers insight into the desirable number of experiments for a study. Finally, we investigate the generalizability of two recently published experimental studies.
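A minimal numpy sketch of the Maximum Mean Discrepancy ingredient is given below; representing a study's results as fixed-length vectors and the RBF bandwidth are assumptions made purely for illustration, not the paper's instantiation over rankings.

    import numpy as np

    def mmd_rbf(X, Y, gamma=1.0):
        # Biased empirical MMD^2 between samples X and Y under an RBF kernel.
        def kernel(A, B):
            d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
            return np.exp(-gamma * d2)
        return kernel(X, X).mean() + kernel(Y, Y).mean() - 2 * kernel(X, Y).mean()

    # Toy usage: per-repetition result vectors of a study under two conditions.
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(50, 5))
    Y_close = rng.normal(0.1, 1.0, size=(50, 5))
    Y_far = rng.normal(1.5, 1.0, size=(50, 5))
    print(mmd_rbf(X, Y_close))  # small: repetitions look exchangeable, results likely generalize
    print(mmd_rbf(X, Y_far))    # large: the results shift substantially under the new condition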

URL: https://openreview.net/forum?id=j1ZtWdWn7u

---

Title: MetaGFN: Exploring Distant Modes with Adapted Metadynamics for Continuous GFlowNets

Abstract: Generative Flow Networks (GFlowNets) are a class of generative models that sample objects in proportion to a specified reward function through a learned policy. They can be trained either on-policy or off-policy, needing a balance between exploration and exploitation for fast convergence to a target distribution. While exploration strategies for discrete GFlowNets have been studied, exploration in the continuous case remains to be investigated, despite the potential for novel exploration algorithms due to the local connectedness of continuous domains. Here, we introduce Adapted Metadynamics, a variant of metadynamics that can be applied to arbitrary black-box reward functions on continuous domains. We use Adapted Metadynamics as an exploration strategy for continuous GFlowNets. We show several continuous domains where the resulting algorithm, MetaGFN, accelerates convergence to the target distribution and discovers more distant reward modes than previous off-policy exploration strategies used for GFlowNets.
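For readers unfamiliar with metadynamics, the toy sketch below shows the classic idea being adapted: a history-dependent bias built from Gaussian bumps deposited at visited points pushes a sampler away from regions it has already explored, helping it reach a distant mode. The landscape, proposal mechanism, and parameters are invented for illustration, and none of the GFlowNet-specific machinery is shown.

    import numpy as np

    rng = np.random.default_rng(0)

    def bias(x, centers, height=0.5, width=0.3):
        # History-dependent metadynamics bias: a sum of Gaussian bumps at visited points.
        if not centers:
            return 0.0
        c = np.asarray(centers)
        return height * np.exp(-(x - c) ** 2 / (2 * width**2)).sum()

    # Toy 1-D reward with two distant modes; biased Metropolis-style random walk.
    reward = lambda x: np.exp(-(x - 3.0) ** 2) + np.exp(-(x + 3.0) ** 2)
    x, centers, visited = -3.0, [], []
    for _ in range(3000):
        proposal = x + rng.normal(0.0, 0.5)
        log_ratio = (np.log(reward(proposal) + 1e-12) - bias(proposal, centers)) \
                  - (np.log(reward(x) + 1e-12) - bias(x, centers))
        if np.log(rng.uniform()) < log_ratio:
            x = proposal
        centers.append(x)   # deposit a bump where the walker currently sits
        visited.append(x)
    print(min(visited), max(visited))  # the walk eventually escapes the mode at -3 and reaches +3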

URL: https://openreview.net/forum?id=dtyNeemB7A

---

Title: Federated Link Prediction on Dynamic Graphs

Abstract: Link prediction on dynamic, large-scale graphs has been widely used in real-world applications, such as forecasting customer visits to restaurants or predicting product purchases. However, graph data is often localized due to privacy and efficiency concerns. Training separate local models based on data in each region preserves privacy but often leads to less accurate models, especially in smaller regions with fewer users and products. Federated learning (FL) then collaboratively trains models on localized data to maintain model accuracy and data privacy. However, the vanilla FL approach requires training on the entire historical graph of user interactions, introducing high computational costs during training. While training on the most recent data may help reduce overhead, it decreases the model accuracy and incurs data imbalance across clients. For instance, regions with more users will contribute more training data, potentially biasing the model toward those users. We introduce FedLink, a federated graph training framework for solving link prediction tasks on dynamic graphs. By continuously training on fixed-size buffers of client data, we can significantly reduce the computation overhead compared to training on the entire historical graph, while still training a global model across regions. Experiments demonstrate that FedLink matches the accuracy of training a centralized model while requiring 3.41$\times$ less memory and running 28.9% faster compared with full-batch federated graph training.

URL: https://openreview.net/forum?id=V3zqaDmOLF

---

Title: Bayesian Neighborhood Adaptation for Graph Neural Networks

Abstract: The neighborhood scope (i.e., number of hops) where graph neural networks (GNNs) aggregate information to characterize a node's statistical property is critical to GNNs' performance. The two-stage approach of training and validating GNNs for every pre-specified neighborhood scope to search for the best setting is daunting and time-consuming, and it tends to be biased due to the search space design. How to adaptively determine proper neighborhood scopes for the aggregation process for both homophilic and heterophilic graphs remains largely unexplored. We thus propose to model the GNNs' message-passing behavior on a graph as a stochastic process by treating the number of hops as a beta process. This Bayesian framework allows us to infer the most plausible neighborhood scope for message aggregation simultaneously with the optimization of GNN parameters. Our theoretical analysis shows that the scope inference improves the expressivity of a GNN. Experiments on benchmark homophilic and heterophilic datasets show that the proposed method is compatible with state-of-the-art GNN variants, achieving competitive or superior performance, and providing well-calibrated predictions.

URL: https://openreview.net/forum?id=2zEemRib3a

---
