Weekly TMLR digest for Jan 11, 2026

TMLR
Jan 11, 2026, 12:00:10 AM
to tmlr-annou...@googlegroups.com


New certifications
==================

Reproducibility Certification, J2C Certification: BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions

Tao Yu, Zhengbo Zhang, Zhiheng Lyu, Junhao Gong, Hongzhu Yi, Xinming Wang, Yuxuan Zhou, Jiabing Yang, Ping Nie, Yan Huang, Wenhu Chen

https://openreview.net/forum?id=X4CfZPSEHE

---


Survey Certification: Fast weight programming and linear transformers: from machine learning to neurobiology

Kazuki Irie, Samuel J. Gershman

https://openreview.net/forum?id=TDG8EkNmQR

---


J2C Certification: ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning

Shangqian Gao, Ting Hua, Reza Shirkavand, Chi-Heng Lin, Zheng Tang, Zhengao Li, Longge Yuan, Fangyi Li, Zeyu Zhang, Alireza Ganjdanesh, Qian Lou, Jie Xu, Yen-Chang Hsu

https://openreview.net/forum?id=RFHq46pjb6

---


J2C Certification: StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

Jialin Yang, Dongfu Jiang, Tony He, Sherman Siu, Yuxuan Zhang, Disen Liao, Zhuofeng Li, Huaye Zeng, Yiming Jia, Haozhe Wang, Benjamin Schneider, Chi Ruan, Wentao Ma, Zhiheng Lyu, Yifei Wang, Yi Lu, Quy Duc Do, Ziyan Jiang, Ping Nie, Wenhu Chen

https://openreview.net/forum?id=buDwV7LUA7

---


J2C Certification: DuFal: Dual-Frequency-Aware Learning for High-Fidelity Extremely Sparse-view CBCT Reconstruction

Cuong Tran Van, Trong-Thang Pham, Ngoc-Son Nguyen, Duy Minh Ho Nguyen, Ngan Le

https://openreview.net/forum?id=2wAZjAtK16

---


Survey Certification: Offline Model-Based Optimization: Comprehensive Review

Minsu Kim, Jiayao Gu, Ye Yuan, Taeyoung Yun, Zixuan Liu, Yoshua Bengio, Can Chen

https://openreview.net/forum?id=QcSZWo1TLl

---


J2C Certification: Synergistic Benefits of Joint Molecule Generation and Property Prediction

Adam Izdebski, Jan Olszewski, Pankhil Gawade, Krzysztof Koras, Serra Korkmaz, Valentin Rauscher, Jakub M. Tomczak, Ewa Szczurek

https://openreview.net/forum?id=jnzCOLyGOA

---


J2C Certification: Denoising Diffusions with Optimal Transport: Localization, Curvature, and Multi-Scale Complexity

Tengyuan Liang, Kulunu Dharmakeerthi, Takuya Koriyama

https://openreview.net/forum?id=sj1wU6gBXH

---


Accepted papers
===============


Title: TRecViT: A Recurrent Video Transformer

Authors: Viorica Patraucean, Xu Owen He, Joseph Heyward, Chuhan Zhang, Mehdi S. M. Sajjadi, George-Cristian Muraru, Artem Zholus, Mahdi Karami, Ross Goroshin, Yutian Chen, Simon Osindero, Joao Carreira, Razvan Pascanu

Abstract: We propose a novel block for causal video modelling. It relies on a time-space-channel factorisation with dedicated blocks for each dimension: gated linear recurrent units (LRUs) perform information mixing over time, self-attention layers perform mixing over space, and MLPs over channels. The resulting architecture TRecViT is causal and shows strong performance on sparse and dense tasks, trained in supervised or self-supervised regimes, being the first causal video model in the state-space models family. Notably, our model outperforms or is on par with the popular (non-causal) ViViT-L model on large scale video datasets (SSv2, Kinetics400), while having 3x fewer parameters, 12x smaller memory footprint, and 5x lower FLOPs count than the full self-attention ViViT, with an inference throughput of about 300 frames per second, running comfortably in real-time. When compared with causal transformer-based models (TSM, RViT) and other recurrent models like LSTM, TRecViT obtains state-of-the-art results on the challenging SSv2 dataset.
Code and checkpoints are available online at https://github.com/google-deepmind/trecvit.
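
To make the factorisation concrete, here is a minimal PyTorch sketch of one such block: a simplified gated linear recurrence mixes over time, self-attention mixes over space, and an MLP mixes over channels. The tensor layout, module names, and the simplified recurrence are illustrative assumptions, not the released TRecViT implementation.

```python
import torch
import torch.nn as nn

class FactorisedVideoBlock(nn.Module):
    """Illustrative time-space-channel factorised block (not the official TRecViT code).

    Input tokens have shape (batch, time, space, channels)."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(dim))          # per-channel recurrence gate (simplified LRU)
        self.time_in = nn.Linear(dim, dim)
        self.space_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.n1, self.n2, self.n3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                                # x: (B, T, S, D)
        B, T, S, D = x.shape
        a = torch.sigmoid(self.a)                        # decay in (0, 1) keeps the recurrence stable
        u = self.time_in(self.n1(x))
        h, outs = x.new_zeros(B, S, D), []
        for t in range(T):                               # causal linear recurrence over time
            h = a * h + (1 - a) * u[:, t]
            outs.append(h)
        x = x + torch.stack(outs, dim=1)
        y = self.n2(x).reshape(B * T, S, D)              # self-attention over space, per frame
        y, _ = self.space_attn(y, y, y)
        x = x + y.reshape(B, T, S, D)
        return x + self.mlp(self.n3(x))                  # MLP over channels
```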

URL: https://openreview.net/forum?id=Mmi46Ytb1H

---

Title: Subspace based Federated Unlearning

Authors: Guanghao Li, Li Shen, Yan Sun, Yue Hu, Han Hu, Dacheng Tao

Abstract: Federated learning (FL) enables collaborative machine learning among multiple clients while preserving user data privacy by preventing the exchange of local data. However, when users request to leave the FL system, the trained FL model may still retain information about their contributions. To comply with the right to be forgotten, federated unlearning has been proposed, which aims to remove a designated client's influence from the FL model. Existing federated unlearning methods typically rely on storing historical parameter updates, which may be impractical in resource-constrained FL settings. In this paper, we propose a Subspace-based Federated Unlearning method (SFU) that addresses this challenge without requiring additional storage. SFU updates the model via gradient ascent constrained within a subspace, specifically the orthogonal complement of the gradient descent directions derived from the remaining clients. By projecting the ascending gradient of the target client onto this subspace, SFU can mitigate the contribution of the target client while maintaining model performance on the remaining clients. SFU is communication-efficient, requiring only one round of local training per client to transmit gradient information to the server for model updates. Extensive empirical evaluations on multiple datasets demonstrate that SFU achieves competitive unlearning performance while preserving model utility. Compared to representative baseline methods, SFU consistently shows promising results under various experimental settings.
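
The subspace-constrained ascent step can be illustrated with a few lines of linear algebra: remove from the target client's (flattened) ascent gradient its component in the span of the remaining clients' descent gradients. The QR-based orthonormalisation and variable names below are assumptions for exposition, not the paper's exact procedure.

```python
import torch

def orthogonal_ascent_direction(g_target: torch.Tensor, G_remaining: torch.Tensor) -> torch.Tensor:
    """Project the unlearning (ascent) gradient onto the orthogonal complement of the
    subspace spanned by the remaining clients' gradients.

    g_target: (d,) flattened gradient of the client to be forgotten.
    G_remaining: (k, d) flattened gradients of the remaining clients."""
    Q, _ = torch.linalg.qr(G_remaining.T)        # orthonormal basis of the remaining-clients subspace, shape (d, k)
    in_subspace = Q @ (Q.T @ g_target)           # component that would disturb the remaining clients
    return g_target - in_subspace                # ascent direction that leaves that subspace untouched

# Illustrative server update: theta <- theta + lr * orthogonal_ascent_direction(g_target, G_remaining)
```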

URL: https://openreview.net/forum?id=KE2ZNl2lFP

---

Title: Uncovering the Computational Roles of Nonlinearity in Sequence Modeling Using Almost-Linear RNNs

Authors: Manuel Brenner, Georgia Koppe

Abstract: Sequence modeling tasks across domains such as natural language processing, time-series forecasting, speech recognition, and control require learning complex mappings from input to output sequences. In recurrent networks, nonlinear recurrence is theoretically required to universally approximate such sequence-to-sequence functions; yet in practice, linear recurrent models have often proven surprisingly effective. This raises the question of when nonlinearity is truly required. In this study, we present a framework to systematically dissect the functional role of nonlinearity in recurrent networks -- allowing us to identify both when it is computationally necessary, and what mechanisms it enables. We address the question using Almost Linear Recurrent Neural Networks (AL-RNNs), which allow the recurrence nonlinearity to be gradually attenuated and decompose network dynamics into analyzable linear regimes, making the underlying computational mechanisms explicit. We illustrate the framework across a diverse set of synthetic and real-world tasks, including classic sequence modeling benchmarks, an empirical neuroscientific stimulus-selection task, and a multi-task suite. We demonstrate how the AL-RNN's piecewise linear structure enables direct identification of computational primitives such as gating, rule-based integration, and memory-dependent transients, revealing that these operations emerge within predominantly linear dynamical backbones. Across tasks, sparse nonlinearity plays several functional roles: it improves interpretability by reducing and localizing nonlinear computations, promotes shared (rather than highly distributed) representations in multi-task settings, and reduces computational cost by limiting nonlinear operations. Moreover, sparse nonlinearity acts as a useful inductive bias: in low-data regimes, or when tasks require discrete switching between linear regimes, sparsely nonlinear models often match or exceed the performance of fully nonlinear architectures. Our findings provide a principled approach for identifying where nonlinearity is functionally necessary in sequence models, guiding the design of recurrent architectures that balance performance, efficiency, and mechanistic interpretability.
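
As a rough sketch of attenuating the recurrence nonlinearity, the cell below applies a ReLU to only a small, fixed subset of hidden units and lets the rest evolve linearly; the exact AL-RNN parameterisation in the paper differs, so treat the split and the names as illustrative assumptions.

```python
import torch
import torch.nn as nn

class AlmostLinearRNNCell(nn.Module):
    """Illustrative cell: only the first `n_nonlinear` hidden units pass through a ReLU,
    the remaining units follow purely linear dynamics."""
    def __init__(self, input_dim: int, hidden_dim: int, n_nonlinear: int):
        super().__init__()
        assert 0 <= n_nonlinear <= hidden_dim
        self.W = nn.Linear(hidden_dim, hidden_dim, bias=False)   # recurrent weights
        self.U = nn.Linear(input_dim, hidden_dim)                # input weights
        self.n_nonlinear = n_nonlinear

    def forward(self, h: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        pre = self.W(h) + self.U(x)
        nonlinear = torch.relu(pre[..., : self.n_nonlinear])     # sparse nonlinearity
        linear = pre[..., self.n_nonlinear:]                     # linear backbone
        return torch.cat([nonlinear, linear], dim=-1)

# n_nonlinear = 0 yields a fully linear RNN; n_nonlinear = hidden_dim a standard ReLU RNN.
```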

URL: https://openreview.net/forum?id=qI2Vt9P9rl

---

Title: Context-aware Learned Mesh-based Simulation via Trajectory-Level Meta-Learning

Authors: Philipp Dahlinger, Niklas Freymuth, Tai Hoang, Tobias Würth, Michael Volpp, Luise Kärger, Gerhard Neumann

Abstract: Simulating object deformations is a critical challenge across many scientific domains, including robotics, manufacturing, and structural mechanics.
Learned Graph Network Simulators (GNSs) offer a promising alternative to traditional mesh-based physics simulators.
Their speed and inherent differentiability make them particularly well suited for applications that require fast and accurate simulations, such as robotic manipulation or manufacturing optimization.
However, existing learned simulators typically rely on single-step observations, which limits their ability to exploit temporal context.
Without this information, these models fail to infer, e.g., material properties.
Further, they rely on auto-regressive rollouts, which quickly accumulate error for long trajectories.
We instead frame mesh-based simulation as a trajectory-level meta-learning problem.
Using Conditional Neural Processes, our method enables rapid adaptation to new simulation scenarios from limited initial data while capturing their latent simulation properties.
We utilize movement primitives to directly predict fast, stable and accurate simulations from a single model call.
The resulting approach, Movement-primitive Meta-MeshGraphNet (M3GN), provides higher simulation accuracy at a fraction of the runtime cost compared to state-of-the-art GNSs across several tasks.

URL: https://openreview.net/forum?id=j5uACS2Doh

---

Title: Formal Methods in Robot Policy Learning and Verification: A Survey on Current Techniques and Future Directions

Authors: Anastasios Manganaris, Vittorio Giammarino, Ahmed H Qureshi, Suresh Jagannathan

Abstract: As hardware and software systems have grown in complexity, formal methods have been indispensable tools for rigorously specifying acceptable behaviors, synthesizing programs to meet these specifications, and validating the correctness of existing programs. In the field of robotics, a similar trend of rising complexity has emerged, driven in large part by the adoption of deep learning. While this shift has enabled the development of highly performant robot policies, their implementation as deep neural networks has posed challenges to traditional formal analysis, leading to models that are inflexible, fragile, and difficult to interpret. In response, the robotics community has introduced new formal and semi-formal methods to support the precise specification of complex objectives, guide the learning process to achieve them, and enable the verification of learned policies against them. In this survey, we provide a comprehensive overview of how formal methods have been used in recent robot learning research. We organize our discussion around two pillars: policy learning and policy verification. For both, we highlight representative techniques, compare their scalability and expressiveness, and summarize how they contribute to meaningfully improving realistic robot safety and correctness. We conclude with a discussion of remaining obstacles for achieving that goal and promising directions for advancing formal methods in robot learning.

URL: https://openreview.net/forum?id=DZkikdg5sl

---

Title: Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment

Authors: Joanna Hong, Sanjeel Parekh, Honglie Chen, Jacob Donley, Ke Tan, Buye Xu, Anurag Kumar

Abstract: Building reliable speech systems often requires combining multiple modalities, like audio and visual cues. While such multimodal solutions frequently lead to improvements in performance and may even be critical in certain cases, they come with several constraints such as increased sensory requirements, computational cost, and modality synchronization, to mention a few. These challenges constrain the direct uses of these multimodal solutions in real-world applications. In this work, we develop approaches where the learning happens with all available modalities but the deployment or inference is done with just one or reduced modalities. To do so, we propose a Multimodal Training and Unimodal Deployment (MUTUD) framework which includes a Temporally Aligned Modality feature Estimation (TAME) module that can estimate information from missing modality using modalities present during inference. This innovative approach facilitates the integration of information across different modalities, enhancing the overall inference process by leveraging the strengths of each modality to compensate for the absence of certain modalities during inference. We apply MUTUD to various audiovisual speech tasks and show that it can reduce the performance gap between the multimodal and corresponding unimodal models to a considerable extent. MUTUD can achieve this while reducing the model size and compute compared to multimodal models, in some cases by almost 80%.

URL: https://openreview.net/forum?id=5bshBY8RDf

---

Title: CAPE: Generalized Convergence Prediction Across Architectures Without Full Training

Authors: Alireza Pourali, Arian Boukani, Hamzeh Khazaei

Abstract: Training deep neural networks to convergence is expensive and time-consuming, especially when exploring new architectures or hardware configurations. Prior work has primarily estimated per-iteration or per-epoch cost under fixed training schedules, overlooking the critical challenge of predicting how long a model will take to converge. We present \textit{CAPE} (Convergence-Aware Prediction Engine), a lightweight and probing-based framework that predicts the number of epochs required for convergence before any full training occurs. CAPE performs a brief probe at initialization using a small batch of data to extract analytical and dynamical features such as parameter count, dataset size, learning rate, batch size, gradient norm, Neural Tangent Kernel (NTK) trace, and initial loss. These features jointly characterize the model’s optimization landscape and serve as input to a meta-model trained to forecast convergence horizons under a validation-based early-stopping criterion. CAPE achieves strong predictive correspondence to true convergence epochs, with a Pearson correlation of 0.89 across diverse architectures and datasets, demonstrating accurate and consistent convergence prediction across model families. By enabling zero-shot prediction of full-dataset convergence behavior, CAPE provides a practical tool for rapid model selection, hyperparameter exploration, and resource-aware training planning.
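
The probe described above can be pictured as a single cheap pass at initialization that turns a model, a small batch, and a few hyperparameters into a feature vector for a meta-model. The feature list and the crude NTK-trace proxy below are assumptions for illustration, not CAPE's exact recipe.

```python
import torch

def probe_features(model, loss_fn, xb, yb, lr, batch_size, n_train):
    """One cheap probe at initialization (illustrative; not CAPE's exact feature set)."""
    model.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    grad_norm = float(torch.sqrt(sum((p.grad ** 2).sum()
                                     for p in model.parameters() if p.grad is not None)))
    # Crude NTK-trace proxy: squared parameter-gradient norms of a few (summed) per-example outputs.
    ntk_trace = 0.0
    for i in range(min(len(xb), 8)):
        model.zero_grad()
        model(xb[i:i + 1]).sum().backward()
        ntk_trace += float(sum((p.grad ** 2).sum()
                               for p in model.parameters() if p.grad is not None))
    n_params = sum(p.numel() for p in model.parameters())
    # A meta-model (e.g., gradient-boosted trees) maps this vector to a predicted epochs-to-convergence.
    return [n_params, n_train, lr, batch_size, grad_norm, ntk_trace, float(loss)]
```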

URL: https://openreview.net/forum?id=wGngf0wBYn

---

Title: Enhancing Semi-supervised Learning with Zero-shot Pseudolabels

Authors: Jichan Chung, Irene Y. Chen

Abstract: The high cost of data labeling presents a major barrier to deploying machine learning systems at scale. Semi-supervised learning (SSL) mitigates this challenge by utilizing unlabeled data alongside limited labeled examples, while the emergence of foundation models (FMs) offers powerful zero-shot capabilities that can further reduce labeling cost. However, directly fine-tuning large FMs is often impractical in resource-constrained settings, and naïvely using their pseudo-labels for unlabeled data can degrade performance due to their unreliability or domain mismatch with the target task. In this work, we introduce ZeroMatch, a novel SSL framework that integrates knowledge distillation with consistency-based learning to jointly leverage labeled data, unlabeled data, and pseudo-labels from FMs. ZeroMatch trains a compact student model and accesses FMs only through inference services, making it suitable for low-resource environments such as personal devices with limited compute. Experiments on six vision and language classification benchmarks show that ZeroMatch consistently outperforms standard SSL and zero-shot augmented methods, demonstrating its effectiveness and robustness across a range of foundation model qualities.
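
One way such a framework could combine labeled data, unlabeled data, and foundation-model pseudo-labels in a single objective is sketched below; the three terms, their weights, and the FixMatch-style confidence mask are assumptions for illustration, not ZeroMatch's exact loss.

```python
import torch.nn.functional as F

def ssl_with_fm_pseudolabels(logits_lab, labels, logits_weak, logits_strong, fm_probs,
                             tau=0.95, lam_cons=1.0, lam_distill=0.5):
    """Illustrative objective: supervised CE + consistency on confident pseudo-labels
    + distillation toward the foundation model's zero-shot distribution."""
    sup = F.cross_entropy(logits_lab, labels)
    probs_weak = logits_weak.softmax(dim=-1).detach()
    conf, pseudo = probs_weak.max(dim=-1)
    mask = (conf >= tau).float()                              # only trust confident predictions
    cons = (F.cross_entropy(logits_strong, pseudo, reduction="none") * mask).mean()
    distill = F.kl_div(logits_weak.log_softmax(dim=-1), fm_probs, reduction="batchmean")
    return sup + lam_cons * cons + lam_distill * distill
```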

URL: https://openreview.net/forum?id=WB05Doi29V

---

Title: GGFlow: A Graph Flow Matching Method with Efficient Optimal Transport

Authors: Xiaoyang Hou, Tian Zhu, Milong Ren, Dongbo Bu, Xin Gao, Chunming Zhang, Shiwei Sun

Abstract: Generating graph-structured data is crucial in various domains but remains challenging due to the complex interdependencies between nodes and edges. While diffusion models have demonstrated their superior generative capabilities, they often suffer from unstable training and inefficient sampling. To enhance generation performance and training stability, we propose GGFlow, a discrete flow matching generative model that incorporates efficient optimal transport for graph structures together with an edge-augmented graph transformer that enables direct communication among edges. Additionally, GGFlow introduces a novel goal-guided generation framework to control the generative trajectory of our model towards desired properties. GGFlow demonstrates superior performance on both unconditional and conditional generation tasks, outperforming existing baselines and underscoring its effectiveness and potential for wider application.

URL: https://openreview.net/forum?id=K8RlXtMgzo

---

Title: Is There a Better Source Distribution than Gaussian? Exploring Source Distributions for Image Flow Matching

Authors: Junho Lee, Kwanseok Kim, Joonseok Lee

Abstract: Flow matching has emerged as a powerful generative modeling approach with flexible choices of source distribution. While Gaussian distributions are commonly used, the potential for better alternatives in high-dimensional data generation remains largely unexplored. In this paper, we propose a novel 2D simulation that captures high-dimensional geometric properties in an interpretable 2D setting, enabling us to analyze the learning dynamics of flow matching during training. Based on this analysis, we derive several key insights about flow matching behavior: (1) density approximation can paradoxically degrade performance due to mode discrepancy, (2) directional alignment suffers from path entanglement when overly concentrated, (3) Gaussian's omnidirectional coverage ensures robust learning, and (4) norm misalignment incurs substantial learning costs. Building on these insights, we propose a practical framework that combines norm-aligned training with directionally-pruned sampling. This approach maintains the robust omnidirectional supervision essential for stable flow learning, while eliminating initializations in data-sparse regions during inference. Importantly, our pruning strategy can be applied to any flow matching model trained with a Gaussian source, providing immediate performance gains without the need for retraining. Empirical evaluations demonstrate consistent improvements in both generation quality and sampling efficiency. Our findings provide practical insights and guidelines for source distribution design and introduce a readily applicable technique for improving existing flow matching models. Our code is available at https://github.com/kwanseokk/SourceFM.

URL: https://openreview.net/forum?id=sev0GtV1fc

---

Title: BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions

Authors: Tao Yu, Zhengbo Zhang, Zhiheng Lyu, Junhao Gong, Hongzhu Yi, Xinming Wang, Yuxuan Zhou, Jiabing Yang, Ping Nie, Yan Huang, Wenhu Chen

Abstract: Efficiently solving real-world problems with LLMs increasingly hinges on their ability to interact with dynamic web environments and autonomously acquire external information. While recent research like Search-R1 and WebDancer demonstrates strong performance in solving web tasks, they heavily rely on additional tools to convert the interactive web environment into static text content. This is in contrast to human browsing behaviors, which involve diverse interactions with the browser, such as scrolling, clicking, and typing. In this paper, we propose BrowserAgent, a more interactive agent that solves complex tasks through human-inspired browser actions. BrowserAgent operates directly on raw web pages via Playwright through a set of predefined browser actions. We adopt a two-stage training (Supervised Fine-Tuning (SFT) and Rejection Fine-Tuning (RFT)) to improve the model's generalization abilities. Despite using significantly less training data than Search-R1, BrowserAgent achieves more competitive results across different Open-QA tasks. Additionally, we introduce an explicit memory mechanism to store key conclusions across steps, further enhancing the model's reasoning capabilities for long-horizon tasks. Notably, BrowserAgent-7B can achieve around 20\% improvement over Search-R1 on multi-hop QA tasks like HotpotQA, 2Wiki, and Bamboogle. These results indicate that BrowserAgent can serve as a more advanced framework for more interactive and scalable web agents.
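
Since the agent acts on live pages through Playwright, the action interface can be pictured roughly as below; the action names, the argument encoding, and the text-truncation length are hypothetical, chosen only to illustrate how human-like browser actions map onto Playwright calls, and are not the paper's action space.

```python
from playwright.sync_api import sync_playwright

def run_action(page, action: str, arg: str = "") -> str:
    """Hypothetical dispatcher from agent actions to Playwright calls."""
    if action == "goto":
        page.goto(arg)
    elif action == "click":
        page.click(arg)                               # arg is a CSS selector
    elif action == "type":
        selector, text = arg.split("|", 1)
        page.fill(selector, text)
    elif action == "scroll_down":
        page.mouse.wheel(0, 600)
    elif action == "read":
        return page.inner_text("body")[:2000]         # truncated page text fed back to the LLM
    return ""

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    run_action(page, "goto", "https://en.wikipedia.org/wiki/Web_browser")
    print(run_action(page, "read")[:200])
    browser.close()
```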

URL: https://openreview.net/forum?id=X4CfZPSEHE

---

Title: Leveraging the True Depth of LLMs

Authors: Ramón Calvo González, Daniele Paliotta, Matteo Pagliardini, Martin Jaggi, François Fleuret

Abstract: The remarkable capabilities of Large Language Models (LLMs) are overshadowed by their immense computational cost. While recent work has shown that many LLM layers can be reordered or even removed with minimal impact on accuracy, these insights have not been translated into significant inference speedups. To bridge this gap, we introduce a novel method that restructures the computational graph by grouping and evaluating consecutive layer pairs in parallel. This approach, requiring no retraining, yields a 1.19x throughput gain on Llama 2 7B while reducing the average benchmark accuracy by only 1.5%. We demonstrate the practical value of this method for large-scale LLM deployment and show that some of the lost accuracy can be recovered with lightweight fine-tuning of the parallelized layers.
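
The change to the computational graph can be summarised in a few lines: instead of feeding the second block the output of the first, both residual branches are evaluated on the same input. The module below is a sketch of that data-dependency change under the assumption that each block returns only its residual branch; an actual speedup further requires the two branches to run concurrently (e.g., on separate CUDA streams), which the sketch does not show.

```python
import torch.nn as nn

class ParallelPair(nn.Module):
    """Two consecutive residual blocks evaluated on the same input.

    sequential:  y = x + f1(x);  z = y + f2(y)
    parallel:    z ~ x + f1(x) + f2(x)       (f1 and f2 now share the input x)"""
    def __init__(self, block1: nn.Module, block2: nn.Module):
        super().__init__()
        self.block1, self.block2 = block1, block2

    def forward(self, x):
        return x + self.block1(x) + self.block2(x)
```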

URL: https://openreview.net/forum?id=JccJ6YfWd4

---

Title: An analysis of distributional reinforcement learning with Gaussian mixtures

Authors: Mathis Antonetti, Henrique Donancio, Florence Forbes

Abstract: Distributional Reinforcement Learning (DRL) aims at optimizing a risk measure of the return by representing its distribution. However, finding a representation of this distribution is challenging as it requires a tractable estimation of the risk measure, a tractable loss, and a representation with enough approximation power. Although Gaussian mixtures (GM) are powerful statistical models to solve these challenges, only very few papers have investigated this approach and most use the L$_2$ space norm as a tractable metric between GMs. In this paper, we provide new theoretical results on previously unstudied metrics. We show that the L$_2$ metric is not suitable and propose alternative metrics, a mixture-specific optimal transport (MW) distance and a maximum mean discrepancy distance. Focusing on temporal difference (TD) learning, we prove a convergence result for a related dynamic programming algorithm for the MW metric. Leveraging natural multivariate GM representations, we also highlight the potential of MW in multi-objective RL. Our approach is illustrated on some environments of the Arcade Learning Environment benchmark and shows promising empirical results.

URL: https://openreview.net/forum?id=b4VgI1RTv8

---

Title: CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design

Authors: Prashant Govindarajan, Davide Baldelli, Jay Pathak, Quentin Fournier, Sarath Chandar

Abstract: Computer-aided design (CAD) is the digital construction of 2D and 3D objects, and is central to a wide range of engineering and manufacturing applications like automobile and aviation. Despite its importance, CAD modeling remains largely a time-intensive, manual task. Recent works have attempted to automate this process with small transformer-based models and handcrafted CAD sequence representations. However, there has been little effort to leverage the potential of large language models (LLMs) for sequential CAD design. In this work, we introduce a new large-scale dataset of more than 170k CAD models annotated with high-quality, human-like descriptions generated with our pipeline based on GPT-4.1. Using this dataset, we fine-tune powerful code-LLMs to generate CAD sequences represented in a JSON-based format from natural language descriptions, demonstrating the viability and effectiveness of this approach for text-conditioned CAD generation. Because simple metrics often fail to reflect the quality of generated objects, we introduce geometric and topological metrics based on sphericity, mean curvature, and Euler characteristic to provide richer structural insights. Our experiments and ablation studies on both synthetic and human-annotated data demonstrate that CADmium is able to automate CAD design, drastically speeding up the design of new objects. The dataset, code, and fine-tuned models are available online.

URL: https://openreview.net/forum?id=lExqWvQht8

---

Title: Beyond Expectations: Learning with Stochastic Dominance Made Practical

Authors: Shicong Cen, Jincheng Mei, Hanjun Dai, Dale Schuurmans, Yuejie Chi, Bo Dai

Abstract: Stochastic dominance serves as a general framework for modeling a broad spectrum of decision preferences under uncertainty, with risk aversion as one notable example, as it naturally captures the intrinsic structure of the underlying uncertainty rather than simply resorting to expectations. Despite being theoretically appealing, the application of stochastic dominance in machine learning has been scarce, due to the following challenges: i) the original concept of stochastic dominance only provides a partial order and is therefore not amenable to serving as a general optimality criterion; and ii) an efficient computational recipe remains lacking due to the continuum nature of evaluating stochastic dominance.

In this work, we make the first attempt towards establishing a general framework of learning with stochastic dominance. We first generalize the stochastic dominance concept to enable feasible comparisons between any arbitrary pair of random variables. We next develop a simple and computationally efficient approach for finding the optimal solution in terms of stochastic dominance, which can be seamlessly plugged into many learning tasks. Numerical experiments demonstrate that the proposed method achieves comparable performance as standard risk-neutral strategies and obtains better trade-offs against risk across a variety of applications including supervised learning, reinforcement learning, and portfolio optimization.

URL: https://openreview.net/forum?id=ebyPKXsweD

---

Title: Fast weight programming and linear transformers: from machine learning to neurobiology

Authors: Kazuki Irie, Samuel J. Gershman

Abstract: Recent advances in artificial neural networks for machine learning, and language modeling in particular, have established a family of recurrent neural network (RNN) architectures that, unlike conventional RNNs with vector-form hidden states, use two-dimensional (2D) matrix-form hidden states. Such 2D-state RNNs, known as Fast Weight Programmers (FWPs), can be interpreted as a neural network whose synaptic weights (called fast weights) dynamically change over time as a function of input observations, and serve as short-term memory storage; corresponding synaptic weight modifications are controlled or programmed by another network (the programmer) whose parameters are trained (e.g., by gradient descent). In this Primer, we review the technical foundations of FWPs, their computational characteristics, and their connections to transformers and state space models. We also discuss connections between FWPs and models of synaptic plasticity in the brain, suggesting a convergence of natural and artificial intelligence.

URL: https://openreview.net/forum?id=TDG8EkNmQR

---

Title: TabRep: Training Tabular Diffusion Models with a Simple and Effective Continuous Representation

Authors: Jacob Si, Zijing Ou, Mike Qu, Zhengrui Xiang, Yingzhen Li

Abstract: Diffusion models have been the predominant generative model for tabular data generation. However, they face the conundrum of modeling under a separate versus a unified data representation. The former encounters the challenge of jointly modeling all multi-modal distributions of tabular data in one model. While the latter alleviates this by learning a single representation for all features, it currently leverages sparse suboptimal encoding heuristics and necessitates additional computation costs. In this work, we address the latter by presenting TabRep, a tabular diffusion architecture trained with a unified continuous representation. To motivate the design of our representation, we provide geometric insights into how the data manifold affects diffusion models. The key attributes of our representation are composed of its density, flexibility to provide ample separability for nominal features, and ability to preserve intrinsic relationships. Ultimately, TabRep provides a simple yet effective approach for training tabular diffusion models under a continuous data manifold. Our results showcase that TabRep achieves superior performance across a broad suite of evaluations. It is the first to synthesize tabular data that exceeds the downstream quality of the original datasets while preserving privacy and remaining computationally efficient.

URL: https://openreview.net/forum?id=yRbtFEh2OP

---

Title: The Geometry of Algorithmic Stability: A Hodge Theoretic View on Structural vs. Statistical Instability

Authors: Karen Sargsyan

Abstract: Algorithmic stability—the robustness of predictions to training data perturbations—is fundamental to reliable machine learning. We propose a unified mathematical framework that rigorously distinguishes between two sources of instability: structural inconsistency and statistical variance. We formalize structural inconsistency using Combinatorial Hodge Theory, characterizing it as cyclical flows (Condorcet cycles) on a graph of hypotheses. This framework reveals that methods like inflated operators and regularization specifically target these structural obstructions, while methods like bagging primarily address statistical variance. We provide direct empirical validation through three key experiments. First, in a controlled setting with engineered Condorcet cycles (pure structural instability), inflated operators achieve perfect stability while bagging fails, confirming the core distinction. Second, we validate on a standard digit classification task that structural obstructions are negligible ($||C_{cycle}|| \approx 2.3 \times 10^{-16}$, machine precision), explaining the empirical dominance of variance-reduction methods. Third, we demonstrate that significant structural obstructions naturally emerge in fairness-constrained model selection on real-world data ($||C_{cycle}|| = 0.857$, approximately $10^{15}$ times larger), providing a topological characterization of the instability arising from incompatible objectives.
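
The cyclical obstruction mentioned above can be measured with a HodgeRank-style decomposition: fit a potential (a global ranking of hypotheses) to the pairwise-comparison flow by least squares and take the residual, which lumps together the curl and harmonic components. The sketch below is a generic decomposition of this kind, not the paper's exact estimator of $||C_{cycle}||$.

```python
import numpy as np

def cyclic_obstruction(flow: dict, n: int):
    """flow: {(i, j): y_ij} with y_ij = -y_ji, a pairwise-comparison flow over n hypotheses.
    Returns the norm of the non-gradient (cyclic) residual and the fitted potential."""
    edges = list(flow)
    D = np.zeros((len(edges), n))                  # discrete gradient operator on the comparison graph
    y = np.array([flow[e] for e in edges])
    for k, (i, j) in enumerate(edges):
        D[k, i], D[k, j] = -1.0, 1.0
    s, *_ = np.linalg.lstsq(D, y, rcond=None)      # potential: the best consistent global ranking
    residual = y - D @ s                           # what no ranking can explain (Condorcet-style cycles)
    return np.linalg.norm(residual), s

# A pure 3-cycle (A beats B, B beats C, C beats A) has no gradient part; its residual norm is sqrt(3).
print(cyclic_obstruction({(0, 1): 1.0, (1, 2): 1.0, (2, 0): 1.0}, 3)[0])
```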

URL: https://openreview.net/forum?id=rFqsgVXZYO

---

Title: Vejde: A Framework for Inductive Deep Reinforcement Learning Based on Factor Graph Color Refinement

Authors: Jakob Nyberg, Pontus Johnson

Abstract: We present and evaluate Vejde, a framework that combines data abstraction, graph learning, and reinforcement learning to produce inductive policy functions for decision problems with richly structured states, such as object classes and relations. Markov decision process states are represented as databases of facts about entities, and Vejde converts each state to a bipartite graph, which is mapped to latent states through neural message passing. The factored representation of both states and actions allows Vejde agents to handle problems of varying size and structure. We tested Vejde agents on eight problem domains defined in RDDL, with ten problem instances each, where policies were trained using both supervised and reinforcement learning. To test policy generalization, we separate problem instances into two sets, one for training and the other solely for testing. Test results on unseen instances for the Vejde agents were compared to MLP agents trained on each problem instance, as well as to the online planning algorithm Prost. Our results show that Vejde policies on average generalize to the test instances without a significant loss in score. Additionally, the inductive agents received scores on unseen test instances that were on average close to those of the instance-specific MLP agents.

URL: https://openreview.net/forum?id=EFSZmL1W1Z

---

Title: Adversarial Vulnerability from On-Manifold Inseparability and Poor Off-Manifold Convergence

Authors: Rajdeep Haldar, Yue Xing, Qifan Song, Guang Lin

Abstract: We introduce a new perspective on adversarial vulnerability in image classification: fragility can arise from poor convergence in off-manifold directions. We model data as lying on low-dimensional manifolds, where on-manifold directions correspond to high-variance, data-aligned features and off-manifold directions capture low-variance, nuanced features. Standard first-order optimizers, such as gradient descent, are inherently ill-conditioned, leading to slow or incomplete convergence in off-manifold directions. When data is inseparable along the on-manifold direction, robustness depends on learning these subtle off-manifold features, and failure to converge leaves models exposed to adversarial perturbations.

On the theoretical side, we formalize this mechanism through convergence analyses of logistic regression and two-layer linear networks under first-order methods. These results highlight how ill-conditioning slows or prevents convergence in off-manifold directions, thereby motivating the use of second-order methods which mitigate ill-conditioning and achieve convergence across all directions. Empirically, we demonstrate that even without adversarial training, robustness improves significantly with extended training or second-order optimization, underscoring convergence as a central factor.

As an auxiliary empirical finding, we observe that batch normalization suppresses these robustness gains, consistent with its implicit bias toward uniform-margin rather than max-margin solutions.

By introducing the notions of on- and off-manifold convergence, this work provides a novel theoretical explanation for adversarial vulnerability.

URL: https://openreview.net/forum?id=pa90uRZATF

---

Title: ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning

Authors: Shangqian Gao, Ting Hua, Reza Shirkavand, Chi-Heng Lin, Zheng Tang, Zhengao Li, Longge Yuan, Fangyi Li, Zeyu Zhang, Alireza Ganjdanesh, Qian Lou, Jie Xu, Yen-Chang Hsu

Abstract: Large Language Models (LLMs) demonstrate remarkable capabilities but face deployment challenges due to their high computational demands. Traditional pruning methods reduce these costs by permanently removing parameters, which inevitably leads to performance degradation. To mitigate this issue, we propose ToMoE, a method that transforms dense LLMs into Mixture-of-Experts (MoE) models by uncovering experts inherently present within dense models, without requiring any weight updates. ToMoE leverages dynamic structural pruning to unify expert construction and router training in a single stage, achieving consistently strong performance. Remarkably, even without fine-tuning the model weights, ToMoE consistently outperforms state-of-the-art pruning and MoE techniques across Phi-2, LLaMA-2, LLaMA-3, and Qwen-2.5 models. The code for this paper is available at https://github.com/gaosh/ToMoE.

URL: https://openreview.net/forum?id=RFHq46pjb6

---

Title: StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

Authors: Jialin Yang, Dongfu Jiang, Tony He, Sherman Siu, Yuxuan Zhang, Disen Liao, Zhuofeng Li, Huaye Zeng, Yiming Jia, Haozhe Wang, Benjamin Schneider, Chi Ruan, Wentao Ma, Zhiheng Lyu, Yifei Wang, Yi Lu, Quy Duc Do, Ziyan Jiang, Ping Nie, Wenhu Chen

Abstract: As Large Language Models (LLMs) become integral to software development workflows, their ability to generate structured outputs has become critically important. We introduce $\textbf{StructEval}$, a comprehensive benchmark for evaluating LLMs' capabilities in producing both non-renderable (JSON, YAML, CSV) and renderable (HTML, React, SVG) structured formats. Unlike prior benchmarks, StructEval systematically evaluates structural fidelity across diverse formats through two paradigms: $\textbf{1)}$ generation tasks, producing structured output from natural language prompts, and $\textbf{2)}$ conversion tasks, translating between structured formats. Our benchmark encompasses 18 formats and 44 task types, with novel metrics for format adherence and structural correctness. Results reveal significant performance gaps—even state-of-the-art models like o1-mini achieve an average score of only $75.58$, with open-source alternatives lagging approximately $10$ points behind. We find generation tasks more challenging than conversion tasks, and producing correct visual content more difficult than generating text-only structures.
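
Format adherence for the non-renderable formats can be checked mechanically; a toy checker is sketched below. It only tests parseability and, for CSV, a consistent column count, whereas StructEval's metrics also score structural correctness against the task specification; the PyYAML dependency is an assumption.

```python
import csv
import io
import json

import yaml  # PyYAML, assumed available

def format_adheres(text: str, fmt: str) -> bool:
    """Toy format-adherence check for non-renderable structured formats."""
    try:
        if fmt == "json":
            json.loads(text)
        elif fmt == "yaml":
            yaml.safe_load(text)
        elif fmt == "csv":
            rows = [r for r in csv.reader(io.StringIO(text)) if r]
            return len(rows) > 0 and len({len(r) for r in rows}) == 1
        else:
            return False
        return True
    except Exception:
        return False

print(format_adheres('{"name": "StructEval", "formats": 18}', "json"))  # True
```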

URL: https://openreview.net/forum?id=buDwV7LUA7

---

Title: DuFal: Dual-Frequency-Aware Learning for High-Fidelity Extremely Sparse-view CBCT Reconstruction

Authors: Cuong Tran Van, Trong-Thang Pham, Ngoc-Son Nguyen, Duy Minh Ho Nguyen, Ngan Le

Abstract: Sparse-view Cone-Beam Computed Tomography reconstruction from limited X-ray projections remains a challenging problem in medical imaging due to the inherent undersampling of fine-grained anatomical details, which correspond to high-frequency components. Conventional CNN-based methods often struggle to recover these fine structures, as they are typically biased toward learning low-frequency information. To address this challenge, this paper presents DuFal (Dual-Frequency-Aware Learning), a novel framework that integrates frequency-domain and spatial-domain processing via a dual-path architecture. The core innovation lies in our High-Local Factorized Fourier Neural Operator, which comprises two complementary branches: a Global High-Frequency Enhanced Fourier Neural Operator that captures global frequency patterns and a Local High-Frequency Enhanced Fourier Neural Operator that processes spatially partitioned patches to preserve spatial locality that might be lost in global frequency analysis. To improve efficiency, we design a Spectral-Channel Factorization scheme that reduces the Fourier Neural Operator parameter count. We also design a Cross-Attention Frequency Fusion module to integrate spatial and frequency features effectively. The fused features are then decoded through a Feature Decoder to produce projection representations, which are subsequently processed through an Intensity Field Decoding pipeline to reconstruct a final Computed Tomography volume. Experimental results on the LUNA16 and ToothFairy datasets demonstrate that DuFal significantly outperforms existing state-of-the-art methods in preserving high-frequency anatomical features, particularly under extremely sparse-view settings.

URL: https://openreview.net/forum?id=2wAZjAtK16

---

Title: Eyes on the Road, Words in the Changing Skies: Vision-Language Assistance for Autonomous Driving in Transitional Weather

Authors: Madhavi Kondapally, K Naveen Kumar, C Krishna Mohan

Abstract: The rapid advancement of autonomous vehicle technology (AVT) necessitates robust scene perception and interactive decision-making, particularly under adverse weather conditions. While significant progress has been made in extreme weather scenarios like cloudy, foggy, rainy, and snowy, a critical challenge remains in transitional weather conditions, such as the shift from cloudy to rainy, foggy to sunny, etc. These dynamic environmental changes degrade the performance of conventional vision-language systems by causing unpredictable illumination changes and partial occlusions, which are inadequately represented in current AVT datasets. This lack of continuous, transitional training data compromises model robustness and ultimately affects safety and reliability. On the other hand, Vision-language Models (VLMs) enable interpretable reasoning in autonomous driving through tasks such as image captioning and visual question answering. However, current VLMs, designed for clear weather, perform poorly in transitional conditions and rely on computationally expensive LLMs. This leads to high memory usage and slow inference, which is unsuitable for real-time decision making in AVT. To address these limitations, we propose Vision-language Assistance for Autonomous Driving under Transitional Weather (VLAAD-TW), a lightweight framework with a novel cross-modal spatiotemporal reasoning architecture that robustly interprets and acts on multimodal data. The VLAAD-TW framework integrates a Feature Encoder for Transitional Weather (FETW), a lightweight backbone for robust visual feature extraction, with a Spatiotemporal Contextual Aggregator (SCA), which models dynamic weather-induced changes. It uses a Selective Attention-guided Fusion Module (SAFM) to balance visual and linguistic cues for a unified representation dynamically. Finally, a Semantic Text Generator (STG) fuses these representations to produce context-aware driving information, adapting in real time to both current and predicted weather states. Further, we introduce the AIWD16-text dataset, an adverse intermediate weather driving dataset for vision language tasks, which features sixteen transitional weather states created using a Stochastic Conditional Variational Autoencoder (SC-VAE) and enriched with manual annotations of image captions and open-ended question-answer pairs. An extensive evaluation of the AIWD16-text and DriveLM datasets demonstrates VLAAD-TW's high performance in BLEU and ROUGE scores, with low memory and computational requirements, confirming its effectiveness in challenging weather conditions.

URL: https://openreview.net/forum?id=PCEDvdVJon

---

Title: Achieving Global Flatness in Decentralized Learning with Heterogeneous Data

Authors: Sakshi Choudhary, Sai Aparna Aketi, Kaushik Roy

Abstract: Decentralized training enables peer-to-peer on-device learning without relying on a central server, but suffers from degraded generalization performance under heterogeneous data distributions due to local overfitting. One strategy to mitigate this is to seek flatter loss landscapes during local optimization at each client. However, with extreme data heterogeneity, local objectives may diverge from the global one, yielding local flatness rather than true global flatness. To mitigate this challenge, we introduce GFlat, a novel decentralized algorithm that enables each client to estimate and incorporate an approximation of the global update direction while seeking a flatter loss landscape locally.
This lightweight strategy allows each client to directly contribute to global flatness without requiring additional communication or centralized coordination.
We theoretically analyze the convergence properties of GFlat and validate its performance through extensive experiments across a range of datasets, model architectures, and communication topologies. GFlat consistently improves generalization in non-IID data settings and achieves up to 6.75\% higher test accuracy compared to state-of-the-art decentralized methods.

URL: https://openreview.net/forum?id=8G32T4RLbX

---

Title: Offline Model-Based Optimization: Comprehensive Review

Authors: Minsu Kim, Jiayao Gu, Ye Yuan, Taeyoung Yun, Zixuan Liu, Yoshua Bengio, Can Chen

Abstract: Offline black-box optimization is a fundamental challenge in science and engineering, where the goal is to optimize black-box functions using only offline datasets. This setting is particularly relevant when querying the objective function is prohibitively expensive or infeasible, with applications spanning protein engineering, material discovery, neural architecture search, and beyond. The main difficulty lies in accurately estimating the objective landscape beyond the available data, where extrapolations are fraught with significant epistemic uncertainty. This uncertainty can lead to objective hacking (reward hacking)—exploiting model inaccuracies in unseen regions—or other spurious optimizations that yield misleadingly high performance estimates outside the offline distribution. Recent advances in model-based optimization (MBO) have harnessed the generalization capabilities of deep neural networks to develop offline-specific surrogate and generative models. Trained with carefully designed strategies, these models are more robust against out-of-distribution issues, facilitating the discovery of improved designs. Despite its growing impact in accelerating scientific discovery, the field lacks a comprehensive review. To bridge this gap, we present the first thorough review of offline MBO. We begin by formalizing the problem for both single-objective and multi-objective settings and by reviewing recent benchmarks and evaluation metrics. We then categorize existing approaches into two key areas: surrogate modeling, which emphasizes accurate function approximation in out-of-distribution regions, and generative modeling, which explores high-dimensional design spaces to identify high-performing designs. Finally, we examine the key challenges and propose promising directions for advancement in this rapidly evolving field including safe control of superintelligent systems.

URL: https://openreview.net/forum?id=QcSZWo1TLl

---

Title: Towards Fair In-Context Learning with Tabular Foundation Models

Authors: Patrik Kenfack, Samira Ebrahimi Kahou, Ulrich Aïvodji

Abstract: Transformer-based tabular foundation models have recently demonstrated promising in-context learning (ICL) performance on structured data, emerging as competitive alternatives to gradient-boosted trees. However, the fairness implications of this new paradigm remain largely unexplored. We present the first investigation of fairness in tabular ICL, evaluating three recently proposed foundation models—TabPFNv2, TabICL, and TabDPT—on multiple benchmark datasets. To mitigate biases, we explore three pre-processing fairness-enhancing methods: correlation removal (decorrelating input features from the sensitive attribute), group-balanced sample selection (ensuring equal representation of protected groups in context examples), and uncertainty-based sample selection (prioritizing context examples with high sensitive-attribute prediction uncertainty). Our experiments show that the uncertainty-based strategy consistently improves group fairness metrics (e.g., demographic parity, equalized odds, and equal opportunity) with minimal impact on predictive accuracy. We release our code to facilitate reproducibility https://github.com/patrikken/Fair-TabICL.
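
Of the three pre-processing strategies, the uncertainty-based one is easy to picture: score each candidate context example by how uncertain an auxiliary classifier is about its sensitive attribute, and keep the most ambiguous ones. The sketch below uses a logistic-regression scorer and binary entropy as the uncertainty measure; both are assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_based_context(X_pool, y_pool, s_pool, k):
    """Select k in-context examples whose sensitive attribute s is hardest to predict from X."""
    clf = LogisticRegression(max_iter=1000).fit(X_pool, s_pool)
    p = clf.predict_proba(X_pool)[:, 1]
    entropy = -(p * np.log(p + 1e-12) + (1 - p) * np.log(1 - p + 1e-12))   # binary entropy
    idx = np.argsort(-entropy)[:k]                                          # most ambiguous first
    return X_pool[idx], y_pool[idx]
```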

URL: https://openreview.net/forum?id=AsBhwD0sqo

---

Title: Multi-Step Alignment as Markov Games: An Optimistic Online Mirror Descent Approach with Convergence Guarantees

Authors: Yongtao Wu, Luca Viano, Kimon Antonakopoulos, Yihang Chen, Zhenyu Zhu, Quanquan Gu, Volkan Cevher

Abstract: Reinforcement Learning from Human Feedback (RLHF) has been highly successful in aligning large language models with human preferences. While prevalent methods like DPO have demonstrated strong performance, they frame interactions with the language model as a bandit problem, which limits their applicability in real-world scenarios where multi-turn conversations are common. Additionally, DPO relies on the Bradley-Terry model assumption, which does not adequately capture the non-transitive nature of human preferences. In this paper, we address these challenges by modeling the alignment problem as a two-player constant-sum Markov game, where each player seeks to maximize their winning rate against the other across all steps of the conversation. Our approach, Optimistic Multi-step Preference Optimization (OMPO), is built upon the optimistic online mirror descent algorithm~\citep{rakhlin2013online,joulani17a}. Theoretically, we provide a rigorous analysis for the convergence of OMPO and show that OMPO requires $\mathcal{O}(\epsilon^{-1})$ policy updates to converge to an $\epsilon$-approximate Nash equilibrium. We also validate the effectiveness of our method on a multi-turn conversation dataset and a math reasoning dataset.

URL: https://openreview.net/forum?id=ZWZKaqZCy0

---

Title: Training More Robust Classification Model via Discriminative Loss and Gaussian Noise Injection

Authors: Hai-Vy Nguyen, Fabrice Gamboa, Sixin Zhang, Reda CHHAIBI, Serge Gratton, Thierry Giaccone

Abstract: Robustness of deep neural networks to input noise remains a critical challenge, as naive noise injection often degrades accuracy on clean (uncorrupted) data. We propose a novel training framework that addresses this trade-off through two complementary objectives. First, we introduce a loss function applied at the penultimate layer that explicitly enforces intra-class compactness and increases the margin to analytically defined decision boundaries. This enhances feature discriminativeness and class separability for clean data. Second, we propose a class-wise feature alignment mechanism that brings noisy data clusters closer to their clean counterparts. Furthermore, we provide a theoretical analysis demonstrating that improving feature stability under additive Gaussian noise implicitly reduces the curvature of the softmax loss landscape in input space, as measured by Hessian eigenvalues. This naturally enhances robustness without explicit curvature penalties. Conversely, we also theoretically show that lower curvatures lead to more robust models. We validate the effectiveness of our method on standard benchmarks and our custom dataset. Our approach significantly reinforces model robustness to various perturbations while maintaining high accuracy on clean data, advancing the understanding and practice of noise-robust deep learning.
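
A minimal sketch of how the two objectives could be combined is given below: cross-entropy on clean data, an intra-class compactness term on penultimate-layer features, and a class-wise alignment term pulling noisy feature means toward their clean class centers. The term weights and the omitted margin formulation are assumptions for illustration, not the paper's loss.

```python
import torch.nn.functional as F

def discriminative_noise_robust_loss(feats_clean, feats_noisy, logits_clean, labels,
                                     lam_compact=0.1, lam_align=0.1):
    """Illustrative objective: CE + intra-class compactness + clean/noisy class-wise alignment."""
    ce = F.cross_entropy(logits_clean, labels)
    compact = feats_clean.new_zeros(())
    align = feats_clean.new_zeros(())
    for c in labels.unique():
        m = labels == c
        center = feats_clean[m].mean(dim=0)                               # clean class center
        compact = compact + ((feats_clean[m] - center) ** 2).sum(dim=1).mean()
        align = align + ((feats_noisy[m].mean(dim=0) - center) ** 2).sum()
    return ce + lam_compact * compact + lam_align * align

# feats_noisy would come from the same network applied to x + sigma * torch.randn_like(x).
```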

URL: https://openreview.net/forum?id=RnLfJgvST2

---

Title: Synergistic Benefits of Joint Molecule Generation and Property Prediction

Authors: Adam Izdebski, Jan Olszewski, Pankhil Gawade, Krzysztof Koras, Serra Korkmaz, Valentin Rauscher, Jakub M. Tomczak, Ewa Szczurek

Abstract: Modeling the joint distribution of data samples and their properties makes it possible to construct a single model for both data generation and property prediction, with synergistic benefits reaching beyond purely generative or predictive models. However, training joint models presents daunting architectural and optimization challenges. Here, we propose Hyformer, a transformer-based joint model that successfully blends the generative and predictive functionalities, using an alternating attention mechanism and a joint pre-training scheme. We show that Hyformer is simultaneously optimized for molecule generation and property prediction, while exhibiting synergistic benefits in conditional sampling, out-of-distribution property prediction and representation learning. Finally, we demonstrate the benefits of joint learning in a drug design use case of discovering novel antimicrobial peptides.

URL: https://openreview.net/forum?id=jnzCOLyGOA

---

Title: Denoising Diffusions with Optimal Transport: Localization, Curvature, and Multi-Scale Complexity

Authors: Tengyuan Liang, Kulunu Dharmakeerthi, Takuya Koriyama

Abstract: Adding noise is easy; what about denoising? Diffusion is easy; what about reverting a diffusion? Diffusion-based generative models aim to denoise a Langevin diffusion chain, moving from a log-concave equilibrium measure $\nu$, say an isotropic Gaussian, back to a complex, possibly non-log-concave initial measure $\mu$. The score function performs denoising, moving backward in time, and predicting the conditional mean of the past location given the current one. We show that score denoising is the optimal backward map in transportation cost. What is its localization uncertainty? We show that the curvature function determines this localization uncertainty, measured as the conditional variance of the past location given the current. We study in this paper the effectiveness of the diffuse-then-denoise process: the contraction of the forward diffusion chain, offset by the possible expansion of the backward denoising chain, governs the denoising difficulty. For any initial measure $\mu$, we prove that this offset net contraction at time $t$ is characterized by the curvature complexity of a smoothed $\mu$ at a specific signal-to-noise ratio (SNR) scale $r(t)$. We discover that the multi-scale curvature complexity collectively determines the difficulty of the denoising chain. Our multi-scale complexity quantifies a fine-grained notion of average-case curvature instead of the worst-case. Curiously, it depends on an integrated tail function, measuring the relative mass of locations with positive curvature versus those with negative curvature; denoising at a specific SNR scale is easy if such an integrated tail is light. We conclude with several non-log-concave examples to demonstrate how the multi-scale complexity probes the bottleneck SNR for the diffuse-then-denoise process.

URL: https://openreview.net/forum?id=sj1wU6gBXH

---

Title: PredLDM: Spatiotemporal Sequence Prediction with Latent Diffusion Models

Authors: Yechao Xu, Zhengxing Sun, Qian Li, Jiao Qu

Abstract: Predicting an accurate and realistic future is an attractive goal in spatiotemporal sequence prediction. Despite recent progress in spatiotemporal predictive models, explorations in this field remain challenging due to the difficulty of achieving intricate global coherence and comprehensive history understanding. In this study, we introduce latent diffusion models (LDMs) into spatiotemporal sequence prediction (PredLDM) with a two-stage training paradigm. (i) To compress intricate, globally coherent spatiotemporal content into latent space, we propose the masked-attention transformer-based variational autoencoder (MT-VAE), which exploits transformers with masked self-attention layers. (ii) Unlike LDMs in generation-related fields, where the condition is text, the condition in our problem setting is historical observations; we therefore provide the condition-aware LDM (CA-LDM) for comprehensive understanding of historical sequences. Our denoising diffusion process learns the distribution of both conditional generation and condition-aware reconstruction. Results on the KittiCaltech, KTH and SEVIR datasets show that PredLDM provides promising performance and realistic predictions in multiple scenarios including car driving, human actions, and weather evolution. (https://github.com/MaoWuToday/PredLDM.git)

URL: https://openreview.net/forum?id=TWmnOUzcCo

---

Title: iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency

Authors: Yunusa Haruna, Adamu Lawan, Abdulganiyu Abdu Yusuf

Abstract: The recent emergence of hybrid models has introduced a transformative approach to computer vision, gradually moving beyond conventional convolutional neural networks and vision transformers. However, efficiently combining these two approaches to better capture long-range dependencies in complex images remains a challenge. In this paper, we present iiANET (Inception Inspired Attention Network), an efficient hybrid visual backbone designed to improve the modeling of long-range dependencies in complex visual recognition tasks. The core innovation of iiANET is the iiABlock, a unified building block that integrates a modified global r-MHSA (Multi-Head Self-Attention) and convolutional layers in parallel. This design enables iiABlock to simultaneously capture global context and local details, making it effective for extracting rich and diverse features. By efficiently fusing these complementary representations, iiABlock allows iiANET to achieve strong feature interaction while maintaining computational efficiency. Extensive qualitative and quantitative evaluations on some SOTA benchmarks demonstrate improved performance.

URL: https://openreview.net/forum?id=HGSjlgFodQ

---

Title: ULTra: Unveiling Latent Token Interpretability in Transformer-Based Understanding and Segmentation

Authors: Hesam Hosseini, Ghazal Hosseini Mighan, Amirabbas Afzali, Sajjad Amini, Amir Houmansadr

Abstract: Transformers have revolutionized Computer Vision (CV) through self-attention mechanisms. However, their complexity makes latent token representations difficult to interpret. We introduce ULTra, a framework for interpreting Transformer embeddings and uncovering meaningful semantic patterns within them. ULTra enables unsupervised semantic segmentation using pre-trained models without requiring fine-tuning. Additionally, we propose a self-supervised training approach that refines segmentation performance by learning an external transformation matrix without modifying the underlying model. Our method achieves state-of-the-art performance in unsupervised semantic segmentation, outperforming existing segmentation methods. Furthermore, we validate ULTra for model interpretation on both synthetic and real-world scenarios, including Object Selection and interpretable text summarization using LLMs, demonstrating its broad applicability in explaining the semantic structure of latent token representations.

URL: https://openreview.net/forum?id=vL3pmJjGDQ

---

Title: Overcoming Open-Set Approaches to Adversarial Defense

Authors: Edgar Wilfred Jatho, Armon Barton, Matthew Wright, Patrick McClure

Abstract: Machine learning (ML) models are increasingly proposed to replace or augment safety-critical information processing systems, yet their fragility to evasion attacks remains a well-documented, open problem. This work analyzes a class of deep neural network defenses that add a none-of-the-above (NOTA) class as an open-set-inspired, closed-set adversarial defense. We analyze seven prominent adversarial evasion attacks developed for computer vision classification and one attack developed for natural language processing classification, identifying how these attacks fail in the presence of a NOTA defense. We use this knowledge to adapt these attacks and provide empirical evidence that adding a NOTA class alone does not solve the core challenge of defending DNNs against evasion attacks. We release our adapted attack suite to enable more rigorous future evaluations of open-set-inspired defenses.
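
For context, the defense family analyzed here can be sketched as an ordinary classifier head extended with one extra none-of-the-above logit plus a rejection rule. The snippet below is a generic illustration of that idea (backbone, shapes, and names are placeholders), not any particular defended model or the adapted attack suite released with the paper.

```python
import torch
import torch.nn as nn

class NOTAClassifier(nn.Module):
    """Classifier with an extra none-of-the-above (NOTA) class: inputs whose
    argmax falls on the NOTA logit are rejected instead of classified."""

    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(feat_dim, num_classes + 1)   # last index is the NOTA class

    def forward(self, x):
        return self.head(self.backbone(x))

    @torch.no_grad()
    def predict_or_reject(self, x):
        logits = self.forward(x)
        pred = logits.argmax(dim=-1)
        nota_idx = logits.shape[-1] - 1
        return torch.where(pred == nota_idx, torch.full_like(pred, -1), pred)  # -1 marks rejection
```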

URL: https://openreview.net/forum?id=iuQ9r8VSIX

---

Title: Communication-Efficient Federated AUC Maximization with Cyclic Client Participation

Authors: Umesh-Vangapally, Wenhan Wu, Chen Chen, Zhishuai Guo

Abstract: Federated AUC maximization is a powerful approach for learning from imbalanced data in federated learning (FL). However, existing methods typically assume full client availability, which is rarely practical. In real-world FL systems, clients often participate in a cyclic manner: joining training according to a fixed, repeating schedule. This setting poses unique optimization challenges for the non-decomposable AUC objective.
This paper addresses these challenges by developing and analyzing communication-efficient algorithms for federated AUC maximization under cyclic client participation. We investigate two key settings:
First, we study AUC maximization with a squared surrogate loss, which reformulates the problem as a nonconvex-strongly-concave minimax optimization. By leveraging the Polyak-Łojasiewicz (PL) condition, we establish a state-of-the-art communication complexity of $\widetilde{O}(1/\epsilon^{1/2})$ and iteration complexity of $\widetilde{O}(1/\epsilon)$.
Second, we consider general pairwise AUC losses. We establish a communication complexity of $O(1/\epsilon^3)$ and an iteration complexity of $O(1/\epsilon^4)$. Further, under the PL condition, these bounds improve to communication complexity of $\widetilde{O}(1/\epsilon^{1/2})$ and iteration complexity of $\widetilde{O}(1/\epsilon)$.
Extensive experiments on benchmark tasks in image classification, medical imaging, and fraud detection demonstrate the superior efficiency and effectiveness of our proposed methods.
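
For reference, the squared-surrogate reformulation used in the first setting is commonly written (in the style of Ying et al., 2016) as the minimax problem

$\min_{\mathbf{w},a,b}\;\max_{\alpha\in\mathbb{R}}\;\mathbb{E}_{(x,y)}\Big[(1-p)\,(h_{\mathbf{w}}(x)-a)^2\,\mathbb{I}[y=1] + p\,(h_{\mathbf{w}}(x)-b)^2\,\mathbb{I}[y=-1] + 2(1+\alpha)\big(p\,h_{\mathbf{w}}(x)\,\mathbb{I}[y=-1]-(1-p)\,h_{\mathbf{w}}(x)\,\mathbb{I}[y=1]\big) - p(1-p)\,\alpha^2\Big],$

where $p$ is the proportion of positive examples and $h_{\mathbf{w}}$ is the model's score. With a deep network $h_{\mathbf{w}}$, the objective is nonconvex in $(\mathbf{w},a,b)$ and strongly concave in $\alpha$, which is the nonconvex-strongly-concave structure referenced above; the paper's exact notation and constants may differ.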

URL: https://openreview.net/forum?id=18yPFLbVRy

---

Title: On Calibration of Multilingual Question Answering LLMs

Authors: Yahan Yang, Soham Dan, Dan Roth, Insup Lee

Abstract: Multilingual pre-trained Large Language Models (LLMs) are incredibly effective at Question Answering (QA), a core task in Natural Language Understanding, achieving high accuracies on several multilingual benchmarks. However, little is known about how well their confidences are calibrated. In this paper, we comprehensively benchmark the calibration of several multilingual LLMs (MLLMs) on a variety of QA tasks. We perform extensive experiments, spanning encoder-only, encoder-decoder, and decoder-only QA models (with sizes varying from 110M to 7B parameters) and diverse languages, including both high- and low-resource ones. We study different dimensions of calibration in in-distribution, out-of-distribution, and cross-lingual transfer settings, and investigate strategies to improve it, including post-hoc methods and regularized fine-tuning. For decoder-only LLMs such as Llama 2, we additionally find that in-context learning improves confidence calibration on multilingual data.
We also conduct several ablation experiments to study the effect of language distances, language corpus size, and model size on calibration, and how multilingual models compare with their monolingual counterparts for diverse tasks and languages. Our experiments suggest that multilingual QA models are poorly calibrated for languages other than English, and that incorporating a small set of cheaply translated multilingual samples during fine-tuning/calibration effectively enhances calibration performance.
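
Since the study evaluates confidence calibration and post-hoc remedies, the short snippet below shows the standard expected calibration error (ECE) computation that such evaluations typically rely on; the binning choices here are illustrative and may differ from those used in the paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, num_bins=10):
    """Standard ECE: bin predictions by confidence and average the gap between
    per-bin accuracy and per-bin mean confidence, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy usage: confidence of each predicted answer and whether it was correct.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 0, 1, 1]))
```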

URL: https://openreview.net/forum?id=4klghu2PTj

---


New submissions
===============


Title: Node Perturbation Can Effectively Train Multi-Layer Neural Networks

Abstract: Backpropagation (BP) remains the dominant and most successful method for training parameters of deep neural network models.
However, BP relies on two computationally distinct phases, does not provide a satisfactory explanation of biological learning, and can be challenging to apply when training networks with discontinuities or noisy node dynamics.
By comparison, node perturbation (NP), also known as activity-perturbed forward gradients, learns by injecting noise into network activations and measuring the induced change in loss.
NP relies on two forward (inference) passes, does not make use of network derivatives, and has been proposed as a model for learning in biological systems.
However, standard NP is highly data inefficient and can be unstable due to its unguided noise-based search process.
In this work, we develop a modern perspective on NP by relating it to the directional derivative and incorporating input decorrelation.
We find that closer alignment with the directional derivative, together with input decorrelation at every layer, improves NP learning both theoretically and in practice, yielding large gains in parameter convergence and much higher test performance, approaching that of BP.
Furthermore, our novel formulation allows for application to noisy systems in which the noise process itself is inaccessible, which is of particular interest for on-chip learning in neuromorphic systems.
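
To make the baseline concrete, the toy script below implements vanilla node perturbation on a two-layer network: a clean forward pass, a noise-perturbed forward pass, and weight updates built from the loss difference scaled by the injected noise, using no network derivatives. The directional-derivative alignment and layer-wise input decorrelation proposed in the submission are deliberately omitted, and all hyperparameters are arbitrary toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(16, 8)) * 0.1, rng.normal(size=(1, 16)) * 0.1
sigma, lr = 1e-3, 0.01
x, y = rng.normal(size=8), np.array([1.0])

def forward(x, noise1=0.0, noise2=0.0):
    a1 = W1 @ x + noise1             # pre-activations, optionally perturbed
    h1 = np.tanh(a1)
    a2 = W2 @ h1 + noise2
    return a1, h1, a2

for _ in range(100):
    a1, h1, a2 = forward(x)                          # clean pass
    xi1 = sigma * rng.normal(size=a1.shape)          # injected activation noise, layer 1
    xi2 = sigma * rng.normal(size=a2.shape)          # injected activation noise, output
    _, _, a2_p = forward(x, xi1, xi2)                # perturbed pass
    dL = 0.5 * np.sum((a2_p - y) ** 2) - 0.5 * np.sum((a2 - y) ** 2)
    g1 = (dL / sigma**2) * xi1                       # estimated gradient w.r.t. pre-activations
    g2 = (dL / sigma**2) * xi2
    W1 -= lr * np.outer(g1, x)                       # outer product with each layer's input
    W2 -= lr * np.outer(g2, h1)

print(0.5 * np.sum((forward(x)[2] - y) ** 2))        # clean loss after the NP updates
```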

URL: https://openreview.net/forum?id=LxUw44pnpu

---

Title: Understanding the Resource Cost of Fully Homomorphic Encryption in Quantum Federated Learning

Abstract: Quantum Federated Learning (QFL) enables distributed training of Quantum Machine Learning (QML) models by sharing model gradients instead of raw data. However, these gradients can still expose sensitive user information. To enhance privacy, homomorphic encryption of parameters has been proposed as a solution in QFL and related frameworks. In this work, we evaluate the overhead introduced by Fully Homomorphic Encryption (FHE) in QFL setups and assess its feasibility for real-world applications. We implemented various QML models, including a Quantum Convolutional Neural Network (QCNN), trained in a federated environment with parameters encrypted using the CKKS scheme; this work marks the first QCNN trained in a federated setting with CKKS-encrypted parameters. Models of varying architectures were trained to predict brain tumors from MRI scans. The experiments reveal that memory and communication overhead remain substantial, making FHE challenging to deploy. Minimizing the overhead requires reducing the number of model parameters, which in turn degrades classification performance, introducing a trade-off between privacy and model complexity.
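
As a rough sense of what CKKS-encrypted parameter aggregation looks like, the sketch below uses the TenSEAL library (one common CKKS implementation, assumed here; the submission's actual stack, model, and encryption parameters may differ) to encrypt toy client parameter vectors, add them server-side without decryption, and scale to obtain the average.

```python
import tenseal as ts

# CKKS context with typical demo parameters; real deployments tune these,
# which is exactly where the memory/communication overhead discussed above arises.
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()

client_params = [[0.12, -0.34, 0.56], [0.10, -0.30, 0.60]]    # toy parameter vectors
encrypted = [ts.ckks_vector(ctx, p) for p in client_params]    # each client encrypts locally

aggregate = encrypted[0]
for enc in encrypted[1:]:
    aggregate = aggregate + enc                                # server adds ciphertexts blindly
aggregate = aggregate * (1.0 / len(encrypted))                 # scale to the federated average

print(aggregate.decrypt())                                     # only key holders can decrypt
```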

URL: https://openreview.net/forum?id=guZEpFIKcN

---
