Accepted papers
===============
Title: On The Landscape of Spoken Language Models: A Comprehensive Survey
Authors: Siddhant Arora, Kai-Wei Chang, Chung-Ming Chien, Yifan Peng, Haibin Wu, Yossi Adi, Emmanuel Dupoux, Hung-yi Lee, Karen Livescu, Shinji Watanabe
Abstract: The field of spoken language processing is undergoing a shift from training custom-built, task-specific models toward using and optimizing spoken language models (SLMs) which act as universal speech processing systems. This trend is similar to the progression toward universal language models that has taken place in the field of (text) natural language processing. SLMs include both "pure" language models of speech---models of the distribution of tokenized speech sequences---and models that combine speech encoders with text language models, often including both spoken and written input or output. Work in this area is very diverse, with a range of terminology and evaluation settings. This paper aims to contribute an improved understanding of SLMs via a unifying literature survey of recent work in the context of the evolution of the field. Our survey categorizes the work in this area by model architecture, training, and evaluation choices, and describes some key challenges and directions for future work.
URL: https://openreview.net/forum?id=BvxaP3sVbA
---
Title: FeatInv: Spatially resolved mapping from feature space to input space using conditional diffusion models
Authors: Nils Neukirch, Johanna Vielhaben, Nils Strodthoff
Abstract: Internal representations are crucial for understanding deep neural networks, such as their properties and reasoning patterns, but remain difficult to interpret. While mapping from feature space to input space aids in interpreting the former, existing approaches often rely on crude approximations. We propose using a conditional diffusion model - a pretrained high-fidelity diffusion model conditioned on spatially resolved feature maps - to learn such a mapping in a probabilistic manner. We demonstrate the feasibility of this approach across various pretrained image classifiers from CNNs to ViTs, showing excellent reconstruction capabilities. Through qualitative comparisons and robustness analysis, we validate our method and showcase possible applications, such as the visualization of concept steering in input space or investigations of the composite nature of the feature space. This approach has broad potential for improving feature space understanding in computer vision models.
URL: https://openreview.net/forum?id=UtE1YnPNgZ
---
Title: Statistical Guarantees for Approximate Stationary Points of Shallow Neural Networks
Authors: Mahsa Taheri, Fang Xie, Johannes Lederer
Abstract: Since statistical guarantees for neural networks are usually restricted to global optima of intricate objective functions, it is unclear whether these theories explain the performance of the actual outputs of neural network pipelines. The goal of this paper is, therefore, to bring statistical theory closer to practice. We develop statistical guarantees for shallow linear neural networks that coincide up to logarithmic factors with the global optima but apply to stationary points and the points nearby. These results support the common notion that neural networks do not necessarily need to be optimized globally from a mathematical perspective. We then extend our statistical guarantees to shallow ReLU neural networks, assuming the first layer weight matrices are nearly identical for the stationary network and the target. More generally, despite being limited to shallow neural networks for now, our theories take an important step forward in describing the practical properties of neural networks in mathematical terms.
URL: https://openreview.net/forum?id=PNUMiLbLml
---
New submissions
===============
Title: Regret minimization in Linear Bandits with offline data via extended D-optimal exploration.
Abstract: We consider the problem of online regret minimization in stochastic linear bandits with access to prior observations (\emph{i.e.,} offline data) from the underlying bandit model. This setting is highly relevant to numerous applications where extensive offline data is often available, such as recommendation systems, personalized healthcare, and online advertising. Consequently, this problem has been studied intensively in recent works such as~\cite{banerjee2022artificial, wagenmaker2022leveraging, agrawal2023optimal, hao2023leveraging, cheung2024leveraging}. We introduce the Offline-Online Phased Elimination (OOPE) algorithm, which effectively incorporates the offline data to substantially reduce the online regret compared to prior work. To leverage offline information prudently, OOPE uses an extended D-optimal design within each exploration phase. We show that OOPE achieves online regret of $\tilde{O}(\sqrt{d_{\text{eff}} T \log \left(|\mathcal{A}|T\right)}+d^2)$, where $\mathcal{A}$ is the action set, $d$ is the dimension, and $T$ is the online horizon. Here $d_{\text{eff}} \, (\leq d)$ is the \emph{effective problem dimension}, which measures the number of poorly explored directions in the offline data and depends on the eigen-spectrum $(\lambda_k)_{k \in [d]}$ of the Gram matrix of the offline data; the eigen-spectrum is thus a quantitative measure of the \emph{quality} of the offline data. If the offline data is poorly explored ($d_{\text{eff}} \approx d$), we recover the established regret bounds for purely online linear bandits. Conversely, when the offline data is abundant ($T_{\text{off}} \gg T$) and well-explored ($d_{\text{eff}} = o(1)$), the online regret reduces substantially. Additionally, we provide the first known minimax regret lower bounds in this setting that depend explicitly on the quality of the offline data.
These lower bounds establish the optimality of our algorithm \footnote{Optimal within log factors in $T, T_{\text{off}}$ and additive constants in $d$} in regimes where offline data is either well-explored or poorly explored. Finally, by using a Frank-Wolfe approximation to the extended optimal design we further improve the $O(d^{2})$ term to $O\left(\frac{d^{2}}{d_{\text{eff}} } \min \{ d_{\text{eff}},1\} \right)$, which can be substantial in high dimensions with moderate quality of offline data $d_{\text{eff}} = \Omega(1)$.
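The Frank-Wolfe approximation to a D-optimal design mentioned above can be illustrated with a generic Frank-Wolfe solver for the log-determinant criterion $\max_w \log\det(V_0 + \sum_i w_i a_i a_i^\top)$. This is a standard textbook sketch, not the paper's algorithm: the `V0` term standing in for offline Gram information, the step-size schedule, and all names are illustrative assumptions.

```python
import numpy as np

def frank_wolfe_d_optimal(A, V0=None, iters=2000):
    """Frank-Wolfe on the log-det design criterion.

    A  : (n, d) array of candidate actions (one per row).
    V0 : optional (d, d) PSD matrix standing in for offline information
         (a crude nod to an *extended* design; the paper's exact
         formulation may differ).
    Returns a probability vector w over the n actions.
    """
    n, d = A.shape
    w = np.full(n, 1.0 / n)                  # uniform starting design
    V0 = np.zeros((d, d)) if V0 is None else V0
    for t in range(iters):
        M = V0 + (A * w[:, None]).T @ A      # design matrix sum_i w_i a_i a_i^T
        Minv = np.linalg.inv(M)
        # Gradient of log det along each vertex: a_i^T M^{-1} a_i.
        scores = np.einsum('ij,jk,ik->i', A, Minv, A)
        i = int(np.argmax(scores))           # best vertex of the simplex
        gamma = 2.0 / (t + 3)                # vanishing step size keeps M nonsingular
        w = (1 - gamma) * w
        w[i] += gamma
    return w
```

For actions forming the standard basis, the D-optimal design is uniform, and the iterates approach it; in a bandit exploration phase the returned weights would be used as sampling proportions over the actions.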
URL: https://openreview.net/forum?id=4WcK8gKgCi
---
Title: Sovereign Federated Learning with Byzantine-Resilient Aggregation
Abstract: The concentration of artificial intelligence infrastructure in a few technologically advanced nations creates significant barriers for emerging economies seeking to develop sovereign AI capabilities. We present DSAIN (Distributed Sovereign AI Network), a novel federated learning framework designed for decentralized AI infrastructure development in resource-constrained environments. Our framework introduces three key technical contributions: (1) FedSov, a communication-efficient federated learning algorithm with provable convergence guarantees under heterogeneous data distributions; (2) ByzFed, a Byzantine-resilient aggregation mechanism that provides $(\epsilon, \delta)$-differential privacy while tolerating up to $\lfloor (n-1)/3 \rfloor$ malicious participants; and (3) a blockchain-based model provenance system enabling verifiable and auditable federated learning. We provide theoretical analysis establishing convergence rates of $O(1/\sqrt{T})$ for non-convex objectives and $O(1/T)$ for strongly convex objectives under partial participation. Extensive experiments on CIFAR-10, CIFAR-100, and real-world federated benchmarks demonstrate that DSAIN achieves accuracy within 2.3% of centralized baselines while reducing communication costs by 78% and providing formal privacy guarantees. We validate the framework through a deployment case study demonstrating practical applicability in distributed computing environments.
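Byzantine-resilient aggregation of the kind the abstract describes can be sketched with one standard robust aggregator, the coordinate-wise trimmed mean. This is not ByzFed itself (which additionally provides differential privacy and tolerates $\lfloor (n-1)/3 \rfloor$ attackers); it only illustrates the robust-aggregation idea, and the function name and tolerance parameter are assumptions.

```python
import numpy as np

def trimmed_mean(updates, f):
    """Coordinate-wise trimmed mean over client model updates.

    updates : (n, d) array, one flattened model update per client.
    f       : number of Byzantine clients to tolerate; requires n > 2f.

    Discarding the f largest and f smallest values in every coordinate
    means up to f arbitrary (malicious) updates cannot drag the result
    outside the range spanned by the honest clients' values.
    """
    n, d = updates.shape
    assert n > 2 * f, "need more than 2f clients to trim f from each side"
    s = np.sort(updates, axis=0)        # sort each coordinate independently
    return s[f:n - f].mean(axis=0)      # average the middle n - 2f values
```

A server would apply this in place of plain FedAvg averaging each round; adding calibrated noise to the aggregate (for differential privacy, as ByzFed reportedly does) is an orthogonal step not shown here.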
URL: https://openreview.net/forum?id=REbjYw70Fu
---
Title: Interference-Aware K-Step Reachable Communication in Multi-Agent Reinforcement Learning
Abstract: Effective communication is pivotal for addressing complex collaborative tasks in multi-agent reinforcement learning (MARL). Yet, limited communication bandwidth and dynamic, intricate environmental topologies present significant challenges in identifying high-value communication partners. Agents must consequently select collaborators under uncertainty, lacking a priori knowledge of which partners can deliver task-critical information. To this end, we propose Interference-Aware $K$-Step Reachable Communication (IA-KRC), a novel framework that enhances cooperation via two core components: (1) a $K$-Step reachability protocol that confines message passing to physically accessible neighbors, and (2) an interference-prediction module that optimizes partner choice by minimizing interference while maximizing utility. Compared to existing methods, IA-KRC enables substantially more persistent and efficient cooperation despite environmental interference. Comprehensive evaluations confirm that IA-KRC achieves superior performance compared to state-of-the-art baselines, while demonstrating enhanced robustness and scalability in complex topological and highly dynamic multi-agent scenarios.
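The graph side of a $K$-step reachability protocol, confining message passing to agents within $K$ hops of the current communication topology, reduces to a depth-bounded breadth-first search. This minimal sketch covers only that reachability computation; the interference-prediction module and partner-selection objective from the abstract are not modeled, and the names are illustrative.

```python
from collections import deque

def k_step_reachable(adj, src, k):
    """Agents reachable from `src` within k hops of the communication
    graph `adj` (dict: node -> iterable of neighbor nodes).

    Runs a breadth-first search truncated at depth k, so in a dynamic
    topology it can be recomputed cheaply each step from the current
    adjacency structure.
    """
    seen = {src: 0}                      # node -> hop distance from src
    q = deque([src])
    while q:
        u = q.popleft()
        if seen[u] == k:                 # depth limit reached on this branch
            continue
        for v in adj.get(u, ()):
            if v not in seen:
                seen[v] = seen[u] + 1
                q.append(v)
    seen.pop(src)                        # an agent is not its own partner
    return set(seen)
```

An agent would then score only the returned candidates (e.g. by predicted interference versus utility) rather than the full agent population, which is what keeps the communication budget bounded.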
URL: https://openreview.net/forum?id=8Fo2AwQE9z
---