Survey Certification: On the Challenges and Opportunities in Generative AI
Laura Manduchi, Clara Meister, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Däubener, Sophie Fellenz, Asja Fischer, Thomas Gärtner, Matthias Kirchler, Marius Kloft, Yingzhen Li, Christoph Lippert, Gerard de Melo, Eric Nalisnick, Björn Ommer, Rajesh Ranganath, Maja Rudolph, Karen Ullrich, Guy Van den Broeck, Julia E Vogt, Yixin Wang, Florian Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin
https://openreview.net/forum?id=NeS9Kj2JwF
---
Accepted papers
===============
Title: On the Challenges and Opportunities in Generative AI
Authors: Laura Manduchi, Clara Meister, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Däubener, Sophie Fellenz, Asja Fischer, Thomas Gärtner, Matthias Kirchler, Marius Kloft, Yingzhen Li, Christoph Lippert, Gerard de Melo, Eric Nalisnick, Björn Ommer, Rajesh Ranganath, Maja Rudolph, Karen Ullrich, Guy Van den Broeck, Julia E Vogt, Yixin Wang, Florian Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin
Abstract: The field of deep generative modeling has grown rapidly in the last few years. With the availability of massive amounts of training data coupled with advances in scalable unsupervised learning paradigms, recent large-scale generative models show tremendous promise in synthesizing high-resolution images and text, as well as structured data such as videos and molecules. However, we argue that current large-scale generative AI models exhibit several fundamental shortcomings that hinder their widespread adoption across domains. In this work, our objective is to identify these issues and highlight key unresolved challenges in modern generative AI paradigms that should be addressed to further enhance their capabilities, versatility, and reliability. By identifying these challenges, we aim to provide researchers with insights for exploring fruitful research directions, thus fostering the development of more robust and accessible generative AI solutions.
URL: https://openreview.net/forum?id=NeS9Kj2JwF
---
Title: A Curious Case of Remarkable Resilience to Gradient Attacks via Fully Convolutional and Differentiable Front End with a Skip Connection
Authors: Leonid Boytsov, Ameya Joshi, Filipe Condessa
Abstract: We experimented with front-end enhanced neural models in which a differentiable and fully convolutional model with a skip connection is added before a frozen backbone classifier. By training such composite models using a small learning rate for about one epoch, we obtained models that retained the accuracy of the backbone classifier while being unusually resistant to gradient attacks—including APGD and FAB-T attacks from the AutoAttack package—which we attribute to gradient masking.
Although gradient masking is not new, the degree we observe is striking for fully differentiable models without obvious gradient-shattering—e.g., JPEG compression—or gradient-diminishing components. The training recipe to produce such models is also remarkably stable and reproducible: We applied it to three datasets (CIFAR10, CIFAR100, and ImageNet) and several modern architectures (including vision Transformers) without a single failure case.
While black-box attacks such as the SQUARE attack and zero-order PGD can partially overcome gradient masking, these attacks are easily defeated by simple randomized ensembles. We estimate that these ensembles achieve near-SOTA AutoAttack accuracy on CIFAR10, CIFAR100, and ImageNet (while retaining almost all clean accuracy of the original classifiers) despite having near-zero accuracy under adaptive attacks.
Moreover, adversarially training the backbone further amplifies this front-end “robustness”. On CIFAR10, the respective randomized ensemble achieved 90.8±2.5% (99% CI) accuracy under the full AutoAttack while having only 18.2±3.6% accuracy under the adaptive attack (ε = 8/255, L∞ norm). While our primary goal is to expose weaknesses of the AutoAttack package—rather than to propose a new defense or establish SOTA in adversarial robustness—we nevertheless conclude the paper with a discussion of whether randomized ensembling can serve as a practical defense.
Code and instructions to reproduce key results are available at https://github.com/searchivarius/curious_case_of_gradient_masking.
URL: https://openreview.net/forum?id=kt7Am2wHlm
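For readers unfamiliar with the setup, a minimal sketch of such a composite model follows; this is not the authors' code, and the layer widths and training loop are illustrative assumptions. The idea is a small fully convolutional front end whose output is added back to its input through a skip connection, placed before a frozen backbone, with only the front end trained at a small learning rate.

```python
# Illustrative sketch (not the authors' code) of a front-end enhanced classifier.
import torch
import torch.nn as nn

class FrontEnd(nn.Module):
    """Fully convolutional front end whose output is added back to its input."""
    def __init__(self, channels: int = 3, width: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # skip connection keeps the input image largely intact

class Composite(nn.Module):
    """Trainable front end followed by a frozen, pretrained backbone classifier."""
    def __init__(self, front_end: nn.Module, backbone: nn.Module):
        super().__init__()
        self.front_end = front_end
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # the backbone stays frozen

    def forward(self, x):
        return self.backbone(self.front_end(x))

# Training sketch: only the front end is updated, with a small learning rate,
# for roughly one epoch (hyperparameters here are assumptions, not the paper's).
# model = Composite(FrontEnd(), pretrained_classifier)
# optimizer = torch.optim.SGD(model.front_end.parameters(), lr=1e-4)
```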
---
Title: Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs
Authors: Hari Chandana Kuchibhotla, Sai Srinivas Kancheti, Abbavaram Gowtham Reddy, Vineeth N. Balasubramanian
Abstract: Fine-grained Visual Recognition (FGVR) involves distinguishing between visually similar categories, which is inherently challenging due to subtle inter-class differences and the need for large, expert-annotated datasets. In domains like medical imaging, such curated datasets are unavailable due to issues like privacy concerns and high annotation costs. In such scenarios lacking labeled data, an FGVR model cannot rely on a predefined set of training labels, and hence has an unconstrained output space for predictions. We refer to this task as Vocabulary-Free FGVR (VF-FGVR), where a model must predict labels from an unconstrained output space without prior label information. While recent Multimodal Large Language Models (MLLMs) show potential for VF-FGVR, querying these models for each test input is impractical because of high costs and prohibitive inference times. To address these limitations, we introduce Nearest-Neighbor label Refinement (NeaR), a novel approach that fine-tunes a downstream CLIP model using labels generated by an MLLM. Our approach constructs a weakly supervised dataset from a small, unlabeled training set, leveraging MLLMs for label generation. NeaR is designed to handle the noise, stochasticity, and open-endedness inherent in labels generated by MLLMs, and establishes a new benchmark for efficient VF-FGVR.
URL: https://openreview.net/forum?id=FvA0UMw9X2
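As a rough illustration of the pipeline the abstract describes, below is a hypothetical sketch of MLLM-based labeling followed by nearest-neighbour refinement in CLIP embedding space. The callables query_mllm and clip_embed are placeholders, and the majority-vote refinement is an assumption about the general idea rather than NeaR's exact procedure.

```python
# Hypothetical sketch of vocabulary-free label generation plus nearest-neighbour
# refinement; query_mllm and clip_embed are placeholder callables, not a real API.
from collections import Counter
import numpy as np

def refine_labels(images, query_mllm, clip_embed, k=5):
    """Generate noisy open-ended labels with an MLLM, then smooth them by
    majority vote over each image's k nearest neighbours in CLIP space."""
    labels = [query_mllm(img) for img in images]           # noisy, stochastic labels
    embs = np.stack([clip_embed(img) for img in images])   # (N, d) image embeddings
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = embs @ embs.T                                    # cosine similarities
    np.fill_diagonal(sims, -np.inf)                         # exclude self-matches
    refined = []
    for i in range(len(images)):
        nn_idx = np.argsort(sims[i])[-k:]                   # k nearest neighbours
        votes = [labels[j] for j in nn_idx] + [labels[i]]
        refined.append(Counter(votes).most_common(1)[0][0])
    return refined  # weak labels used to fine-tune a downstream CLIP model
```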
---
Title: CYCle: Choosing Your Collaborators Wisely to Enhance Collaborative Fairness in Decentralized Learning
Authors: Nurbek Tastan, Samuel Horváth, Karthik Nandakumar
Abstract: Collaborative learning (CL) enables multiple participants to jointly train machine learning (ML) models on decentralized data sources without raw data sharing. While the primary goal of CL is to maximize the expected accuracy gain for each participant, it is also important to ensure that the gains are fairly distributed: no client should be negatively impacted, and gains should reflect contributions. Most existing CL methods require central coordination and focus only on gain maximization, overlooking fairness. In this work, we first show that the existing measure of collaborative fairness, based on the correlation between accuracy values without and with collaboration, has drawbacks because it does not account for negative collaboration gain. We argue that maximizing mean collaboration gain (MCG) while simultaneously minimizing the collaboration gain spread (CGS) is a fairer alternative. Next, we propose the CYCle protocol, which enables individual participants in a private decentralized learning (PDL) framework to achieve this objective through a novel reputation scoring method based on gradient alignment between the local cross-entropy and distillation losses. We further extend the CYCle protocol to operate on top of gossip-based decentralized algorithms such as Gossip-SGD. We also theoretically show that CYCle performs better than standard FedAvg in a two-client mean estimation setting under high heterogeneity. Empirical experiments demonstrate the effectiveness of the CYCle protocol in ensuring positive and fair collaboration gains for all participants, even in cases where the data distributions of participants are highly skewed. The code can be found at https://github.com/tnurbek/cycle.
URL: https://openreview.net/forum?id=ygqNiLQqfH
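A minimal sketch of the gradient-alignment idea behind the reputation score follows; it is not the authors' implementation, and how the score feeds into collaborator selection is omitted. The score is the cosine similarity between the gradients of the local cross-entropy loss and of the distillation loss with respect to the model parameters.

```python
# Minimal sketch (not the authors' implementation) of a gradient-alignment score.
import torch

def reputation_score(model, ce_loss, distill_loss):
    """Cosine similarity between the gradients of the local cross-entropy loss
    and the distillation loss w.r.t. the trainable model parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_ce = torch.autograd.grad(ce_loss, params, retain_graph=True)
    g_kd = torch.autograd.grad(distill_loss, params, retain_graph=True)
    g_ce = torch.cat([g.flatten() for g in g_ce])
    g_kd = torch.cat([g.flatten() for g in g_kd])
    # Alignment close to 1 suggests the collaborator's distilled knowledge helps
    # the local objective; low or negative alignment suggests a poor collaborator.
    return torch.nn.functional.cosine_similarity(g_ce, g_kd, dim=0)
```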
---
Title: Risk-controlling Prediction with Distributionally Robust Optimization
Authors: Franck Iutzeler, Adrien Mazoyer
Abstract: Conformal prediction is a popular paradigm to quantify the uncertainty of a model's output on a new batch of data. Quite differently, distributionally robust optimization aims at training a model that is robust to uncertainties in the distribution of the training data. In this paper, we examine the links between the two approaches. In particular, we show that conformal prediction intervals can be learned by distributionally robust optimization of a well-chosen objective. This further makes it possible to train a model and build conformal prediction intervals all at once, using the same data.
URL: https://openreview.net/forum?id=d9dl6DyJpJ
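As background for the kind of link the abstract refers to (an illustration, not the paper's objective), recall that the conformal quantile of nonconformity scores can itself be written as the minimizer of an empirical pinball loss, the sort of objective that also appears as the inner problem of CVaR-type distributionally robust formulations:

\[
\hat{q}_{1-\alpha} \in \arg\min_{q \in \mathbb{R}} \ \frac{1}{n} \sum_{i=1}^{n} \rho_{1-\alpha}(s_i - q),
\qquad
\rho_\tau(u) = \max\{\tau u,\ (\tau - 1)\,u\},
\]

so that, for absolute-residual scores \(s_i = |y_i - \hat{\mu}(x_i)|\), the interval \(C(x) = [\hat{\mu}(x) - \hat{q}_{1-\alpha},\ \hat{\mu}(x) + \hat{q}_{1-\alpha}]\) covers a new exchangeable point with probability close to \(1-\alpha\).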
---
Title: A Proximal Operator for Inducing 2:4-Sparsity
Authors: Jonas M. Kübler, Yu-Xiang Wang, Shoham Sabach, Navid Ansari, Matthäus Kleindessner, Kailash Budhathoki, Volkan Cevher, George Karypis
Abstract: Recent hardware advancements in AI accelerators and GPUs make it possible to compute sparse matrix multiplications efficiently, especially when 2 out of 4 consecutive weights are set to zero. However, this so-called 2:4 sparsity usually comes at the cost of decreased model accuracy. We derive a regularizer that exploits the local correlation of features to find better sparsity masks in trained models. We minimize the regularizer jointly with a local squared loss by deriving its proximal operator, which we show admits an efficient solution in the 2:4-sparse case. After optimizing the mask, we introduce masked-gradient updates to further minimize the local squared loss. We illustrate our method on toy problems and apply it to pruning entire large language models with up to 70B parameters. On models up to 13B we improve over previous state-of-the-art algorithms, while on 70B models we match their performance.
URL: https://openreview.net/forum?id=AsFbXRIe4q
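For context, the 2:4 pattern itself is simple to state: in every group of four consecutive weights, at most two may be nonzero. Below is a minimal sketch of the standard magnitude-based projection onto this pattern; it illustrates the constraint only, not the paper's correlation-aware regularizer or proximal operator.

```python
# Minimal sketch of magnitude-based 2:4 projection (the constraint, not the paper's method).
import torch

def project_2_4(weight: torch.Tensor) -> torch.Tensor:
    """Project a (out_features, in_features) weight matrix onto 2:4 sparsity by
    keeping the two largest-magnitude entries in each group of four inputs."""
    out_f, in_f = weight.shape
    assert in_f % 4 == 0, "input dimension must be divisible by 4"
    groups = weight.reshape(out_f, in_f // 4, 4)
    # indices of the two smallest-magnitude entries in each group of four
    drop = groups.abs().topk(2, dim=-1, largest=False).indices
    mask = torch.ones_like(groups)
    mask.scatter_(-1, drop, 0.0)  # zero out the dropped positions
    return (groups * mask).reshape(out_f, in_f)
```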
---
Title: Hodge-Aware Convolutional Learning on Simplicial Complexes
Authors: Maosheng Yang, Geert Leus, Elvin Isufi
Abstract: Neural networks on simplicial complexes (SCs) can learn representations from data residing on simplices such as nodes, edges, and triangles. However, existing works often overlook the Hodge decomposition theorem, which splits simplicial data into three orthogonal characteristic subspaces, e.g., the identifiable gradient, curl, and harmonic components of edge flows. This decomposition provides a universal tool for understanding machine learning models on SCs, thus allowing for more principled and effective learning. In this paper, we study the effect of this data inductive bias on learning on SCs via the principle of convolutions. In particular, we present a general convolutional architecture that respects three key principles: uncoupling the lower and upper simplicial adjacencies, accounting for the inter-simplicial couplings, and performing higher-order convolutions. To understand these principles, we first use Dirichlet energy minimization on SCs to interpret their effect on mitigating simplicial oversmoothing. Then, through the lens of spectral simplicial theory, we show that the three principles promote Hodge-aware learning in this architecture, in the sense that the three Hodge subspaces are invariant under its learnable functions and the learning in the two nontrivial subspaces is independent and expressive. Third, we investigate the learning ability of this architecture from the perspective of perturbation theory on simplicial topologies and prove that the convolutional architecture is stable to small perturbations. Finally, we corroborate the three principles by comparing with methods that do not respect them. Overall, this paper bridges learning on SCs with the Hodge decomposition theorem, highlighting its importance for principled and effective learning from simplicial data, and provides theoretical insights into convolutional learning on SCs.
URL: https://openreview.net/forum?id=Nm5sp09Q25
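For reference, the Hodge decomposition the abstract builds on states that any edge flow on an SC splits into three orthogonal components (notation here is the standard one and may differ from the paper's):

\[
\mathbf{x}_1 \;=\; \underbrace{\mathbf{B}_1^{\top}\mathbf{x}_0}_{\text{gradient}} \;+\; \underbrace{\mathbf{B}_2\,\mathbf{x}_2}_{\text{curl}} \;+\; \underbrace{\mathbf{x}_{\mathrm{H}}}_{\text{harmonic}},
\qquad
\mathbf{x}_{\mathrm{H}} \in \ker(\mathbf{L}_1),
\quad
\mathbf{L}_1 = \mathbf{B}_1^{\top}\mathbf{B}_1 + \mathbf{B}_2\mathbf{B}_2^{\top},
\]

where \(\mathbf{B}_1\) and \(\mathbf{B}_2\) are the node-to-edge and edge-to-triangle incidence matrices and \(\mathbf{L}_1\) is the Hodge 1-Laplacian.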
---
New submissions
===============
Title: Understanding Fine-tuning in Approximate Unlearning: A Theoretical Perspective
Abstract: Machine Unlearning has emerged as a significant area of research, focusing on "removing" specific subsets of data from a trained model. Fine-tuning (FT) methods have become one of the fundamental approaches for approximating unlearning, as they effectively retain model performance. However, it is consistently observed that naive FT methods struggle to forget the targeted data.
In this paper, we present the first theoretical analysis of FT methods for machine unlearning within a linear regression framework, providing a deeper exploration of this phenomenon. Our analysis reveals that while FT models can achieve zero loss on the remaining data, they fail to forget the forgetting data, as the pretrained model retains its influence and the fine-tuning process does not adequately mitigate it. To address this, we propose a novel Retention-Based Masking (RBM) strategy that constructs a weight saliency map based on the remaining dataset, unlike existing methods that focus on the forgetting dataset. Our theoretical analysis demonstrates that RBM not only significantly improves unlearning accuracy (UA) but also ensures higher retaining accuracy (RA) by preserving overlapping features shared between the forgetting and remaining datasets. Experiments on synthetic and real-world datasets validate our theoretical insights, showing that RBM outperforms existing masking approaches in balancing UA, RA, and disparity metrics.
URL: https://openreview.net/forum?id=4hNquAmFqf
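A hypothetical sketch of what a retention-based weight saliency mask could look like is given below; the loss, threshold, and mask semantics are illustrative assumptions, not the paper's exact construction. The idea is to accumulate gradient magnitudes on the remaining data and protect the most salient parameters during the unlearning fine-tuning.

```python
# Hypothetical sketch of a retention-based saliency mask (assumed recipe, not the paper's).
import torch

def retention_mask(model, retain_loader, loss_fn, keep_ratio=0.5):
    """Return a per-parameter 0/1 mask: 1 = free to update during unlearning
    fine-tuning, 0 = protected because it is salient for the remaining data."""
    saliency = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in retain_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                saliency[n] += p.grad.abs()  # accumulate gradient magnitudes on retain data
    flat = torch.cat([s.flatten() for s in saliency.values()])
    threshold = flat.quantile(keep_ratio)    # split parameters by retain-set saliency
    return {n: (s <= threshold).float() for n, s in saliency.items()}
```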
---
Title: Bayesian Sensitivity of Causal Inference Estimators under Evidence-Based Priors
Abstract: Causal inference, especially in observational studies, relies on untestable assumptions about the true data-generating process. Sensitivity analysis helps us determine how robust our conclusions are when we alter these underlying assumptions. Existing frameworks for sensitivity analysis are concerned with worst-case changes in assumptions. In this work, we argue that using such pessimistic criteria can often be uninformative or lead to conclusions contradicting our prior knowledge about the world. To demonstrate this claim, we generalize the recent s-value framework (Gupta & Rothenhäusler, 2023) to estimate sensitivity to three different common assumptions in causal inference. Empirically, we find that worst-case conclusions about sensitivity can indeed rely on unrealistic changes in the data-generating process. To overcome this, we extend the s-value framework with a new sensitivity analysis criterion, the Bayesian Sensitivity Value (BSV), which computes the expected sensitivity of an estimate to assumption violations under priors constructed from real-world evidence. We use Monte Carlo approximations to estimate this quantity and illustrate its applicability in an observational study on the effect of diabetes treatments on weight loss.
URL: https://openreview.net/forum?id=0zqt85NUyK
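A hypothetical sketch of the Monte Carlo idea behind such an expected-sensitivity criterion follows; sample_violation and estimate_effect are placeholder callables, and the absolute-shift summary is an assumption rather than the paper's exact definition of BSV.

```python
# Hypothetical Monte Carlo sketch of an expected-sensitivity criterion under a prior.
import numpy as np

def expected_sensitivity(data, estimate_effect, sample_violation, n_draws=1000, seed=None):
    """Average shift of the causal estimate under assumption violations drawn
    from an evidence-based prior, instead of a single worst-case perturbation."""
    rng = np.random.default_rng(seed)
    baseline = estimate_effect(data)                  # estimate under the stated assumptions
    shifts = []
    for _ in range(n_draws):
        perturbed = sample_violation(data, rng)       # e.g. reweighted or shifted data
        shifts.append(abs(estimate_effect(perturbed) - baseline))
    return float(np.mean(shifts))                     # expected sensitivity under the prior
```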
---