Weekly TMLR digest for Sep 29, 2024

TMLR

Sep 29, 2024, 12:00:11 AM
to tmlr-annou...@googlegroups.com


New certifications
==================

Featured Certification: Tweedie Moment Projected Diffusions for Inverse Problems

Benjamin Boys, Mark Girolami, Jakiw Pidstrigach, Sebastian Reich, Alan Mosca, Omer Deniz Akyildiz

https://openreview.net/forum?id=4unJi0qrTE

---


Survey Certification, Expert Certification: Deep Generative Models through the Lens of the Manifold Hypothesis: A Survey and New Connections

Gabriel Loaiza-Ganem, Brendan Leigh Ross, Rasa Hosseinzadeh, Anthony L. Caterini, Jesse C. Cresswell

https://openreview.net/forum?id=a90WpmSi0I

---


Accepted papers
===============


Title: Efficient Identification of Direct Causal Parents via Invariance and Minimum Error Testing

Authors: Minh Nguyen, Mert R. Sabuncu

Abstract: Invariant causal prediction (ICP) is a popular technique for finding causal parents (direct causes) of a target by exploiting distribution shifts and invariance testing (Peters et al., 2016). However, since ICP needs to run an exponential number of tests and fails to identify parents when distribution shifts affect only a few variables, applying ICP to practical large-scale problems is challenging. We propose MMSE-ICP and fastICP, two approaches that employ an error inequality to address the identifiability problem of ICP. The inequality states that the minimum prediction error of the predictor using causal parents is the smallest among all predictors that do not use descendants. fastICP is an efficient approximation tailored to large problems, as it exploits the inequality and a heuristic to run fewer tests. MMSE-ICP and fastICP not only outperform competitive baselines in many simulations but also achieve state-of-the-art results on a large-scale real-data benchmark.

URL: https://openreview.net/forum?id=3G7mFdGVRW

---

Title: Strategies for Pretraining Neural Operators

Authors: Anthony Zhou, Cooper Lorsung, AmirPouya Hemmasian, Amir Barati Farimani

Abstract: Pretraining for partial differential equation (PDE) modeling has recently shown promise in scaling neural operators across datasets to improve generalizability and performance. Despite these advances, our understanding of how pretraining affects neural operators is still limited; studies generally propose tailored architectures and datasets that make it challenging to compare or examine different pretraining frameworks. To address this, we compare various pretraining methods without optimizing architecture choices to characterize pretraining dynamics on different models and datasets as well as to understand its scaling and generalization behavior. We find that pretraining is highly dependent on model and dataset choices, but in general transfer learning or physics-based pretraining strategies work best. In addition, pretraining performance can be further improved by using data augmentations. Lastly, pretraining can be additionally beneficial when fine-tuning in scarce data regimes or when generalizing to downstream data similar to the pretraining distribution. Through providing insights into pretraining neural operators for physics prediction, we hope to motivate future work in developing and evaluating pretraining methods for PDEs.

URL: https://openreview.net/forum?id=9vEVeX9oIv

---

Title: Linear Weight Interpolation Leads to Transient Performance Gains

Authors: Gaurav Iyer, Gintare Karolina Dziugaite, David Rolnick

Abstract: We train copies of a neural network on different sets of SGD noise and find that linearly interpolating their weights can, remarkably, produce networks that perform significantly better than the original networks. However, such interpolated networks consistently end up in unfavorable regions of the optimization landscape: with further training, their performance fails to improve or degrades, effectively undoing the performance gained from the interpolation. We identify two quantities that impact an interpolated network's performance and relate our observations to linear mode connectivity. Finally, we investigate this phenomenon from the lens of example importance and find that performance improves and degrades almost exclusively on the harder subsets of the training data, while performance is stable on the easier subsets. Our work represents a step towards a better understanding of neural network loss landscapes and weight interpolation in deep learning.
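As a quick illustration for readers, the core operation studied here — linearly interpolating the weights of two trained copies of a network — can be sketched as follows. The dict-of-arrays parameter representation and the interpolation coefficient are illustrative, not the authors' setup:

```python
import numpy as np

def interpolate_weights(weights_a, weights_b, alpha):
    """Return parameters (1 - alpha) * A + alpha * B, layer by layer.

    Both networks must share the same architecture; parameters are given
    as dicts mapping layer names to arrays. Generic sketch only.
    """
    assert weights_a.keys() == weights_b.keys()
    return {name: (1.0 - alpha) * weights_a[name] + alpha * weights_b[name]
            for name in weights_a}
```

The paper's observation is that evaluating such an interpolated network can beat both endpoints, but that the gain is transient: further training fails to improve it or undoes it.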

URL: https://openreview.net/forum?id=XGAdBXlFcj

---

Title: A Semi-Bayesian Nonparametric Estimator of the Maximum Mean Discrepancy Measure: Applications in Goodness-of-Fit Testing and Generative Adversarial Networks

Authors: Forough Fazeli-Asl, Michael Minyi Zhang, Lizhen Lin

Abstract: A classic inferential problem in statistics is the goodness-of-fit (GOF) test. Performing such tests can be challenging when the hypothesized parametric model has an intractable likelihood and its distributional form is not available. Bayesian methods for GOF testing can be appealing due to their ability to incorporate expert knowledge through prior distributions. However, standard Bayesian methods for this test often require strong distributional assumptions on the data and their relevant parameters. To address this issue, we propose a semi-Bayesian nonparametric (semi-BNP) procedure based on the maximum mean discrepancy (MMD) measure that can be applied to the GOF test. We introduce a novel Bayesian estimator for the MMD, which enables the development of a measure-based hypothesis test for intractable models. Through extensive experiments, we demonstrate that our proposed test outperforms frequentist MMD-based methods by achieving lower false rejection and false acceptance rates of the null hypothesis. Furthermore, we showcase the versatility of our approach by embedding the proposed estimator within a generative adversarial network (GAN) framework, facilitating a robust BNP learning approach as another significant application of our method. With our BNP procedure, this new GAN approach can enhance sample diversity and improve inferential accuracy compared to traditional techniques.
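For readers unfamiliar with the MMD, the standard frequentist estimator that such Bayesian variants are typically compared against can be sketched as below; the RBF kernel and bandwidth are illustrative choices, and this is not the paper's semi-BNP estimator:

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between sample sets x and y."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Classical unbiased estimate of squared MMD between samples x and y."""
    m, n = len(x), len(y)
    kxx = rbf_kernel(x, x, bandwidth); np.fill_diagonal(kxx, 0.0)
    kyy = rbf_kernel(y, y, bandwidth); np.fill_diagonal(kyy, 0.0)
    kxy = rbf_kernel(x, y, bandwidth)
    return kxx.sum() / (m * (m - 1)) + kyy.sum() / (n * (n - 1)) - 2.0 * kxy.mean()
```

A large estimate suggests the two samples come from different distributions, which is what a GOF test based on the MMD exploits.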

URL: https://openreview.net/forum?id=lUnlHS1FYT

---

Title: Learning Hierarchical Relational Representations through Relational Convolutions

Authors: Awni Altabaa, John Lafferty

Abstract: An evolving area of research in deep learning is the study of architectures and inductive biases that support the learning of relational feature representations. In this paper, we address the challenge of learning representations of hierarchical relations—that is, higher-order relational patterns among groups of objects. We introduce “relational convolutional networks”, a neural architecture equipped with computational mechanisms that capture progressively more complex relational features through the composition of simple modules. A key component of this framework is a novel operation that captures relational patterns in groups of objects by convolving graphlet filters—learnable templates of relational patterns—against subsets of the input. Composing relational convolutions gives rise to a deep architecture that learns representations of higher-order, hierarchical relations. We present the motivation and details of the architecture, together with a set of experiments to demonstrate how relational convolutional networks can provide an effective framework for modeling relational tasks that have hierarchical structure.

URL: https://openreview.net/forum?id=vNZlnznmV2

---

Title: The Impact of Syntactic and Semantic Proximity on Machine Translation with Back-Translation

Authors: Nicolas Guerin, Emmanuel Chemla, Shane Steinert-Threlkeld

Abstract: Unsupervised on-the-fly back-translation, in conjunction with multilingual pretraining, is the dominant method for unsupervised neural machine translation. Theoretically, however, the method should not work in general. We therefore conduct controlled experiments with artificial languages to determine which properties of languages make back-translation an effective training method, covering lexical, syntactic, and semantic properties. We find, contrary to popular belief, that (i)~parallel word frequency distributions, (ii)~a partially shared vocabulary, and (iii)~similar syntactic structure across languages are not sufficient to explain the success of back-translation. We show, however, that even a crude semantic signal (similar lexical fields across languages) does improve the alignment of two languages through back-translation. We conjecture that rich semantic dependencies, parallel across languages, are at the root of the success of unsupervised methods based on back-translation. Overall, the success of unsupervised machine translation was far from analytically guaranteed. Instead, it is further evidence that the languages of the world share deep similarities, and we hope to show how to identify which of these similarities can serve the development of unsupervised, cross-linguistic tools.

URL: https://openreview.net/forum?id=6DflIABPQP

---

Title: Improving Generalization of Complex Models under Unbounded Loss Using PAC-Bayes Bounds

Authors: Xitong Zhang, Avrajit Ghosh, Guangliang Liu, Rongrong Wang

Abstract: Previous research on PAC-Bayes learning theory has focused extensively on establishing tight upper bounds for test errors. A recently proposed training procedure called PAC-Bayes training, updates the model toward minimizing these bounds. Although this approach is theoretically sound, in practice, it has not achieved a test error as low as those obtained by empirical risk minimization (ERM) with carefully tuned regularization hyperparameters. Additionally, existing PAC-Bayes training algorithms often require bounded loss functions and may need a search over priors with additional datasets, which limits their broader applicability. In this paper, we introduce a new PAC-Bayes training algorithm with improved performance and reduced reliance on prior tuning. This is achieved by establishing a new PAC-Bayes bound for unbounded loss and a theoretically grounded approach that involves jointly training the prior and posterior using the same dataset. Our comprehensive evaluations across various classification tasks and neural network architectures demonstrate that the proposed method not only outperforms existing PAC-Bayes training algorithms but also approximately matches the test accuracy of ERM that is optimized by SGD/Adam using various regularization methods with optimal hyperparameters.

URL: https://openreview.net/forum?id=MP8bmxvWt6

---

Title: Contrastive Class Anchor Learning for Open Set Object Recognition in Driving Scenes

Authors: Zizhao Li, Kourosh Khoshelham, Joseph West

Abstract: Conventional object recognition models operate under closed-set assumptions, presuming that the training dataset is sufficiently comprehensive that any object detected during inference can be assigned to some known prior class. This assumption is flawed and potentially dangerous for real-world applications such as driving scene perception, where diverse objects and unexpected behaviours should be expected. To progress towards trusted autonomous platforms, object recognition models need Open Set Recognition (OSR) methods capable of identifying unknown classes while maintaining good performance on known classes. Existing OSR methods are mostly designed for image data and utilize generative models, which are hard to train. In this paper, we propose S2CA, a Supervised Contrastive Class Anchor learning method which leverages contrastive learning principles to effectively reject unknown classes by increasing intra-class compactness and inter-class sparsity of known classes in feature space. We train a feature encoder through contrastive learning while ensuring that features of known classes form compact clusters, and then transfer the trained encoder to the OSR task. During inference, the model rejects unknown classes based on class-agnostic information in feature space and class-related information in logit space. The proposed OSR method is simple yet powerful: it is not only suitable for image-based object recognition models, but can also be used with a variety of lidar-based object recognition models. We demonstrate superior performance of S2CA compared with state-of-the-art methods on two widely used driving scene recognition datasets, KITTI and nuScenes.
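As a rough illustration of the class-anchor idea (pulling known-class features toward per-class anchors while pushing them away from the other anchors), here is a generic sketch; the squared distances, hinge margin, and equal weighting are our assumptions, not the S2CA objective:

```python
import numpy as np

def class_anchor_loss(features, labels, anchors, margin=1.0):
    """Pull each feature toward its class anchor; push it at least `margin`
    away from every other anchor (hinge). Illustrative sketch only."""
    loss = 0.0
    for f, y in zip(features, labels):
        d = np.linalg.norm(anchors - f, axis=1)              # distance to every anchor
        pull = d[y] ** 2                                     # compactness term
        push = np.maximum(0.0, margin - np.delete(d, y)) ** 2  # separation term
        loss += pull + push.sum()
    return loss / len(features)
```

At inference, a feature that lies far from every anchor would then be flagged as belonging to an unknown class.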

URL: https://openreview.net/forum?id=l0Uum9SJgM

---

Title: Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

Authors: Joo Young Choi, Jaesung R. Park, Inkyu Park, Jaewoong Cho, Albert No, Ernest K. Ryu

Abstract: Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations applied to the convolutional layers but does not directly affect the attention layers. While these standard architectural choices are certainly effective, not conditioning the attention layers feels arbitrary and potentially suboptimal. In this work, we show that simply adding LoRA conditioning to the attention layers, without changing or tuning the other parts of the U-Net architecture, improves the image generation quality. For example, a drop-in addition of LoRA conditioning to the EDM diffusion model yields FID scores of 1.91/1.75 for unconditional and class-conditional CIFAR-10 generation, improving upon the baseline of 1.97/1.79.
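The drop-in nature of LoRA conditioning can be illustrated with a minimal sketch: a frozen projection `w` plus a low-rank update `B @ A`. In the conditioning scheme described above, the low-rank pair would be produced from (or selected by) the time/class embedding; the function below shows only the generic drop-in term, with all names illustrative:

```python
import numpy as np

def lora_linear(x, w, lora_a, lora_b, scale=1.0):
    """Compute W @ x plus a low-rank LoRA update (B @ A) @ x.

    w: (out, in) frozen weight; lora_a: (r, in); lora_b: (out, r).
    In a conditioned diffusion U-Net the (A, B) pair would depend on the
    class or timestep embedding; this sketch shows only the drop-in term.
    """
    return w @ x + scale * (lora_b @ (lora_a @ x))
```

Initializing `lora_b` to zero makes the layer exactly equivalent to the frozen one at the start of training, which is what makes the addition "drop-in".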

URL: https://openreview.net/forum?id=38P40gJPrI

---

Title: On Safety in Safe Bayesian Optimization

Authors: Christian Fiedler, Johanna Menn, Lukas Kreisköther, Sebastian Trimpe

Abstract: Safe Bayesian Optimization (BO) is increasingly used to optimize an unknown function under safety constraints, a central task in robotics, biomedical engineering, and many other disciplines. Due to the safety-critical nature of these applications, it is crucial that theoretical safety guarantees for these algorithms translate into the real world. In this work, we investigate three safety-related issues in SafeOpt-type algorithms, a popular class of safe BO methods. First, these algorithms critically rely on frequentist uncertainty bounds for Gaussian Process (GP) regression, but concrete implementations typically utilize heuristics that invalidate all safety guarantees. We provide a detailed analysis of this problem and introduce Real-$\beta$-SafeOpt, a variant of the SafeOpt algorithm that leverages recent GP bounds and thus retains all theoretical guarantees. Second, we identify a key technical assumption in SafeOpt-like algorithms, the availability of an upper bound on the reproducing kernel Hilbert space (RKHS) norm of the target function, as a central obstacle to real-world usage.
To address this issue, we propose to rely instead on a known Lipschitz bound and noise bound, and we introduce Lipschitz-only Safe Bayesian Optimization (LoSBO), a SafeOpt-type algorithm using these two assumptions. We show empirically that this algorithm is not only safe, but also outperforms the state of the art on several function classes. Third, SafeOpt and derived algorithms rely on a discrete search space, complicating their application to higher-dimensional problems. To broaden the applicability of these algorithms, we introduce Lipschitz-only Safe GP-UCB (LoS-GP-UCB), a LoSBO variant that is applicable to moderately high-dimensional problems while retaining safety. By analyzing practical safety issues in an important class of safe BO algorithms, and providing ready-to-use algorithms that overcome these issues, this work contributes to bringing safe and reliable machine learning techniques closer to real-world applications.

URL: https://openreview.net/forum?id=tgFHZMsl1N

---

Title: IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks

Authors: Jiarui Xu, Yossi Gandelsman, Amir Bar, Jianwei Yang, Jianfeng Gao, Trevor Darrell, Xiaolong Wang

Abstract: In-context learning allows adapting a model to new tasks given a task description at test time. In this paper, we present IMProv - a generative model that is able to in-context learn visual tasks from multimodal prompts. Given a textual description of a visual task (e.g. “Left: input image, Right: foreground segmentation”), a few input-output visual examples, or both, the model in-context learns to solve it for a new test input. We train a masked generative transformer on a new dataset of figures from computer vision papers and their associated captions, together with a captioned large-scale image-text dataset. At inference time, we prompt the model with text and/or image task example(s) and have the model inpaint the corresponding output. We show that training our model with text conditioning and scaling the dataset size improves in-context learning for computer vision tasks by over $+10\%$ AP for Foreground Segmentation, over $+5\%$ AP for Single Object Detection, and almost $20\%$ lower LPIPS in Colorization. Our empirical results suggest that vision and language prompts are complementary, and it is advantageous to use both to achieve better in-context learning performance.

URL: https://openreview.net/forum?id=qBTgnk2HAf

---

Title: Adversarial Attacks on Online Learning to Rank with Stochastic Click Models

Authors: Zichen Wang, Rishab Balasubramanian, Hui Yuan, chenyu song, Mengdi Wang, Huazheng Wang

Abstract: We propose the first study of adversarial attacks on online learning to rank. The attacker's goal is to mislead the online learning to rank algorithm into placing the target item at the top of the ranking list a number of times that is linear in the time horizon $T$, with a sublinear attack cost. We propose generalized list poisoning attacks that perturb the ranking list presented to the user. This strategy can efficiently attack any no-regret ranker under general stochastic click models. Furthermore, we propose a click poisoning-based strategy named attack-then-quit that can efficiently attack two representative OLTR algorithms for stochastic click models. We theoretically analyze the success and cost upper bounds of the two proposed methods. Experimental results based on synthetic and real-world data further validate the effectiveness and cost-efficiency of the proposed attack strategies.

URL: https://openreview.net/forum?id=BKwGowR0Bt

---

Title: Tweedie Moment Projected Diffusions for Inverse Problems

Authors: Benjamin Boys, Mark Girolami, Jakiw Pidstrigach, Sebastian Reich, Alan Mosca, Omer Deniz Akyildiz

Abstract: Diffusion generative models unlock new possibilities for inverse problems, as they allow for the incorporation of strong empirical priors into the process of scientific inference. Recently, diffusion models have been repurposed for solving inverse problems using Gaussian approximations to conditional densities of the reverse process, with Tweedie's formula used to parameterise the mean, complemented with various heuristics. To address the challenges arising from these approximations, we leverage higher-order information using Tweedie's formula and obtain a statistically principled approximation. We further provide a theoretical guarantee specifically for posterior sampling, which can lead to a better theoretical understanding of diffusion-based conditional sampling. Finally, we illustrate the empirical effectiveness of our approach for general linear inverse problems on toy synthetic examples as well as image restoration. We show that our method (i) removes any time-dependent step-size hyperparameters required by earlier methods, (ii) brings stability and better sample quality across multiple noise levels, and (iii) is, in contrast to earlier works, the only method that works stably with variance-exploding (VE) forward processes.
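For reference, Tweedie's formula mentioned above gives the posterior moments of the clean sample given a noisy one. For a variance-exploding process $x_t = x_0 + \sigma_t \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$, the first moment (used by earlier methods) and the second moment (the higher-order information leveraged here) are, in standard form:

```latex
\mathbb{E}[x_0 \mid x_t] = x_t + \sigma_t^2 \, \nabla_{x_t} \log p_t(x_t),
\qquad
\operatorname{Cov}[x_0 \mid x_t] = \sigma_t^2 \left( I + \sigma_t^2 \, \nabla^2_{x_t} \log p_t(x_t) \right).
```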

URL: https://openreview.net/forum?id=4unJi0qrTE

---

Title: Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models

Authors: Sumeet Singh, Stephen Tu, Vikas Sindhwani

Abstract: A crucial design decision for any robot learning pipeline is the choice of policy representation: what type of model should be used to generate the next set of robot actions? Owing to the inherent multi-modal nature of many robotic tasks, combined with the recent successes in generative modeling, researchers have turned to state-of-the-art probabilistic models such as diffusion models for policy representation. In this work, we revisit the choice of energy-based models (EBM) as a policy class.

We show that the prevailing folklore---that energy models in high dimensional continuous spaces are impractical to train---is false. We develop a practical training objective and algorithm for energy models which combines several key ingredients: (i) ranking noise contrastive estimation (R-NCE), (ii) learnable negative samplers, and (iii) non-adversarial joint training. We prove that our proposed objective function is asymptotically consistent and quantify its limiting variance. On the other hand, we show that the Implicit Behavior Cloning (IBC) objective is actually biased even at the population level, providing a mathematical explanation for the poor performance of IBC trained energy policies in several independent follow-up works. We further extend our algorithm to learn a continuous stochastic process that bridges noise and data, modeling this process with a family of EBMs indexed by a scale variable. In doing so, we demonstrate that the core idea behind recent progress in generative modeling is actually compatible with EBMs. Altogether, our proposed training algorithms enable us to train energy-based models as policies which compete with---and even outperform---diffusion models and other state-of-the-art approaches in several challenging multi-modal benchmarks: obstacle avoidance path planning and contact-rich block pushing.

URL: https://openreview.net/forum?id=JmKAYb7I00

---

Title: Non-backtracking Graph Neural Networks

Authors: Seonghyun Park, Narae Ryu, Gahee Kim, Dongyeop Woo, Se-Young Yun, Sungsoo Ahn

Abstract: The celebrated message-passing updates for graph neural networks allow representing large-scale graphs with local and computationally tractable updates. However, the updates suffer from backtracking, i.e., a message flowing through the same edge twice and revisiting the previously visited node. Since the number of message flows increases exponentially with the number of updates, the redundancy in local updates prevents the graph neural network from accurately recognizing a particular message flow relevant for downstream tasks. In this work, we propose to resolve such a redundancy issue via the non-backtracking graph neural network (NBA-GNN) that updates a message without incorporating the message from the previously visited node. We theoretically investigate how NBA-GNN alleviates the over-squashing of GNNs, and establish a connection between NBA-GNN and the impressive performance of non-backtracking updates for stochastic block model recovery. Furthermore, we empirically verify the effectiveness of our NBA-GNN on the long-range graph benchmark and transductive node classification problems.
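The non-backtracking update described above can be sketched as message passing on directed edges, where the new message on edge (u -> v) aggregates incoming messages on edges (w -> u) except the reverse edge (v -> u). This is a generic sketch of the non-backtracking rule, not the NBA-GNN layer:

```python
def non_backtracking_step(edge_msgs, edges):
    """One update of scalar messages on directed edges.

    The new message on (u -> v) sums incoming messages on (w -> u)
    with w != v, so a message never flows straight back along the
    edge it just arrived on. Illustrative sketch only.
    """
    new = {}
    for (u, v) in edges:
        new[(u, v)] = sum(edge_msgs[(w, x)] for (w, x) in edges
                          if x == u and w != v)
    return new
```

On the path 1-2-3 (both directions), the message on (2 -> 3) receives only the contribution from (1 -> 2); a backtracking update would also count (3 -> 2).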

URL: https://openreview.net/forum?id=64HdQKnyTc

---

Title: Deep Generative Models through the Lens of the Manifold Hypothesis: A Survey and New Connections

Authors: Gabriel Loaiza-Ganem, Brendan Leigh Ross, Rasa Hosseinzadeh, Anthony L. Caterini, Jesse C. Cresswell

Abstract: In recent years there has been increased interest in understanding the interplay between deep generative models (DGMs) and the manifold hypothesis. Research in this area focuses on understanding the reasons why commonly-used DGMs succeed or fail at learning distributions supported on unknown low-dimensional manifolds, as well as developing new models explicitly designed to account for manifold-supported data. This manifold lens provides both clarity as to why some DGMs (e.g. diffusion models and some generative adversarial networks) empirically surpass others (e.g. likelihood-based models such as variational autoencoders, normalizing flows, or energy-based models) at sample generation, and guidance for devising more performant DGMs. We carry out the first survey of DGMs viewed through this lens, making two novel contributions along the way. First, we formally establish that numerical instability of likelihoods in high ambient dimensions is unavoidable when modelling data with low intrinsic dimension. We then show that DGMs on learned representations of autoencoders can be interpreted as approximately minimizing Wasserstein distance: this result, which applies to latent diffusion models, helps justify their outstanding empirical results. The manifold lens provides a rich perspective from which to understand DGMs, and we aim to make this perspective more accessible and widespread.

URL: https://openreview.net/forum?id=a90WpmSi0I

---

Title: Graph Cuts with Arbitrary Size Constraints Through Optimal Transport

Authors: Chakib Fettal, lazhar labiod, Mohamed Nadif

Abstract: A common way of partitioning graphs is through minimum cuts. One drawback of classical minimum cut methods is that they tend to produce small groups, which is why more balanced variants such as normalized and ratio cuts have seen more success. However, we believe that with these variants the balance constraints can be too restrictive for some applications, such as clustering imbalanced datasets, while not being restrictive enough when searching for perfectly balanced partitions. Here, we propose a new graph cut algorithm for partitioning graphs under arbitrary size constraints. We formulate the graph cut problem as a Gromov-Wasserstein problem with a concave regularizer. We then propose to solve it using an accelerated proximal gradient descent algorithm which guarantees global convergence to a critical point, yields sparse solutions, and incurs only an additional $\mathcal{O}(\log(n))$ factor compared to the classical spectral clustering algorithm, while being more efficient in practice.

URL: https://openreview.net/forum?id=UG7rtrsuaT

---

Title: Reward Poisoning on Federated Reinforcement Learning

Authors: Evelyn Ma, Praneet Rathi, S. Rasoul Etesami

Abstract: Federated learning (FL) has become a popular tool for solving traditional Reinforcement Learning (RL) tasks. The multi-agent structure addresses the major concern of data hunger in traditional RL, while the federated mechanism protects the data privacy of individual agents. Despite the advantages FL brings to RL, Federated Reinforcement Learning (FRL) is inherently susceptible to poisoning, as both FL and RL are vulnerable to such training-time attacks; however, the vulnerability of FRL has not been well studied before. In this work, we propose a general framework to characterize FRL poisoning as an optimization problem and design a poisoning protocol that can be applied to policy-based FRL. Our framework is versatile, catering to FRL scenarios employing both policy-gradient local RL and actor-critic local RL. In the context of actor-critic configurations, we train a pair of critics, one private and one public, aimed at maximizing the potency of poisoning. We provably show that our method can strictly hurt the global objective. We verify the effectiveness of our poisoning approach through comprehensive experiments, supported by mainstream RL algorithms, across various OpenAI Gym RL environments covering a wide range of difficulty levels. Within these experiments, we assess our proposed attack by comparing it to various baselines, including standard, poisoned, and robust FRL methods. The results demonstrate the power of the proposed protocol in effectively poisoning FRL systems: it consistently diminishes performance across diverse environments, proving more effective than baseline methods. Our work provides new insights into the training-time vulnerability of FL in RL and poses new challenges for designing secure FRL algorithms.

URL: https://openreview.net/forum?id=h2jpFufyG4

---

Title: Variational Inference on the Final-Layer Output of Neural Networks

Authors: Yadi Wei, Roni Khardon

Abstract: Traditional neural networks are simple to train but they typically produce overconfident predictions. In contrast, Bayesian neural networks provide good uncertainty quantification but optimizing them is time consuming due to the large parameter space. This paper proposes to combine the advantages of both approaches by performing Variational Inference in the Final layer Output space (VIFO), because the output space is much smaller than the parameter space. We use neural networks to learn the mean and the variance of the probabilistic output. Using the Bayesian formulation we incorporate collapsed variational inference with VIFO which significantly improves the performance in practice. On the other hand, like standard, non-Bayesian models, VIFO enjoys simple training and one can use Rademacher complexity to provide risk bounds for the model. Experiments show that VIFO provides a good tradeoff in terms of run time and uncertainty quantification, especially for out of distribution data.
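The core idea of placing the distribution on the final-layer output rather than on the weights can be sketched as follows: one head predicts the mean, another the log-variance, and samples are drawn by reparameterisation. This is a minimal sketch of inference-on-outputs with assumed linear heads, not the VIFO objective:

```python
import numpy as np

rng = np.random.default_rng(0)

def vifo_forward(x, w_mean, w_logvar, n_samples=8):
    """Predict a Gaussian over the final-layer output and sample from it.

    x: (batch, features); w_mean, w_logvar: (features, outputs).
    Returns (n_samples, batch, outputs) Monte Carlo samples drawn via
    the reparameterisation trick. Illustrative sketch only.
    """
    mean = x @ w_mean
    std = np.exp(0.5 * (x @ w_logvar))
    eps = rng.standard_normal((n_samples,) + mean.shape)
    return mean + std * eps
```

Because the distribution lives in the (small) output space rather than the (huge) weight space, sampling and variational updates stay cheap, which is the efficiency argument made in the abstract.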

URL: https://openreview.net/forum?id=mTOzXLmLKr

---

Title: Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives

Authors: Marcel Hirt, Domenico Campolo, Victoria Leong, Juan-Pablo Ortega

Abstract: Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research. Multi-modal Variational Autoencoders (VAEs) have been a popular generative model class that learns latent representations that jointly explain multiple modalities. Various objective functions for such models have been suggested, often motivated as lower bounds on the multi-modal data log-likelihood or from information-theoretic considerations. To encode latent variables from different modality subsets, Product-of-Experts (PoE) or Mixture-of-Experts (MoE) aggregation schemes have been routinely used and shown to yield different trade-offs, for instance, regarding their generative quality or consistency across multiple modalities. In this work, we consider a variational objective that can tightly approximate the data log-likelihood. We develop more flexible aggregation schemes that avoid the inductive biases in PoE or MoE approaches by combining encoded features from different modalities based on permutation-invariant neural networks. Our numerical experiments illustrate trade-offs for multi-modal variational objectives and various aggregation schemes. We show that our variational objective and more flexible aggregation models can become beneficial when one wants to approximate the true joint distribution over observed modalities and latent variables in identifiable models.

URL: https://openreview.net/forum?id=lM4nHnxGfL

---

Title: Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach

Authors: Hongyang R. Zhang, Dongyue Li, Haotian Ju

Abstract: The training of over-parameterized neural networks has received much study in recent literature. An important consideration is the regularization of over-parameterized networks due to their highly nonconvex and nonlinear geometry. In this paper, we study noise injection algorithms, which can regularize the Hessian of the loss, leading to regions with flat loss surfaces. Specifically, by injecting isotropic Gaussian noise into the weight matrices of a neural network, we can obtain an approximately unbiased estimate of the trace of the Hessian. However, naively implementing the noise injection by adding noise to the weight matrices before backpropagation yields limited empirical improvements. To address this limitation, we design a two-point estimate of the Hessian penalty, which injects noise into the weight matrices along both the positive and negative directions of the random noise. In particular, this two-point estimate eliminates the variance from the first-order term of the Taylor expansion. We show a PAC-Bayes generalization bound that depends on the trace of the Hessian (and the radius of the weight space), which can be measured from data.

We conduct a detailed experimental study to validate our approach and show that it can effectively regularize the Hessian and improve generalization. First, our algorithm can outperform prior approaches on sharpness-reduced training, delivering up to a 2.4% test accuracy increase for fine-tuning ResNets on six image classification datasets. Moreover, the trace of the Hessian reduces by 15.8%, and the largest eigenvalue is reduced by 9.7% with our approach. We also find that the regularization of the Hessian can be combined with alternative regularization methods, such as weight decay and data augmentation, leading to stronger regularization. Second, our approach remains highly effective for improving generalization in pretraining multimodal CLIP models and chain-of-thought fine-tuning.
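
The mechanism behind the two-point estimate is easiest to see on a purely quadratic loss, where $(L(w+\sigma\epsilon)+L(w-\sigma\epsilon))/2 - L(w) = (\sigma^2/2)\,\epsilon^\top H \epsilon$ holds exactly and averaging over noise draws recovers $(\sigma^2/2)\,\mathrm{tr}(H)$. A toy sketch of that identity (an illustration under this simplification, not the authors' algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
H = A @ A.T                       # known PSD Hessian of the quadratic loss
loss = lambda w: 0.5 * w @ H @ w
w = rng.standard_normal(d)
sigma = 1e-2

est = []
for _ in range(20000):
    eps = rng.standard_normal(d)
    # symmetric two-point perturbation cancels the first-order term exactly
    two_point = (loss(w + sigma * eps) + loss(w - sigma * eps)) / 2 - loss(w)
    est.append(2 * two_point / sigma**2)   # one sample of eps^T H eps

print(np.mean(est), np.trace(H))           # the two should nearly match
```

For a general smooth loss the same construction is unbiased only up to higher-order Taylor terms, which is the sense in which the paper calls the estimate approximately unbiased.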

URL: https://openreview.net/forum?id=yfrNkb2Ldd

---

Title: Conservative Evaluation of Offline Policy Learning

Authors: Hager Radi Abdelwahed, Josiah P. Hanna, Matthew E. Taylor

Abstract: Real-world domains offer unprecedented amounts of data from which we can develop successful decision-making systems. Reinforcement learning (RL) can learn control policies offline from such data, but deploying an agent during learning is challenging in safety-critical domains. Offline RL learns from historical data without access to an environment. Therefore, we need a methodology for estimating how a newly-learned agent will perform when deployed in the real environment \emph{before} actually deploying it. To achieve this, we propose a framework for conservative evaluation of offline policy learning (CEOPL). We focus on being conservative so that the probability that our agent performs below a baseline is approximately $\delta$, where $\delta$ specifies how much risk we are willing to accept. In our setting, we assume access to a data stream, split into a train-set to learn an offline policy, and a test-set to estimate a lower bound on the offline policy's performance using off-policy evaluation with bootstrap confidence intervals. A lower-bound estimate allows us to decide when to deploy our learned policy with minimal risk of overestimation. We demonstrate CEOPL on a range of tasks as well as real-world medical data.
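
The deployment rule described here can be sketched with a percentile-bootstrap lower bound on estimated returns; the per-episode return estimates and baseline value below are toy stand-ins, not the paper's OPE pipeline:

```python
import numpy as np

def bootstrap_lower_bound(returns, delta=0.05, B=2000, seed=0):
    """Percentile-bootstrap lower bound on the mean return: with probability
    roughly 1 - delta, the true mean lies above this value."""
    rng = np.random.default_rng(seed)
    means = [rng.choice(returns, size=len(returns), replace=True).mean()
             for _ in range(B)]
    return np.quantile(means, delta)

rng = np.random.default_rng(1)
est_returns = rng.normal(1.0, 0.5, size=200)  # toy per-episode OPE estimates
lb = bootstrap_lower_bound(est_returns)
baseline = 0.8                                # behavior policy's performance
deploy = lb > baseline                        # conservative deployment rule
print(lb, deploy)
```

The point of using the lower bound rather than the point estimate is that deployment is withheld whenever the test-set evidence is too noisy to rule out underperformance at the chosen risk level $\delta$.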

URL: https://openreview.net/forum?id=kLo4TKh0OP

---

Title: Decomposition of Equivariant Maps via Invariant Maps: Application to Universal Approximation under Symmetry

Authors: Akiyoshi Sannai, Yuuki Takai, Matthieu Cordonnier

Abstract: In this paper, we develop a theory about the relationship between invariant and equivariant maps with regard to a group $G$. We then leverage this theory in the context of deep neural networks with group symmetries in order to obtain novel insight into their mechanisms. More precisely, we establish a one-to-one relationship between equivariant maps and certain invariant maps. This allows us to reduce arguments for equivariant maps to those for invariant maps and vice versa. As an application, we propose a construction of universal equivariant architectures built from universal invariant networks. We, in turn, explain how the universal architectures arising from our construction differ from standard equivariant architectures known to be universal. Furthermore, we explore the complexity, in terms of the number of free parameters, of our models, and discuss the relation between invariant and equivariant networks' complexity. Finally, we also give an approximation rate for $G$-equivariant deep neural networks with ReLU activation functions for finite group $G$.

URL: https://openreview.net/forum?id=ycOLyHh1Ue

---

Title: Threshold Moving for Online Class Imbalance Learning with Dynamic Evolutionary Cost Vector

Authors: Peijia Qin, Shuxian Li, Xiaoqun Liu, Zubin Zheng, Siang Yew Chong

Abstract: Existing online class imbalance learning methods fail to achieve optimal performance because their assumptions about enhancing minority classes are hard-coded in model parameters. To learn the model for the performance measure directly instead of using heuristics, we introduce a novel framework based on a dynamic evolutionary algorithm (EA), called Online Evolutionary Cost Vector (OECV). By bringing the threshold moving method from the cost-sensitive learning paradigm and viewing the cost vector as a hyperparameter, our method transforms the online class imbalance issue into a bi-level optimization problem. The lower layer utilizes a base online classifier for rough prediction, and the upper layer refines the prediction using a threshold-moving cost vector learned via the dynamic EA. OECV benefits from both the efficiency of online learning methods and the high performance of EAs, as demonstrated in empirical studies against state-of-the-art methods on thirty datasets. Additionally, we show the effectiveness of the EA component in an ablation study comparing OECV to its two variants, OECV-n and OECV-ea. This work reveals the superiority of incorporating EAs into online imbalance classification tasks, while its potential extends beyond the scope of the class imbalance setting and warrants future research attention. We release our code for future research.
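
Threshold moving with a cost vector, the mechanism the upper layer tunes, amounts to rescaling predicted class probabilities before taking the argmax. A minimal sketch (the probabilities and costs below are illustrative, not from the paper):

```python
import numpy as np

def threshold_moving(probs, cost_vector):
    """Rescale predicted class probabilities by a cost vector, then argmax.
    The cost vector plays the role of the hyperparameter the EA would tune."""
    return (probs * cost_vector).argmax(axis=1)

probs = np.array([[0.6, 0.4],
                  [0.8, 0.2]])
cost = np.array([1.0, 2.0])   # up-weight the minority class (class 1)
preds = threshold_moving(probs, cost)
print(preds)  # [1 0]: the borderline first sample flips to the minority class
```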

URL: https://openreview.net/forum?id=EIPnUofed9

---

Title: Continual Adaptation of Vision Transformers for Federated Learning

Authors: Shaunak Halbe, James Seale Smith, Junjiao Tian, Zsolt Kira

Abstract: In this paper, we focus on the important yet understudied problem of Continual Federated Learning (CFL), where a server communicates with a set of clients to incrementally learn new concepts over time without sharing or storing any data. The complexity of this problem is compounded by challenges from both the Continual and Federated Learning perspectives. Specifically, models trained in a CFL setup suffer from catastrophic forgetting which is exacerbated by data heterogeneity across clients. Existing attempts at this problem tend to impose large overheads on clients and communication channels or require access to stored data which renders them unsuitable for real-world use due to privacy. In this paper, we attempt to tackle forgetting and heterogeneity while minimizing overhead costs and without requiring access to any stored data. We study this problem in the context of Vision Transformers and explore parameter-efficient approaches to adapt to dynamic distributions while minimizing forgetting. We achieve this by leveraging a prompting-based approach (such that only prompts and classifier heads have to be communicated) and proposing a novel and lightweight generation and distillation scheme to consolidate client models at the server. We formulate this problem for image classification, establish strong baselines for comparison, and conduct experiments on CIFAR-100 as well as challenging, large-scale datasets like ImageNet-R and DomainNet. Our approach outperforms both existing methods and our own baselines by as much as 7% while significantly reducing communication and client-level computation costs. Code available at https://github.com/shaunak27/hepco-fed.

URL: https://openreview.net/forum?id=vsZ5A3Zxyr

---

Title: On the Equivalence of Graph Convolution and Mixup

Authors: Xiaotian Han, Hanqing Zeng, Yu Chen, Shaoliang Nie, Jingzhou Liu, Kanika Narang, Zahra Shakeri, Karthik Abinav Sankararaman, Song Jiang, Madian Khabsa, Qifan Wang, Xia Hu

Abstract: This paper investigates the relationship between graph convolution and Mixup techniques. Graph convolution in a graph neural network involves aggregating features from neighboring samples to learn representative features for a specific node or sample. On the other hand, Mixup is a data augmentation technique that generates new examples by averaging features and one-hot labels from multiple samples. One commonality between these techniques is their utilization of information from multiple samples to derive feature representation. This study aims to explore whether a connection exists between the two. Our investigation reveals that, under two mild modifications, graph convolution can be viewed as a specialized form of Mixup that is applied during both the training and testing phases. The two modifications are 1) \textit{Homophily Relabel} - assigning the target node's label to all its neighbors, and 2) \textit{Test-Time Mixup} - applying Mixup to features at test time. We establish this equivalence mathematically by demonstrating that graph convolution networks and simplified graph convolution can be expressed as a form of Mixup. We also empirically verify the equivalence by training an MLP using the two modifications to achieve comparable performance.
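
The claimed correspondence is easiest to see for a weight-free mean-aggregation layer: each node's convolved feature is a Mixup of its neighborhood features with uniform coefficients. A toy check (this three-node graph is illustrative, not the paper's construction):

```python
import numpy as np

# Toy graph with self-loops: node 0 is adjacent to nodes 1 and 2.
X = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
A_hat = np.array([[1., 1., 1.],
                  [1., 1., 0.],
                  [1., 0., 1.]])
D_inv = np.diag(1.0 / A_hat.sum(axis=1))
conv = D_inv @ A_hat @ X    # weight-free mean-aggregation "convolution"

# Node 0's convolved feature, rewritten as a Mixup with uniform coefficients
# over its (self-inclusive) neighborhood {0, 1, 2}:
lam = np.full(3, 1.0 / 3.0)
mixup_node0 = lam @ X
print(np.allclose(conv[0], mixup_node0))  # True
```

Homophily Relabel then supplies the matching label average, since every neighbor carries the target node's label.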

URL: https://openreview.net/forum?id=koC6zyaj73

---


New submissions
===============


Title: Masked Autoencoders are PDE Learners

Abstract: Neural solvers for partial differential equations (PDEs) have great potential to generate fast and accurate physics solutions, yet their practicality is currently limited by their generalizability. PDEs evolve over broad scales and exhibit diverse behaviors; predicting these phenomena will require learning representations across a wide variety of inputs which may encompass different coefficients, boundary conditions, resolutions, or even equations. As a step towards generalizable PDE modeling, we adapt masked pretraining for physics problems. Through self-supervised learning across PDEs, masked autoencoders can consolidate heterogeneous physics to learn rich latent representations. We show that learned representations can generalize to unseen equations or parameters and are semantically meaningful by performing latent PDE arithmetic. Furthermore, we demonstrate that masked pretraining can improve PDE coefficient regression and the classification of PDE features. Lastly, conditioning neural solvers on learned latent representations can improve time-stepping and super-resolution performance across a variety of coefficients, discretizations, or boundary conditions, as well as on unseen PDEs. We hope that masked pretraining can emerge as a unifying method across large, unlabeled, and heterogeneous datasets to learn latent physics at scale.

URL: https://openreview.net/forum?id=rZNuiFwXVs

---

Title: Gradient-guided discrete walk-jump sampling for biological sequence generation

Abstract: In this work, we propose gradient-guided discrete walk-jump sampling (gg-dWJS), a novel discrete sequence generation method for biological sequence optimization. Leveraging gradient guidance in the noisy manifold, we sample from the smoothed data manifold by applying discretized Markov chain Monte Carlo (MCMC) using a denoising model with gradient guidance from a discriminative model. This is followed by jumping to the discrete data manifold using a conditional one-step denoising. We showcase our method in two different modalities: discrete image and antibody sequence generation tasks in the single- and multi-objective settings. Through evaluation on these tasks, we show that our method generates high-quality samples that are well-optimized for specific tasks.
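
A one-dimensional caricature of the walk-jump scheme: Langevin steps on a Gaussian-smoothed two-token distribution with an added discriminator gradient (the walk), followed by a single Tweedie denoising step and rounding back to the discrete space (the jump). Everything here, including the closed-form score and guidance terms, is a toy stand-in for the paper's learned models:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, eta, gamma = 0.5, 0.01, 4.0

def score(y):
    # Score of the Gaussian-smoothed two-point "data" distribution {0, 1}.
    w0 = np.exp(-0.5 * (y / sigma) ** 2)
    w1 = np.exp(-0.5 * ((y - 1.0) / sigma) ** 2)
    return (w0 * (0.0 - y) + w1 * (1.0 - y)) / ((w0 + w1) * sigma**2)

def guidance(y):
    # Gradient of a log-discriminator preferring the "1" token: a stand-in
    # for the discriminative model supplying the guidance signal.
    return (1.0 - y) / sigma**2

samples = []
for _ in range(200):
    y = rng.standard_normal()                   # start in the noisy space
    for _ in range(300):                        # walk: guided Langevin MCMC
        drift = score(y) + gamma * guidance(y)
        y += eta * drift + np.sqrt(2 * eta) * rng.standard_normal()
    x_hat = y + sigma**2 * score(y)             # jump: one-step denoising (Tweedie)
    samples.append(1 if x_hat > 0.5 else 0)
print(np.mean(samples))
```

With the guidance term active, the walk concentrates near the preferred token, so most decoded samples come out as 1; dropping `gamma` to 0 recovers unguided walk-jump sampling over both tokens.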

URL: https://openreview.net/forum?id=fFVuo4SPfT

---

Title: Sentiment Classification using Sentence Embeddings: Exploiting Sentence Transformer Loss Functions

Abstract: Evaluating customer sentiment plays a critical role in business success. By analyzing customer feedback, companies can swiftly identify expectations, areas for improvement, and pain points related to their products and services. Sentiment analysis, fueled by advances in natural language processing techniques, has become widely accepted for this purpose. In this study, we leverage the well-known “Twitter US Airline Sentiment” dataset to develop a sentence transformer architecture based on a pre-trained transformer model (mpnet-base). We fine-tune the model using appropriate loss functions to generate semantically rich sentence embeddings that are subsequently fed into gradient boosting-based machine learning algorithms. The resulting hybrid model achieves impressive sentiment prediction performance. Additionally, this study delves into the intricacies of various transformer loss functions that can be applied to fine-tune the sentence transformer model for enhanced sentiment classification performance. Our sentence transformer architecture, fine-tuned on CosineSimilarity loss function and combined with Light Gradient Boosting Machine Classifier, achieves an excellent accuracy of 86.5\%, while demonstrating high recall rates even for minority sentiment classes (74.4\% for neutral and 82.9\% for positive sentiment) without any data augmentation. Our study emphasizes that fine-tuned sentence transformer models can outperform existing techniques for sentiment classification, particularly in tri-class sentiment scenarios and they come with the inherent advantages of lesser computational load and higher scalability opportunity.

URL: https://openreview.net/forum?id=YO6qXW2co7

---

Title: Adaptive Physics-informed Neural Networks: A Survey

Abstract: Physics-informed neural networks (PINNs) have emerged as a promising approach for solving partial differential equations (PDEs) using neural networks, particularly in data-scarce scenarios due to their unsupervised training capability. However, a key limitation is the need for re-optimization with each change in PDE parameters, similar to the challenge in traditional numerical methods where each system of equations corresponds to a specific PDE instance. This characteristic poses a barrier to the widespread adoption of PINNs across scientific and engineering applications. This survey explores research addressing this limitation through transfer learning and meta-learning, synthesizing insights to establish a foundation for efficient data generation strategies tailored to PINNs. These methods can potentially improve PINNs’ training efficiency, enabling quicker adaptation to new PDEs with fewer data and computational demands. While numerical methods directly solve systems of equations to derive solutions, neural networks implicitly learn solutions by adjusting their parameters. One notable advantage of neural networks lies in their capacity to abstract away from specific problem domains, enabling them to retain, discard, or adapt learned representations to efficiently address similar problems. By understanding how these techniques can be applied to PINNs, this survey seeks to identify promising directions for future research to enable the widespread adoption of PINNs across a wide range of scientific and engineering applications.

URL: https://openreview.net/forum?id=vz5P1Kbt6t

---

Title: CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

Abstract: AI agents have the potential to aid users on a variety of consequential tasks, including conducting scientific research. To spur the development of useful agents, we need benchmarks that are challenging, but more crucially, directly correspond to real-world tasks of interest. This paper introduces such a benchmark, designed to measure the accuracy of AI agents in tackling a crucial yet surprisingly challenging aspect of scientific research: computational reproducibility. This task, fundamental to the scientific process, involves reproducing the results of a study using the provided code and data. We introduce \texttt{CORE-Bench} (\textbf{Co}mputational \textbf{Re}producibility Agent Benchmark), a benchmark consisting of 270 tasks based on 90 scientific papers across three disciplines (computer science, social science, and medicine). Tasks in \texttt{CORE-Bench} consist of three difficulty levels and include both language-only and vision-language tasks. We provide an evaluation system to measure the accuracy of agents in a fast and parallelizable way, saving days of evaluation time for each run compared to a sequential implementation. We evaluated two baseline agents: the general-purpose \texttt{AutoGPT} and a task-specific agent called \texttt{CORE-Agent}. We tested both variants using two underlying language models: \texttt{GPT-4o} and \texttt{GPT-4o-mini}. The best agent achieved an accuracy of 21\% on the hardest level of tasks, showing the vast scope for improvement in automating routine scientific tasks. Having agents that can reproduce existing work is a necessary step towards building agents that can conduct novel research and could verify and improve the performance of other research agents. We hope that \texttt{CORE-Bench} can improve the state of reproducibility and spur the development of future research agents.

URL: https://openreview.net/forum?id=BsMMc4MEGS

---

Title: Learning from Simulated Interactions via Multitask Prospective Rehearsal for Bionic Limb Behavior Modeling

Abstract: Lower limb amputations and neuromuscular impairments severely restrict mobility, necessitating advancements beyond conventional prosthetics. While motorized bionic limbs show promise, their effectiveness depends on replicating the dynamic coordination of human movement across diverse environments. In this paper, we introduce a model for human behavior in the context of bionic prosthesis control. Our approach leverages motion capture and wearable sensor data to learn the synergistic coupling of the lower limbs during locomotion, enabling the prediction of the kinematic behavior of a missing limb during tasks such as walking, climbing inclines, and stairs. We propose a multitasking, continually adaptive model that anticipates and refines movements over time. At the core of our method is a technique called "multitask prospective rehearsal," that anticipates and synthesizes future movements based on the previous prediction and employs a corrective mechanism for subsequent predictions. Our evolving architecture merges lightweight, task-specific modules on a shared backbone, ensuring both specificity and scalability. We validate our model through experiments on real-world human gait datasets, including transtibial amputees, across a wide range of locomotion tasks. Results demonstrate that our approach consistently outperforms baseline models, particularly in scenarios with distributional shifts, adversarial perturbations, and noise.

URL: https://openreview.net/forum?id=Bmy82p2eez

---

Title: In-distribution adversarial attacks on object recognition models using gradient-free search

Abstract: Neural networks are susceptible to small perturbations in the form of 2D rotations and shifts, image crops, and even changes in object colors. Past works attribute these errors to dataset bias, claiming that models fail on these perturbed samples as they do not belong to the training data distribution. Here, we challenge this claim and present evidence of the widespread existence of perturbed images within the training data distribution, which networks fail to classify. We train models on data sampled from parametric distributions, then search inside this data distribution to find such in-distribution adversarial examples. This is done using our gradient-free approach based on evolution strategies (ES), which we call CMA-Search. Despite training with a large-scale (0.5 million images), unbiased dataset of camera and light variations, CMA-Search can find a failure inside the data distribution in over 71% of cases by perturbing the camera position. With lighting changes, CMA-Search finds misclassifications in 42% of cases. These findings also extend to natural images from ImageNet and Co3D datasets. This phenomenon of in-distribution adversarial images presents a highly worrisome problem for artificial intelligence---they bypass the need for a malicious agent to add engineered noise to induce an adversarial attack. All code, datasets, and demos are available at https://github.com/in-dist-adversarials/in_distribution_adversarial_examples.
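
The gradient-free search idea can be sketched with a bare-bones evolution strategy that perturbs input parameters until a classifier's decision flips. Unlike CMA-Search proper, this sketch does no covariance adaptation, and the linear "classifier" over 2-D camera-like parameters is a hypothetical stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in classifier: a logistic decision rule on 2-D inputs.
w, b = np.array([2.0, -1.0]), 0.3
predict = lambda x: int(w @ x + b > 0)

def es_search(x0, target_wrong, sigma=0.1, iters=200, pop=16):
    """Gradient-free (1, lambda)-ES: mutate, keep the best-scoring candidate.
    A simplified stand-in for CMA-Search (no covariance matrix adaptation)."""
    x = x0.copy()
    for _ in range(iters):
        cands = x + sigma * rng.standard_normal((pop, x.size))
        # fitness: push the margin toward the wrong side of the boundary
        margins = cands @ w + b
        scores = margins if target_wrong == 1 else -margins
        best = cands[scores.argmax()]
        if predict(best) == target_wrong:
            return best, np.linalg.norm(best - x0)
        x = best
    return None, None

x0 = np.array([0.5, 2.0])          # correctly classified as class 0
adv, dist = es_search(x0, target_wrong=1)
print(adv, dist)
```

In the paper's setting the candidates are camera or lighting parameters drawn from the training distribution, so a successful search yields an in-distribution misclassified rendering rather than a noise-perturbed image.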

URL: https://openreview.net/forum?id=uF9ZdAwrCT

---

Title: Personalized Privacy Amplification via Importance Sampling

Abstract: For scalable machine learning on large data sets, subsampling a representative subset is a common approach for efficient model training. This is often achieved through importance sampling, whereby informative data points are sampled more frequently. In this paper, we examine the privacy properties of importance sampling, focusing on an individualized privacy analysis. We find that, in importance sampling, privacy is well aligned with utility but at odds with sample size. Based on this insight, we propose two approaches for constructing sampling distributions: one that optimizes the privacy-efficiency trade-off; and one based on a utility guarantee in the form of coresets. We evaluate both approaches empirically in terms of privacy, efficiency, and accuracy on the differentially private $k$-means problem. We observe that both approaches yield similar outcomes and consistently outperform uniform sampling across a wide range of data sets.
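
The basic primitive analyzed here, Poisson importance subsampling with inverse-probability (Horvitz-Thompson) weights, can be sketched as follows; the particular sampling distribution below is illustrative, not one of the paper's two proposed constructions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=10_000)

# Poisson importance sampling: keep point i with probability q_i, then
# reweight kept points by 1/q_i (Horvitz-Thompson) to keep estimates unbiased.
q = np.clip(x / x.max(), 0.01, 1.0)   # sample informative (large) points more
keep = rng.random(x.size) < q
estimate = np.sum(x[keep] / q[keep]) / x.size

print(keep.sum(), estimate, x.mean())  # small subsample, near-unbiased mean
```

The privacy tension the abstract describes shows up in exactly these `q_i`: points sampled with higher probability contribute more utility but enjoy less amplification-by-subsampling, which is what an individualized analysis has to account for.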

URL: https://openreview.net/forum?id=IK2cR89z45

---

Title: LVM-Lite: Training Large Vision Models with Efficient Sequential Modeling

Abstract: Generative pre-training has significantly advanced natural language understanding. Building upon this success, recent research has begun to develop Large Vision Models (LVMs) by leveraging large-scale pre-training on visual sequences, where simultaneous consideration of image token sequences within single images and across a set of images is of key importance. This paper shows that sequential modeling on single images and across multiple images can be efficiently and effectively decoupled. We introduce a two-stage learning pipeline, starting with single-image pre-training, followed by fine-tuning on long image/video sequences. We term this method Large Vision Model Lite (LVM-Lite). Extensive experiments showcase the impressive performance of LVM-Lite across various generative and discriminative benchmarks, comparable to specifically trained models without the need for task-specific training. Importantly, LVM-Lite accelerates training by up to $2.7\times$ and demonstrates strong scalability.

URL: https://openreview.net/forum?id=J05un5MCT0

---

Title: Reconciling Kaplan and Chinchilla Scaling Laws

Abstract: Kaplan and Chinchilla studied the scaling behavior of transformers trained on next-token language prediction. These studies produced different estimates for how the number of parameters ($N$) and training tokens ($D$) should be set to achieve the lowest possible loss for a given compute budget ($C$). Kaplan: $N_\text{optimal} \propto C^{0.73}$, Chinchilla: $N_\text{optimal} \propto C^{0.50}$. This paper finds that much of this discrepancy can be attributed to Kaplan counting non-embedding rather than total parameters, combined with their analysis being performed at small scale. Simulating the Chinchilla study under these conditions produces biased scaling coefficients close to Kaplan's. Hence, this paper reaffirms Chinchilla's scaling coefficients, by explaining the primary cause of Kaplan's original overestimation. As a second contribution, the paper explains differences in the reported relationships between loss and compute. These findings lead us to recommend that future scaling studies use total parameters and compute.
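
A quick check of what the two exponents imply, using the common $C \approx 6ND$ FLOPs approximation (a standard accounting convention, not a detail quoted from this paper): since $D \propto C^{1-a}$ when $N_\text{optimal} \propto C^{a}$, the compute-optimal tokens-per-parameter ratio scales as $D/N \propto C^{1-2a}$.

```python
# With C = 6*N*D training FLOPs and N_opt proportional to C**a, the data
# budget scales as D proportional to C**(1 - a), so the tokens-per-parameter
# ratio scales as D/N proportional to C**(1 - 2*a).
laws = {"Kaplan": 0.73, "Chinchilla": 0.50}
ratio_exponents = {name: 1 - 2 * a for name, a in laws.items()}
print(ratio_exponents)
# Chinchilla: exponent 0.0 -> a fixed tokens-per-parameter ratio.
# Kaplan: exponent -0.46 -> ever-larger models trained on relatively less data.
```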

URL: https://openreview.net/forum?id=NLoaLyuUUF

---

Title: PAC Privacy Preserving Diffusion Models

Abstract: Data privacy protection is garnering increased attention among researchers. Diffusion models (DMs), particularly with strict differential privacy, can potentially produce images with both high privacy and visual quality. However, challenges remain, such as ensuring robust protection when privatizing specific data attributes, an area where current models often fall short. To address these challenges, we introduce the PAC Privacy Preserving Diffusion Model, a model that leverages diffusion principles and ensures Probably Approximately Correct (PAC) privacy. We enhance privacy protection by integrating a private classifier guidance into the Langevin Sampling Process. Additionally, recognizing the gap in measuring the privacy of models, we have developed a novel metric to gauge privacy levels. Our model, assessed with this new metric and supported by Gaussian matrix computations for the PAC bound, has shown superior performance in privacy protection over existing leading private generative models according to benchmark tests.

URL: https://openreview.net/forum?id=uDrayegpOx

---

Title: Compositional Instruction Following with Language Models and Reinforcement Learning

Abstract: Combining reinforcement learning with language grounding is challenging as the agent needs to explore the environment while simultaneously learning multiple language-conditioned tasks. To address this, we introduce a novel method: the compositionally-enabled reinforcement learning language agent (CERLLA). Our method reduces the sample complexity of tasks specified with language by leveraging compositional policy representations and a semantic parser trained using reinforcement learning and in-context learning. We evaluate our approach in an environment requiring function approximation and demonstrate compositional generalization to novel tasks. Our method significantly outperforms the previous best non-compositional baseline in terms of sample complexity on 162 tasks designed to test compositional generalization. Our model attains a higher success rate and learns in fewer steps than the non-compositional baseline. It reaches a success rate equal to an oracle policy's upper-bound performance of 92%. With the same number of environment steps, the baseline only reaches a success rate of 80%.

URL: https://openreview.net/forum?id=pR3fCmztDf

---

Title: Maximum Mean Discrepancy on Exponential Windows for Online Change Detection

Abstract: Detecting changes is of fundamental importance when analyzing data streams and has many applications, e.g., in predictive maintenance, fraud detection, or medicine. A principled approach to detect changes is to compare the distributions of observations within the stream to each other via hypothesis testing. Maximum mean discrepancy (MMD), a (semi-)metric on the space of probability distributions, provides powerful non-parametric two-sample tests on kernel-enriched domains. In particular, MMD is able to detect any disparity between distributions under mild conditions. However, classical MMD estimators suffer from a quadratic runtime complexity, which renders their direct use for change detection in data streams impractical. In this article, we propose a new change detection algorithm, called Maximum Mean Discrepancy on Exponential Windows (MMDEW), that combines the benefits of MMD with an efficient computation based on exponential windows. We prove that MMDEW enjoys polylogarithmic runtime and logarithmic memory complexity and show empirically that it outperforms the state of the art on benchmark data streams.
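
For reference, the quadratic-time unbiased MMD$^2$ estimator whose cost MMDEW is designed to avoid looks like this (a standard textbook form with a Gaussian kernel, not the paper's code):

```python
import numpy as np

def mmd2_unbiased(X, Y, bw=1.0):
    """Unbiased quadratic-time estimate of squared MMD with a Gaussian kernel."""
    k = lambda A, B: np.exp(-np.sum((A[:, None] - B[None]) ** 2, -1) / (2 * bw**2))
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))      # drop diagonal
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2 * Kxy.mean())

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(0, 1, (500, 1)), rng.normal(0, 1, (500, 1)))
shift = mmd2_unbiased(rng.normal(0, 1, (500, 1)), rng.normal(1, 1, (500, 1)))
print(same, shift)  # near zero vs. clearly positive
```

Recomputing this estimate for every window position in a stream is what incurs the quadratic cost; MMDEW's exponential-window bookkeeping is what reduces it to polylogarithmic runtime.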

URL: https://openreview.net/forum?id=OGaTF9iOxi

---

Title: Recent Advances in Attack and Defense Approaches of Large Language Models

Abstract: Large Language Models (LLMs) have revolutionized artificial intelligence and machine learning through their advanced text processing and generating capabilities. However, their widespread deployment has raised significant safety and reliability concerns. Established vulnerabilities in deep neural networks, coupled with emerging threat models, may compromise security evaluations and create a false sense of security. Given the extensive research in the field of LLM security, we believe that summarizing the current state of affairs will help the research community better understand the present landscape and inform future developments. This paper reviews current research on LLM vulnerabilities and threats, and evaluates the effectiveness of contemporary defense mechanisms. We analyze recent studies on attack vectors and model weaknesses, providing insights into attack mechanisms and the evolving threat landscape. We also examine current defense strategies, highlighting their strengths and limitations. By contrasting advancements in attack and defense methodologies, we identify research gaps and propose future directions to enhance LLM security. Our goal is to advance the understanding of LLM safety challenges and guide the development of more robust security measures.

URL: https://openreview.net/forum?id=LG4TjXvUvR

---

Title: Evaluating General Purpose Vision Foundation Models for Medical Image Analysis: An Experimental Study of DINOv2 on Radiology Benchmarks

Abstract: The integration of deep learning systems into healthcare has been hindered by the resource-intensive process of data annotation and the inability of these systems to generalize to different data distributions. Foundation models, which are models pre-trained on large datasets, have emerged as a solution to reduce reliance on annotated data and enhance model generalizability and robustness. DINOv2 is an open-source foundation model pre-trained with self-supervised learning on 142 million curated natural images that exhibits promising capabilities across various vision tasks. Nevertheless, a critical question remains unanswered regarding DINOv2's adaptability to radiological imaging, and whether its features are sufficiently general to benefit radiology image analysis. Therefore, this study comprehensively evaluates the performance of DINOv2 for radiology, conducting over 200 evaluations across diverse modalities (X-ray, CT, and MRI). To measure the effectiveness and generalizability of DINOv2's feature representations, we analyze the model across medical image analysis tasks including disease classification and organ segmentation on both 2D and 3D images, and under different settings like kNN, few-shot learning, linear-probing, end-to-end fine-tuning, and parameter-efficient fine-tuning. Comparative analyses with established supervised, self-supervised, and weakly-supervised models reveal DINOv2's superior performance and cross-task generalizability. The findings contribute insights to potential avenues for optimizing pre-training strategies for medical imaging and enhancing the broader understanding of DINOv2's role in bridging the gap between natural and radiological image analysis.

URL: https://openreview.net/forum?id=WQyFFhbFfY

---

Title: BM$^2$: Coupled Schrödinger Bridge Matching

Abstract: A Schrödinger bridge establishes a dynamic transport map between two target distributions via a reference process, simultaneously solving an associated entropic optimal transport problem. We consider the setting where samples from the target distributions are available, and the reference diffusion process admits tractable dynamics. We thus introduce Coupled Bridge Matching (BM$^2$), a simple \emph{non-iterative} approach for learning Schrödinger bridges with neural networks. A preliminary theoretical analysis of the convergence properties of BM$^2$ is carried out, supported by numerical experiments that demonstrate the effectiveness of our proposal.

URL: https://openreview.net/forum?id=fqkq1MgONB

---

Title: Explanation Faithfulness is Alignment: A Unifying and Geometric Perspective on Interpretability Evaluation

Abstract: Interpretability researchers face a universal question: without access to ground truth explanation labels, how can the faithfulness of an explanation to its model be determined? Despite immense efforts to develop new evaluation methods, current approaches remain in a pre-paradigmatic state: fragmented, difficult to calibrate, and lacking cohesive theoretical grounding. Observing the lack of a unifying theory, we propose a Generalised Explanation Faithfulness (GEF) evaluative criterion centred on alignment that combines existing perturbation-based evaluations, eliminating the need for singular, task-specific evaluations. Complementing this unifying perspective, from a geometric point of view, we reveal a prevalent yet critical oversight in current evaluation practice: the failure to account for the learned geometry and non-linear mapping present in the model and explanation spaces. To solve this, we propose a general-purpose, threshold-free faithfulness evaluator that incorporates principles from differential geometry, facilitating evaluation agnostically across tasks and explanation approaches. Through extensive cross-domain benchmarks on natural language processing, vision, and tabular tasks, we provide first-of-its-kind insights into the comparative performance of local linear approximations and global feature visualisation methods, and the faithfulness of large language models (LLMs) as post-hoc explainers. Our contributions are of substantial importance to the interpretability community, offering a principled, unified approach to evaluate the faithfulness of explanations. Code is available at url.

URL: https://openreview.net/forum?id=ukLxqA8zXj

---

Title: Statistical Mechanics of Min-Max Problems

Abstract: Min-max optimization problems, also known as saddle point problems, have attracted significant attention due to their applications in various fields, such as fair beamforming, generative adversarial networks (GANs), and adversarial learning. However, understanding the properties of these min-max problems has remained a substantial challenge. This study introduces a statistical mechanical formalism for analyzing the equilibrium values of min-max problems in the high-dimensional limit, while appropriately addressing the order of operations for min and max. As a first step, we apply this formalism to bilinear min-max games and simple GANs, deriving the relationship between the amount of training data and generalization error and indicating the optimal ratio of fake to real data for effective learning. This formalism provides a groundwork for a deeper theoretical analysis of the equilibrium properties in various machine learning methods based on min-max problems and encourages the development of new algorithms and architectures.
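
For intuition about why the order and interaction of the min and max operations matter, a minimal bilinear example (purely illustrative; unrelated to the paper's statistical-mechanics formalism) already shows that naive simultaneous gradient descent-ascent fails on a saddle point while a look-ahead (extragradient) variant converges:

```python
# Minimal bilinear min-max game: min_x max_y f(x, y) = x * y,
# with the unique saddle point at (0, 0).

def gda(x0, y0, lr=0.1, steps=1000):
    """Simultaneous gradient descent-ascent: spirals away from the saddle."""
    x, y = x0, y0
    for _ in range(steps):
        gx, gy = y, x                     # df/dx = y, df/dy = x
        x, y = x - lr * gx, y + lr * gy
    return x, y

def extragradient(x0, y0, lr=0.1, steps=1000):
    """Extragradient: a look-ahead half step stabilizes the dynamics."""
    x, y = x0, y0
    for _ in range(steps):
        xh, yh = x - lr * y, y + lr * x   # half step
        x, y = x - lr * yh, y + lr * xh   # update with look-ahead gradients
    return x, y

xg, yg = gda(1.0, 1.0)
xe, ye = extragradient(1.0, 1.0)
print(abs(xg) + abs(yg) > 1.0)   # True: GDA diverged
print(abs(xe) + abs(ye) < 0.1)   # True: extragradient near the equilibrium
```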

URL: https://openreview.net/forum?id=qZqUFeTtuI

---

Title: Wasserstein Coreset via Sinkhorn Loss

Abstract: Coreset selection, a technique for compressing large datasets while preserving performance, is crucial for modern machine learning. This paper presents a novel method for generating high-quality Wasserstein coresets using the Sinkhorn loss, a powerful tool with computational advantages. However, existing approaches suffer from numerical instability in Sinkhorn's algorithm. We address this by proposing stable algorithms for both forward and backward computations. We further derive an analytical formula for the Sinkhorn loss derivative and rigorously analyze the stability of our method. Extensive experiments demonstrate that our approach significantly outperforms existing methods in terms of sample selection quality, computational efficiency, and achieving a smaller Wasserstein distance.
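
The standard remedy for Sinkhorn's numerical instability is to run the iterations in the log domain; the sketch below shows that generic stabilization (it is not the paper's specific forward/backward algorithm):

```python
import numpy as np

def lse(z, axis):
    """Numerically stable log-sum-exp along an axis."""
    m = z.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis) + np.log(np.exp(z - m).sum(axis=axis))

def sinkhorn_log(a, b, C, eps=0.05, iters=500):
    """Entropy-regularized OT between histograms a, b with cost matrix C.
    Every update goes through log-sum-exp, so a small eps cannot overflow
    the way the naive kernel K = exp(-C / eps) does."""
    f, g = np.zeros(len(a)), np.zeros(len(b))
    for _ in range(iters):
        f = eps * (np.log(a) - lse((g[None, :] - C) / eps, axis=1))
        g = eps * (np.log(b) - lse((f[:, None] - C) / eps, axis=0))
    P = np.exp((f[:, None] + g[None, :] - C) / eps)   # transport plan
    return P, float((P * C).sum())                     # plan and Sinkhorn cost

rng = np.random.default_rng(0)
x, y = rng.normal(size=(5, 2)), rng.normal(size=(6, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)    # squared Euclidean cost
a, b = np.full(5, 1 / 5), np.full(6, 1 / 6)
P, cost = sinkhorn_log(a, b, C)
print(np.allclose(P.sum(axis=0), b))  # True: column marginals are matched
```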

URL: https://openreview.net/forum?id=DrMCDS88IL

---

Title: Pre-trained Vision-Language Models Learn Discoverable Visual Concepts

Abstract: Do vision-language models (VLMs) pre-trained to caption an image of a durian learn visual concepts such as brown (color) and spiky (texture) at the same time? We aim to answer this question as visual concepts learned “for free” would enable wide applications such as neuro-symbolic reasoning or human-interpretable object classification. We assume that the visual concepts, if captured by pre-trained VLMs, can be extracted by their vision-language interface with text-based concept prompts. We observe that recent works prompting VLMs with concepts often differ in their strategies to define and evaluate the visual concepts, leading to conflicting conclusions. We propose a new concept definition strategy based on two observations: First, certain concept prompts include shortcuts that recognize correct concepts for wrong reasons; Second, multimodal information (e.g. visual discriminativeness and textual knowledge) should be leveraged when selecting the concepts. Our proposed concept discovery and learning (CDL) framework is thus designed to identify a diverse list of generic visual concepts (e.g. spiky as opposed to spiky durian), which are ranked and selected based on visual and language mutual information. We carefully design quantitative and human evaluations of the discovered concepts on nine diverse visual recognition datasets, which confirm that pre-trained VLMs do learn visual concepts that provide accurate and thorough descriptions for the recognized objects. All code and models are publicly released.
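
One way to make a mutual-information concept ranking concrete is the plug-in MI estimate between a binary concept indicator and class labels; the toy indicators below are made up for illustration, and this is not the exact CDL scoring:

```python
import numpy as np

def mutual_information(c, y):
    """Plug-in mutual information between a binary concept indicator c
    and class labels y: sum over the joint of p(c,y) log[p(c,y)/(p(c)p(y))]."""
    mi = 0.0
    for cv in np.unique(c):
        for yv in np.unique(y):
            p_cy = np.mean((c == cv) & (y == yv))
            p_c, p_y = np.mean(c == cv), np.mean(y == yv)
            if p_cy > 0:
                mi += p_cy * np.log(p_cy / (p_c * p_y))
    return mi

# 'spiky' fires only on durian images, so it is informative;
# a concept that fires on everything carries no information.
y       = np.array([1, 1, 1, 0, 0, 0])   # durian vs. not-durian
spiky   = np.array([1, 1, 1, 0, 0, 0])
generic = np.ones(6, dtype=int)
print(mutual_information(spiky, y) > mutual_information(generic, y))  # True
```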

URL: https://openreview.net/forum?id=Vq0wMFBjo2

---

Title: Noise-free Loss Gradients: A Surprisingly Effective Baseline for Coreset Selection

Abstract: The exponential rise in the size and complexity of deep learning models and datasets has resulted in a considerable demand for computational resources. Coreset selection is one of the methods to alleviate this rising demand. The goal is to select a subset from a large dataset to train a model that performs almost at par with the one trained on the large dataset while reducing computational time and resource requirements. Existing approaches either attempt to identify remarkable samples (e.g., Forgetting, Adversarial Deepfool, EL2N, etc.) that stand out from the rest or solve complex optimization problems (e.g., submodular maximization, OMP) to compose the coresets. This paper proposes a novel and intuitive approach to efficiently select a coreset based on the similarity of loss gradients. Our method works on the hypothesis that gradients of samples belonging to a given class will point in similar directions during the early training phase. Samples with the most neighbours that produce similar gradient directions, in other words, that produce noise-free gradients, will represent that class. Through extensive experimentation, we have demonstrated the effectiveness of our approach in outperforming state-of-the-art coreset selection algorithms on a range of benchmark datasets from CIFAR-10 to ImageNet with architectures of varied complexity (ResNet-18, ResNet-50, VGG-16, ViT). We have also demonstrated the effectiveness of our approach in generative modelling by implementing coreset selection to reduce execution time for various GAN models (DCGAN, MSGAN, SAGAN, SNGAN) for different datasets (CIFAR-10, CIFAR-100, Tiny ImageNet) while not impacting the performance metrics significantly.
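
The neighbour-voting idea can be sketched as follows; the per-sample "gradients" here are synthetic 2-D vectors, and this is an illustration of the hypothesis rather than the paper's full selection algorithm:

```python
import numpy as np

def select_coreset(grads, labels, frac=0.5, sim_thresh=0.9):
    """Within each class, keep the samples whose per-sample gradients
    have the most cosine-similar neighbours, i.e. the samples with the
    least noisy gradient directions."""
    g = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + 1e-12)
    keep = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        sims = g[idx] @ g[idx].T                    # pairwise cosine similarity
        votes = (sims > sim_thresh).sum(axis=1)     # neighbour counts
        k = max(1, int(frac * len(idx)))
        keep.extend(idx[np.argsort(-votes)[:k]])    # most-supported samples
    return np.sort(np.array(keep))

# Toy data: class-0 gradients cluster around +e1, class-1 around -e1,
# with two noisy outliers appended to each class.
rng = np.random.default_rng(1)
g0 = np.r_[rng.normal([1, 0], 0.05, (8, 2)), rng.normal(0, 1, (2, 2))]
g1 = np.r_[rng.normal([-1, 0], 0.05, (8, 2)), rng.normal(0, 1, (2, 2))]
grads = np.r_[g0, g1]
labels = np.r_[np.zeros(10, int), np.ones(10, int)]
core = select_coreset(grads, labels, frac=0.5)
print(len(core))  # 10: five samples kept per class
```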

URL: https://openreview.net/forum?id=OE4P1tW8iQ

---

Title: Making Self-supervised Learning Robust to Spurious Correlation via Learning-speed Aware Sampling

Abstract: Self-supervised learning (SSL) has emerged as a powerful technique for learning rich representations from unlabeled data. The data representations can capture many underlying attributes of data, and are useful in downstream prediction tasks. In real-world settings, spurious correlations between some attributes (e.g. race, gender and age) and labels for downstream tasks often exist, e.g. disease findings are usually more prevalent among elderly patients. In this paper, we investigate SSL in the presence of spurious correlations and show that the SSL training loss can be minimized by capturing only a subset of conspicuous features relevant to those sensitive attributes, despite the presence of other important predictive features for the downstream tasks. To address this issue, we investigate the learning dynamics of SSL and observe that learning is slower for samples that conflict with such correlations (e.g. elderly patients without diseases). Motivated by these findings, we propose a learning-speed aware SSL (LA-SSL) approach, in which we sample each training example with a probability that is inversely related to its learning speed. We evaluate LA-SSL on three datasets that exhibit spurious correlations between different attributes, demonstrating the enhanced robustness of pretrained representations on downstream classification tasks.
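
A minimal sketch of the sampling scheme, with learning speed approximated by the per-sample loss drop over training (an illustration of the inverse relationship, not the exact LA-SSL rule):

```python
import numpy as np

def sampling_probs(losses_early, losses_late, temp=1.0):
    """Per-sample learning speed = drop in loss between an early and a
    late training checkpoint; slow learners (samples that conflict with
    spurious correlations) are up-weighted via an inverse-speed softmax."""
    speed = losses_early - losses_late   # larger drop = faster learner
    w = np.exp(-speed / temp)            # inversely related to speed
    return w / w.sum()

early = np.array([2.0, 2.0, 2.0, 2.0])
late = np.array([0.2, 0.2, 0.2, 1.5])   # the last sample learns slowly
p = sampling_probs(early, late)
print(p[3] > p[0])  # True: the slow-learning sample is sampled more often
```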

URL: https://openreview.net/forum?id=8mgX3Uw2Ea

---

Title: Characterizing the Convergence of Game Dynamics via Potentialness

Abstract: Understanding the convergence landscape of multi-agent learning is a fundamental problem of great practical relevance in many applications of artificial intelligence and machine learning. In general, it is well known that learning dynamics converge to Nash equilibrium in potential games - but, at the same time, many important classes of games do not admit a potential (exact or even ordinal), so this convergence does not have universal applicability. In an effort to measure how ``close'' a game is to being potential, we consider a distance function, which we call ``potentialness'' and which relies on a strategic decomposition of games introduced by Candogan et al. (2011). We introduce a numerical framework enabling the computation of this metric, which we use to calculate the degree of ``potentialness'' in a large class of generic matrix games, as well as in certain classes of games that have been well-studied in economics but are known not to be generic - such as auctions and contests, which have become increasingly important due to the widespread automation of bidding and pricing with no-regret learning algorithms. We empirically show that potentialness decreases and concentrates with an increasing number of agents or actions; in addition, potentialness turns out to be a good predictor for the existence of pure Nash equilibria and the convergence of no-regret learning algorithms in matrix games. In particular, we observe that potentialness is very low for all-pay auctions and much higher for Tullock contests, first- and second-price auctions, explaining the success of learning in the latter.
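
As background for the exact case that such a distance relaxes, the Monderer-Shapley four-cycle condition gives a direct test of whether a bimatrix game admits an exact potential; this is the binary check, not the paper's graded "potentialness" metric:

```python
import numpy as np

def is_exact_potential(A, B, tol=1e-9):
    """Four-cycle test for a bimatrix game (A = row player's payoffs,
    B = column player's payoffs): an exact potential exists iff every
    length-4 unilateral-deviation cycle sums to zero."""
    m, n = A.shape
    for i in range(m):
        for ip in range(m):
            for j in range(n):
                for jp in range(n):
                    cyc = (A[ip, j] - A[i, j]) + (B[ip, jp] - B[ip, j]) \
                        + (A[i, jp] - A[ip, jp]) + (B[i, j] - B[i, jp])
                    if abs(cyc) > tol:
                        return False
    return True

# Identical-interest games are always exact potential games...
A = np.array([[3.0, 0.0], [0.0, 2.0]])
print(is_exact_potential(A, A))   # True
# ...while matching pennies (zero-sum, cyclic dynamics) is not.
M = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(is_exact_potential(M, -M))  # False
```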

URL: https://openreview.net/forum?id=Is9APiPg4V

---

Title: B\'ezier Flow: a Surface-wise Gradient Descent Method for Multi-objective Optimization

Abstract: This paper proposes a strategy to construct a multi-objective optimization algorithm from a single-objective optimization algorithm by using the B\'ezier simplex model. Additionally, we extend the notion of stability of optimization algorithms in the sense of Probably Approximately Correct (PAC) learning and define PAC stability. We prove that it leads to an upper bound on the generalization error with high probability. Furthermore, we show that multi-objective optimization algorithms derived from a gradient descent-based single-objective optimization algorithm are PAC stable. We conducted numerical experiments and demonstrated that our method achieved lower generalization errors than an existing multi-objective optimization algorithm.
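
For reference, a B\'ezier simplex over two objectives reduces to an ordinary B\'ezier curve, which De Casteljau's algorithm evaluates by repeated linear interpolation; the control points below are made up for illustration:

```python
import numpy as np

def de_casteljau(control, t):
    """Evaluate a Bezier curve (the 1-simplex special case of the
    Bezier simplex model) at parameter t in [0, 1] by repeatedly
    interpolating between adjacent control points."""
    pts = np.array(control, dtype=float)
    while len(pts) > 1:
        pts = (1 - t) * pts[:-1] + t * pts[1:]
    return pts[0]

# A quadratic Bezier curve interpolates its endpoint control points,
# which in the multi-objective setting would correspond to the optima
# of the individual objectives.
ctrl = [[0.0, 1.0], [0.2, 0.2], [1.0, 0.0]]
print(de_casteljau(ctrl, 0.0))  # [0. 1.]
print(de_casteljau(ctrl, 0.5))  # [0.35 0.35]
```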

URL: https://openreview.net/forum?id=I1gALvbRxj

---

Title: AutoDocSegmenter: A Geometric Approach towards Self-Supervised Document Segmentation

Abstract: Document segmentation, the process of dividing a document into coherent and significant regions, plays a crucial role in diverse applications that require parsing, retrieval, and categorization. However, most existing methods rely on supervised learning, which requires large-scale labeled datasets that are costly and time-consuming to obtain. In this work, we propose a novel self-supervised framework for document segmentation that does not require labeled data. Our framework consists of two components: (1) an unsupervised, isothetic-cover-based pseudo-mask generator that approximately segments document objects, and (2) an encoder-decoder network that learns to refine the pseudo masks and segment the document objects accurately. Our approach can handle diverse and intricate document layouts by leveraging the rich information in unlabeled datasets. We demonstrate the effectiveness of our approach on several benchmarks, where it outperforms state-of-the-art document segmentation methods.

URL: https://openreview.net/forum?id=JBveijn2OO

---

Title: Why is constrained neural language generation particularly challenging?

Abstract: Recent advances in deep neural language models combined with the capacity of large scale datasets have accelerated the development of natural language generation systems that produce fluent and coherent texts (to various degrees of success) in a multitude of tasks and application contexts. However, controlling the output of these models for desired user and task needs is still an open challenge. This is crucial not only for customizing the content and style of the generated language, but also for the safe and reliable deployment of these models in the real world. We present an extensive survey on the emerging topic of constrained neural language generation in which we formally define and categorize the problems of natural language generation by distinguishing between conditions and constraints (the latter being testable conditions on the output text instead of the input), present constrained text generation tasks, and review existing methods and evaluation metrics for constrained text generation. Our aim is to highlight recent progress and trends in this emerging field, informing on the most promising directions and limitations towards advancing the state-of-the-art of constrained neural language generation research.

URL: https://openreview.net/forum?id=Vwgjk5ysWn

---

Title: Shapley Values of Structured Additive Regression Models and Application to RKHS Weightings of Functions

Abstract: The ability to interpret machine learning models is proving increasingly valuable, as their use in sensitive domains requires trust. Therefore, work to improve explanation methods, especially the interpretation of complex models, is of high importance. With this in mind, the purpose of this paper is twofold. First, we present an algorithm for efficiently calculating the Shapley values of a family of models, Structured Additive Regression (STAR) models, which allow more variable interactions than Generalized Additive Models (GAMs). Second, we present a new instantiation in the RKHS Weightings of Functions paradigm, better adapted to regression, and show how to transform it and other RKHS Weightings instantiations into STAR models. We therefore introduce a new family of STAR models, as well as the means to interpret their outputs in a timely manner.
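
For context, the quantity being computed is the classical Shapley value, whose naive evaluation enumerates all feature subsets at exponential cost (the cost the paper's algorithm avoids for STAR models); a generic brute-force sketch:

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values by subset enumeration: phi_i is the weighted
    average marginal contribution of feature i over all subsets S of the
    remaining features, with features outside S held at the baseline."""
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        rest = [j for j in range(d) if j != i]
        for r in range(d):
            for S in combinations(rest, r):
                weight = factorial(r) * factorial(d - r - 1) / factorial(d)
                xS = baseline.copy()
                xS[list(S)] = x[list(S)]      # features in S take their values
                xSi = xS.copy()
                xSi[i] = x[i]                 # add feature i
                phi[i] += weight * (f(xSi) - f(xS))
    return phi

# Sanity check on a linear model, where phi_i = w_i * (x_i - b_i).
w = np.array([2.0, -1.0, 0.5])
f = lambda z: float(w @ z)
x, b = np.array([1.0, 1.0, 1.0]), np.zeros(3)
print(np.allclose(shapley_values(f, x, b), w * (x - b)))  # True
```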

URL: https://openreview.net/forum?id=aWRMvXTvPf

---
