Weekly TMLR digest for Jun 23, 2024

1 view

Skip to first unread message

TMLR

unread,

Jun 23, 2024, 12:00:11 AMJun 23

to tmlr-annou...@googlegroups.com

New certifications
==================

Featured Certification: Layerwise complexity-matched learning yields an improved model of cortical area V2

Nikhil Parthasarathy, Olivier J Henaff, Eero P Simoncelli

https://openreview.net/forum?id=lQBsLfAWhj

---

Reproducibility Certification: Reproducibility study of "Robust Fair Clustering: A Novel Fairness Attack and Defense Framework"

Lucas Ponticelli, Vincent Loos, Eren Kocadag, Kacper Bartosik

https://openreview.net/forum?id=Xu1sEPhjqH

---

Accepted papers
===============

Title: Koopman Spectrum Nonlinear Regulators and Efficient Online Learning

Authors: Motoya Ohnishi, Isao Ishikawa, Kendall Lowrey, Masahiro Ikeda, Sham M. Kakade, Yoshinobu Kawahara

Abstract: Most modern reinforcement learning algorithms optimize a cumulative single-step cost along a trajectory. The optimized motions are often ‘unnatural’, representing, for example, behaviors with sudden accelerations that waste energy and lack predictability. In this work, we present a novel paradigm of controlling nonlinear systems via the minimization of the Koopman spectrum cost: a cost over the Koopman operator of the controlled dynamics. This induces a broader class of dynamical behaviors that evolve over stable manifolds such as nonlinear oscillators, closed loops, and smooth movements. We demonstrate that some dynamics characterizations that are not possible with a cumulative cost are feasible in this paradigm, which generalizes the classical eigenstructure and pole assignments to nonlinear decision making. Moreover, we present a sample efficient online learning algorithm for our problem that enjoys a sub-linear regret bound under some structural assumptions.

URL: https://openreview.net/forum?id=thfoUZugvS

---

Title: Accurate Neural Network Pruning Requires Rethinking Sparse Optimization

Authors: Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh

Abstract: Obtaining versions of deep neural networks that are both highly-accurate and highly-sparse %
is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the community. Yet, much less is known about the interaction between sparsity and the standard stochastic optimization techniques used for training sparse networks, and most existing work uses standard dense schedules and hyperparameters for training sparse networks. In this work, we examine the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks. We begin by showing that using standard dense training recipes for sparse training is suboptimal, and provide evidence that this results in *under-training*, loosely defined as using a suboptimal number of passes over the training data. We present training recipes for mitigating this issue for both sparse pre-training of vision models (e.g. ResNet50/ImageNet) and sparse fine-tuning of language models (e.g. BERT/GLUE), achieving state-of-the-art results in both settings in the high-sparsity regime, and providing detailed analyses for the difficulty of sparse training in both scenarios. Our work sets a new benchmark in terms of the accuracies that can be achieved under high sparsity, and should inspire further research into improving sparse model training, to reach higher accuracies under high sparsity, but also to do so efficiently.

URL: https://openreview.net/forum?id=vgthYeRBAF

---

Title: Learning Network Granger causality using Graph Prior Knowledge

Authors: Lucas Zoroddu, Pierre Humbert, Laurent Oudre

Abstract: Understanding the relationships among multiple entities through Granger causality graphs
within multivariate time series data is crucial across various domains, including economics,
finance, neurosciences, and genetics. Despite its broad utility, accurately estimating Granger
causality graphs in high-dimensional scenarios with few samples remains a persistent chal-
lenge. In response, this study introduces a novel model that leverages prior knowledge in
the form of a noisy undirected graph to facilitate the learning of Granger causality graphs,
while assuming sparsity. In this study we introduce an optimization problem, we propose
to solve it with an alternative minimization approach and we proved the convergence of
our fitting algorithm, highlighting its effectiveness. Furthermore, we present experimental
results derived from both synthetic and real-world datasets. These results clearly illustrate
the advantages of our proposed method over existing alternatives, particularly in situations
where few samples are available. By incorporating prior knowledge and emphasizing spar-
sity, our approach offers a promising solution to the complex problem of estimating Granger
causality graphs in high-dimensional, data-scarce environments.

URL: https://openreview.net/forum?id=DN6sut5fyR

---

Title: Best-of-Both-Worlds Linear Contextual Bandits

Authors: Masahiro Kato, Shinji Ito

Abstract: This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under an adversarial corruption. At each round, a decision-maker observes an independent and identically distributed context and then selects an arm based on the context and past observations. After selecting an arm, the decision-maker incurs a loss corresponding to the selected arm. The decision-maker aims to minimize the cumulative loss over the trial. The goal of this study is to develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees. We first formulate the problem by introducing a novel setting of bandits with adversarial corruption, referred to as the contextual adversarial regime with a self-bounding constraint. We assume linear models for the relationship between the loss and the context. Then, we propose a strategy that extends the {\tt RealLinExp3} by \citet{Neu2020} and the Follow-The-Regularized-Leader (FTRL). The regret of our proposed algorithm is shown to be upper-bounded by $O\left(\min\left\{\frac{(\log(T))^3}{\Delta_{*}} + \sqrt{\frac{C(\log(T))^3}{\Delta_{*}}},\ \ \sqrt{T}(\log(T))^2\right\}\right)$, where $T \in\mathbb{N}$ is the number of rounds, $\Delta_{*} > 0$ is the constant minimum gap between the best and suboptimal arms for any context, and $C\in[0, T] $ is an adversarial corruption parameter. This regret upper bound implies $O\left(\frac{(\log(T))^3}{\Delta_{*}}\right)$ in a stochastic environment and by $O\left( \sqrt{T}(\log(T))^2\right)$ in an adversarial environment. We refer to our strategy as the {\tt Best-of-Both-Worlds (BoBW) RealFTRL}, due to its theoretical guarantees in both stochastic and adversarial regimes.

URL: https://openreview.net/forum?id=aIG2RAtNuX

---

Title: Spike Accumulation Forwarding for Effective Training of Spiking Neural Networks

Authors: Ryuji Saiin, Tomoya Shirakawa, Sota Yoshihara, Yoshihide Sawada, Hiroyuki Kusumoto

Abstract: In this article, we propose a new paradigm for training spiking neural networks (SNNs), spike accumulation forwarding (SAF). It is known that SNNs are energy-efficient but difficult to train. Consequently, many researchers have proposed various methods to solve this problem, among which online training through time (OTTT) is a method that allows inferring at each time step while suppressing the memory cost. However, to compute efficiently on GPUs, OTTT requires operations with spike trains and weighted summation of spike trains during forwarding. In addition, OTTT has shown a relationship with the Spike Representation, an alternative training method, though theoretical agreement with Spike Representation has yet to be proven. Our proposed method can solve these problems; namely, SAF can halve the number of operations during the forward process, and it can be theoretically proven that SAF is consistent with the Spike Representation and OTTT, respectively. Furthermore, we confirmed the above contents through experiments and showed that it is possible to reduce memory and training time while maintaining accuracy.

URL: https://openreview.net/forum?id=RGQsUQDAd9

---

Title: Learning the essential in less than 2k additional weights - a simple approach to improve image classification stability under corruptions

Authors: Kai Bäuerle, Patrick Müller, Syed Muhammad Kazim, Ivo Ihrke, Margret Keuper

Abstract: The performance of image classification on well-known benchmarks such as ImageNet is remarkable, but in safety-critical situations, the accuracy often drops significantly under adverse conditions. To counteract these performance drops, we propose a very simple modification to the models: we pre-pend a single, dimension preserving convolutional layer with a large linear kernel whose purpose it is to extract the information that is essential for image classification. We show that our simple modification can increase the robustness against common corruptions significantly, especially for corruptions of high severity. We demonstrate the impact of our channel-specific layers on ImageNet-100 and ImageNette classification tasks and show an increase of up to 30% accuracy on corrupted data in the top1 accuracy. Further, we conduct a set of designed experiments to qualify the conditions for our findings. Our main result is that a data- and network-dependent linear subspace carries the most important classification information (the essential), which our proposed pre-processing layer approximately identifies for most corruptions, and at very low cost.

URL: https://openreview.net/forum?id=i2SuGWtIIm

---

Title: Training-free linear image inverses via flows

Authors: Ashwini Pokle, Matthew J. Muckley, Ricky T. Q. Chen, Brian Karrer

Abstract: Solving inverse problems without any training involves using a pretrained generative model and making appropriate modifications to the generation process to avoid finetuning of the generative model. While recent methods have explored the use of diffusion models, they still require the manual tuning of many hyperparameters for different inverse problems. In this work, we propose a training-free method for solving linear inverse problems by using pretrained flow models, leveraging the simplicity and efficiency of Flow Matching models, using theoretically-justified weighting schemes, and thereby significantly reducing the amount of manual tuning. In particular, we draw inspiration from two main sources: adopting prior gradient correction methods to the flow regime, and a solver scheme based on conditional Optimal Transport paths. As pretrained diffusion models are widely accessible, we also show how to practically adapt diffusion models for our method. Empirically, our approach requires no problem-specific tuning across an extensive suite of noisy linear inverse problems on high-dimensional datasets, ImageNet-64/128 and AFHQ-256, and we observe that our flow-based method for solving inverse problems improves upon closely-related diffusion-based methods in most settings.

URL: https://openreview.net/forum?id=PLIt3a4yTm

---

Title: Towards Unbiased Calibration using Meta-Regularization

Authors: Cheng Wang, Jacek Golebiowski

Abstract: Model miscalibration has been frequently identified in modern deep neural networks. Recent work aims to improve model calibration directly through a differentiable calibration proxy. However, the calibration produced is often biased due to the binning mechanism. In this work, we propose to learn better-calibrated models via meta-regularization, which has two components: (1) gamma network (gamma-net), a meta learner that outputs sample-wise gamma value (continuous variable) for Focal loss for regularizing the backbone network; (2) smooth expected calibration error (SECE), a Gaussian-kernel based, unbiased, and differentiable surrogate to ECE that enables the smooth optimization of gamma-net. We evaluate the effectiveness of the proposed approach in regularizing neural networks towards improved and unbiased calibration on three computer vision datasets. We empirically demonstrate that: (a) learning sample-wise $\gamma$ as continuous variables can effectively improve calibration; (b) SECE smoothly optimizes gamma-net towards unbiased and robust calibration with respect to the binning schemes; and (c) the combination of gamma-net and SECE achieves the best calibration performance across various calibration metrics while retaining very competitive predictive performance as compared to multiple recently proposed methods.

URL: https://openreview.net/forum?id=Yf8iHCfG4W

---

Title: Layerwise complexity-matched learning yields an improved model of cortical area V2

Authors: Nikhil Parthasarathy, Olivier J Henaff, Eero P Simoncelli

Abstract: Human ability to recognize complex visual patterns arises through transformations performed by successive areas in the ventral visual cortex. Deep neural networks trained endto-end for object recognition approach human capabilities, and offer the best descriptions to date of neural responses in the late stages of the hierarchy. But these networks provide a poor account of the early stages, compared to traditional hand-engineered models, or models optimized for coding efficiency or prediction. Moreover, the gradient backpropagation used in end-to-end learning is generally considered to be biologically implausible. Here, we overcome both of these limitations by developing a bottom-up self-supervised training methodology that operates independently on successive layers. Specifically, we maximize feature similarity between pairs of locally-deformed natural image patches, while decorrelating features across patches sampled from other images. Crucially, the deformation amplitudes are adjusted proportionally to receptive field sizes in each layer, thus matching the task complexity to the capacity at each stage of processing. In comparison with architecturematched versions of previous models, we demonstrate that our layerwise complexity-matched learning (LCL) formulation produces a two-stage model (LCL-V2) that is better aligned with selectivity properties and neural activity in primate area V2. We demonstrate that the complexity-matched learning paradigm is responsible for much of the emergence of the improved biological alignment. Finally, when the two-stage model is used as a fixed front end for a deep network trained to perform object recognition, the resultant model (LCL-V2Net) is significantly better than standard end-to-end self-supervised, supervised, and adversarially-trained models in terms of generalization to out-of-distribution tasks and alignment with human behavior. Our code and pre-trained checkpoints are available at https://github.com/nikparth/LCL-V2.git

URL: https://openreview.net/forum?id=lQBsLfAWhj

---

Title: SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

Authors: Rui-Jie Zhu, Qihang Zhao, Guoqi Li, Jason Eshraghian

Abstract: As the size of large language models continue to scale, so does the computational resources required to run them. Spiking Neural Networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverage sparse and event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have proven to be more challenging to train. As a result, their performance lags behind modern deep learning, and until now, SNNs have yet to succeed at language generation on large-scale datasets. In this paper, inspired by the Receptance Weighted Key Value (RWKV) language model, we successfully implement `SpikeGPT', a generative language model with binary, event-driven spiking activation units. We train the proposed model on two model variants: 46M and 216M parameters. To the best of our knowledge, SpikeGPT is the largest backpropagation-trained SNN model when released, rendering it suitable for both the generation and comprehension of natural language. We achieve this by modifying the transformer block to replace multi-head self-attention to reduce quadratic computational complexity $\mathcal{O}(T^2)$ to linear complexity $\mathcal{O}(T)$ with increasing sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our experiments show that SpikeGPT remains competitive with non-spiking models on tested benchmarks, while maintaining 32.2$\times$ fewer operations when processed on neuromorphic hardware that can leverage sparse, event-driven activations. Our code implementation is available at https://github.com/ridgerchu/SpikeGPT.

URL: https://openreview.net/forum?id=gcf1anBL9e

---

Title: Cooperative Online Learning with Feedback Graphs

Authors: Nicolò Cesa-Bianchi, Tommaso Cesari, Riccardo Della Vecchia

Abstract: We study the interplay between communication and feedback in a cooperative online learning setting, where a network of communicating agents learn a common sequential decision-making task through a feedback graph. We bound the network regret in terms of the independence number of the strong product between the communication network and the feedback graph. Our analysis recovers as special cases many previously known bounds for cooperative online learning with expert or bandit feedback. We also prove an instance-based lower bound, demonstrating that our positive results are not improvable except in pathological cases. Experiments on synthetic data confirm our theoretical findings.

URL: https://openreview.net/forum?id=PtNyIboDIG

---

Title: On the numerical reliability of nonsmooth autodiff: a MaxPool case study

Authors: Ryan Boustany

Abstract: This paper considers the reliability of automatic differentiation for neural networks involving the nonsmooth MaxPool operation across various precision levels (16, 32, 64 bits), architectures (LeNet, VGG, ResNet), and datasets (MNIST, CIFAR10, SVHN, ImageNet). Although AD can be incorrect, recent research has shown that it coincides with the derivative almost everywhere, even in the presence of nonsmooth operations. On the other hand, in practice, AD operates with floating-point numbers, and there is, therefore, a need to explore subsets on which AD can be {\em numerically} incorrect. Recently, \cite{bertoin2021numerical} empirically studied how the choice of $\ReLU'(0)$ changes the output of AD and define a numerical bifurcation zone where using $\ReLU('0) = 0$ differs from using $\ReLU'(0) = 1$. To extend this for a broader class of nonsmooth operations, we propose a new numerical bifurcation zone (where AD is incorrect over real numbers) and define a compensation zone (where AD is incorrect over floating-point numbers but correct over reals). Using SGD for training, we found that nonsmooth MaxPool Jacobians with lower norms maintain stable and efficient test accuracy, while higher norms can result in instability and decreased performance. We can use batch normalization, Adam-like optimizers, or increase precision to reduce MaxPool Jacobians influence.

URL: https://openreview.net/forum?id=142xsInVfp

---

Title: Universal Neurons in GPT2 Language Models

Authors: Wes Gurnee, Theo Horsley, Zifan Carl Guo, Tara Rezaei Kheirkhah, Qinyi Sun, Will Hathaway, Neel Nanda, Dimitris Bertsimas

Abstract: A basic question within the emerging field of mechanistic interpretability is the degree to which neural networks learn the same underlying mechanisms. In other words, are neural mechanisms universal across different models?
In this work, we study the universality of individual neurons across GPT2 models trained from different initial random seeds, motivated by the hypothesis that universal neurons are likely to be interpretable. In particular, we compute pairwise correlations of neuron activations over 100 million tokens for every neuron pair across five different seeds and find that 1-5\% of neurons are universal, that is, pairs of neurons which consistently activate on the same inputs. We then study these universal neurons in detail, finding that they usually have clear interpretations and taxonomize them into a small number of neuron families. We conclude by studying patterns in neuron weights to establish several universal functional roles of neurons in simple circuits: deactivating attention heads, changing the entropy of the next token distribution, and predicting the next token to (not) be within a particular set.

URL: https://openreview.net/forum?id=ZeI104QZ8I

---

Title: Generalized Oversampling for Learning from Imbalanced datasets and Associated Theory: Application in Regression

Authors: Samuel Stocksieker, Denys Pommeret, Arthur Charpentier

Abstract: In supervised learning, it is quite frequent to be confronted with real imbalanced datasets. This situation leads to a learning difficulty for standard algorithms. Research and solutions in imbalanced learning have mainly focused on classification tasks. Despite its importance, very few solutions exist for imbalanced regression. In this paper, we propose a data augmentation procedure, the GOLIATH algorithm, based on kernel density estimates and especially dedicated to the problem of imbalanced data. This general approach encompasses two large families of synthetic oversampling: those based on perturbations, such as Gaussian Noise, and those based on interpolations, such as SMOTE. It also provides an explicit form of such machine learning algorithms. New synthetic data generators are deduced. We apply GOLIATH in imbalanced regression combining such generator procedures with a new wild-bootstrap resampling technique for the target values. We evaluate the performance of the GOLIATH algorithm in imbalanced regression where we compare our approach with state-of-the-art techniques.

URL: https://openreview.net/forum?id=DLqPhQxgYu

---

Title: SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation

Authors: Qi Yan, Zhengyang Liang, Yang Song, Renjie Liao, Lele Wang

Abstract: Permutation-invariant diffusion models of graphs achieve the invariant sampling and invariant loss functions by restricting architecture designs, which often sacrifice empirical performances. In this work, we first show that the performance degradation may also be contributed by the increasing modes of target distributions brought by invariant architectures since 1) the optimal one-step denoising scores are score functions of Gaussian mixtures models (GMMs) whose components center on these modes and 2) learning the scores of GMMs with more components is often harder. Motivated by the analysis, we propose SwinGNN along with a simple yet provable trick that enables permutation-invariant sampling. It benefits from more flexible (non-invariant) architecture designs and permutation-invariant sampling. We further design an efficient 2-WL message passing network using the shifted-window self-attention. Extensive experiments on synthetic and real-world protein and molecule datasets show that SwinGNN outperforms existing methods by a substantial margin on most metrics. Our code is released at https://github.com/qiyan98/SwinGNN.

URL: https://openreview.net/forum?id=abfi5plvQ4

---

Title: Bit-by-Bit: Investigating the Vulnerabilities of Binary Neural Networks to Adversarial Bit Flipping

Authors: Shamik Kundu, Sanjay Das, Sayar Karmakar, Arnab Raha, Souvik Kundu, Yiorgos Makris, Kanad Basu

Abstract: Binary Neural Networks (BNNs), operating with ultra-low precision weights, incur a significant reduction in storage and compute cost compared to the traditional Deep Neural Networks (DNNs). However, vulnerability of such models against various hardware attacks are yet to be fully unveiled. Towards understanding the potential threat imposed on such highly efficient models, in this paper, we explore a novel adversarial attack paradigm pertaining to BNNs. In specific, we assume the attack to be executed during deployment phase, prior to inference, to achieve malicious intentions, via manipulation of accessible network parameters. We aim to accomplish a graceless degradation in BNN accuracy to a point, where the fully functional network can behave as a random output generator at best, thus subverting the confidence in the system. To this end, we propose an Outlier Gradient-based Evolutionary (OGE) attack, that learns injection of minimal amount of critical bit flips in the pre-trained binary network weights, to introduce classification errors in the inference execution. To the best of our knowledge, this is the first work that leverages the outlier gradient weights to orchestrate a hardware-based bit-flip attack, that is highly effective against the typically resilient low-quantization BNNs. Exhaustive evaluations on popular image recognition datasets including Fashion-MNIST, CIFAR10, GTSRB, and ImageNet demonstrate that, OGE can drop up to 68.1% of the test images mis-classification, by flipping as little as 150 binary weights, out of 10.3 millions in a BNN architecture.

URL: https://openreview.net/forum?id=nB8foAclpo

---

Title: Directional Convergence Near Small Initializations and Saddles in Two-Homogeneous Neural Networks

Authors: Akshay Kumar, Jarvis Haupt

Abstract: This paper examines gradient flow dynamics of two-homogeneous neural networks for small initializations, where all weights are initialized near the origin. For both square and logistic losses, it is shown that for sufficiently small initializations, the gradient flow dynamics spend sufficient time in the neighborhood of the origin to allow the weights of the neural network to approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of a neural correlation function that quantifies the correlation between the output of the neural network
and corresponding labels in the training data set. For square loss, it has been observed that neural networks undergo saddle-to-saddle dynamics when initialized close to the origin. Motivated by this, this paper also shows a similar directional convergence among weights of
small magnitude in the neighborhood of certain saddle points.

URL: https://openreview.net/forum?id=hfrPag75Y0

---

Title: Towards Minimal Targeted Updates of Language Models with Targeted Negative Training

Authors: Lily H Zhang, Rajesh Ranganath, Arya Tafvizi

Abstract: Generative models of language exhibit impressive capabilities but still place non-negligible probability mass over undesirable outputs. In this work, we address the task of updating a model to avoid unwanted outputs while minimally changing model behavior otherwise, a challenge we refer to as a minimal targeted update. We first formalize the notion of a minimal targeted update and propose a method to achieve such updates using negative examples from a model's generations. Our proposed Targeted Negative Training (TNT) results in updates that keep the new distribution close to the original, unlike existing losses for negative signal which push down probability but do not control what the updated distribution will be. In experiments, we demonstrate that TNT yields a better trade-off between reducing unwanted behavior and maintaining model generation behavior than baselines, paving the way towards a modeling paradigm based on iterative training updates that constrain models from generating undesirable outputs while preserving their impressive capabilities.

URL: https://openreview.net/forum?id=lrZ2yiqOS2

---

Title: Towards Understanding Variants of Invariant Risk Minimization through the Lens of Calibration

Authors: Kotaro Yoshida, Hiroki Naganuma

Abstract: Machine learning models traditionally assume that training and test data are independently and identically distributed. However, in real-world applications, the test distribution often differs from training. This problem, known as out-of-distribution (OOD) generalization, challenges conventional models. Invariant Risk Minimization (IRM) emerges as a solution that aims to identify invariant features across different environments to enhance OOD robustness. However, IRM's complexity, particularly its bi-level optimization, has led to the development of various approximate methods. Our study investigates these approximate IRM techniques, using the consistency and variance of calibration across environments as metrics to measure the invariance aimed for by IRM. Calibration, which measures the reliability of model prediction, serves as an indicator of whether models effectively capture environment-invariant features by showing how uniformly over-confident the model remains across varied environments. Through a comparative analysis of datasets with distributional shifts, we observe that Information Bottleneck-based IRM achieves consistent calibration across different environments. This observation suggests that information compression techniques, such as IB, are potentially effective in achieving model invariance. Furthermore, our empirical evidence indicates that models exhibiting consistent calibration across environments are also well-calibrated. This demonstrates that invariance and cross-environment calibration are empirically equivalent. Additionally, we underscore the necessity for a systematic approach to evaluating OOD generalization. This approach should move beyond traditional metrics, such as accuracy and F1 scores, which fail to account for the model’s degree of over-confidence, and instead focus on the nuanced interplay between accuracy, calibration, and model invariance.

URL: https://openreview.net/forum?id=9YqacugDER

---

Title: Reproducibility study of "Robust Fair Clustering: A Novel Fairness Attack and Defense Framework"

Authors: Lucas Ponticelli, Vincent Loos, Eren Kocadag, Kacper Bartosik

Abstract: This reproducibility study examines "Robust Fair Clustering: A Novel Fairness Attack and Defense Framework" by Chhabra et al. (2023), an innovative work in fair clustering algorithms. Our study focuses on validating the original paper's claims concerning the susceptibility of state-of-the-art fair clustering models to adversarial attacks and the efficacy of the proposed Consensus Fair Clustering (CFC) defence mechanism. We employ a similar experimental framework but extend our investigations by using additional datasets. Our findings confirm the original paper's claims, reinforcing the vulnerability of fair clustering models to adversarial attacks and the robustness of the CFC mechanism.

URL: https://openreview.net/forum?id=Xu1sEPhjqH

---

Title: TAP: The Attention Patch for Cross-Modal Knowledge Transfer from Unlabeled Modality

Authors: Yinsong Wang, Shahin Shahrampour

Abstract: This paper addresses a cross-modal learning framework, where the objective is to enhance the performance of supervised learning in the primary modality using an unlabeled, unpaired secondary modality. Taking a probabilistic approach for missing information estimation, we show that the extra information contained in the secondary modality can be estimated via Nadaraya-Watson (NW) kernel regression, which can further be expressed as a kernelized cross-attention module (under linear transformation). This expression lays the foundation for introducing The Attention Patch (TAP), a simple neural network add-on that can be trained to allow data-level knowledge transfer from the unlabeled modality. We provide extensive numerical simulations using real-world datasets to show that TAP can provide statistically significant improvement in generalization across different domains and different neural network architectures, making use of seemingly unusable unlabeled cross-modal data.

URL: https://openreview.net/forum?id=73uyerai53

---

Title: Mildly Constrained Evaluation Policy for Offline Reinforcement Learning

Authors: Linjie Xu, zhengyao jiang, Jinyu Wang, Lei Song, Jiang Bian

Abstract: Offline reinforcement learning (RL) methodologies enforce constraints on the policy to adhere closely to the behavior policy, thereby stabilizing value learning and mitigating the selection of out-of-distribution (OOD) actions during test time. Conventional approaches apply identical constraints for both value learning and test time inference. However, our findings indicate that the constraints suitable for value estimation may in fact be excessively restrictive for action selection during test time. To address this issue, we propose a Mildly Constrained Evaluation Policy (MCEP) for test time inference with a more constrained target policy for value estimation. Since the target policy has been adopted in various prior approaches, MCEP can be seamlessly integrated with them as a plug-in. We instantiate MCEP based on TD3BC (Fujimoto & Gu, 2021), AWAC (Nair et al., 2020) and DQL (Wang et al., 2023) algorithms. The empirical results on D4RL MuJoCo locomotion, high-dimensional humanoid and a set of 16 robotic manipulation tasks show that the MCEP brought significant performance improvement on classic offline RL methods and can further improve SOTA methods. The codes are open-sourced at \url{https://github.com/egg-west/MCEP}.

URL: https://openreview.net/forum?id=imAROs79Pb

---

Title: Deep End-to-end Causal Inference

Authors: Tomas Geffner, Javier Antoran, Adam Foster, Wenbo Gong, Chao Ma, Emre Kiciman, Amit Sharma, Angus Lamb, Martin Kukla, Nick Pawlowski, Agrin Hilmkil, Joel Jennings, Meyer Scetbon, Miltiadis Allamanis, Cheng Zhang

Abstract: Causal inference is essential for data-driven decision-making across domains such as business engagement, medical treatment, and policy making. However, in practice, causal inference suffers from many limitations including unknown causal graphs, missing data problems, and mixed data types. To tackle those challenges, we develop Deep End-to-end Causal Inference (DECI) framework, a flow based non-linear additive noise model combined with variational inference, which can perform both Bayesian causal discovery and inference. Theoretically, we show that DECI unifies many existing structural equation model (SEM) based causal inference techniques and can recover the ground truth mechanism under standard assumptions. Motivated by the challenges in the real world, we further extend DECI to heterogeneous, mixed-type data with missing values, allowing for both continuous and discrete treatment decisions. Empirically, we conduct extensive experiments (over a thousand) to show the competitive performance of DECI when compared to relevant baselines for both causal discovery and inference with both synthetic and causal machine learning benchmarks across data types and levels of missingness.

URL: https://openreview.net/forum?id=e6sqttxEGX

---

Title: A Survey on Fairness Without Demographics

Authors: Patrik Joslin Kenfack, Samira Ebrahimi Kahou, Ulrich Aïvodji

Abstract: The issue of bias in Machine Learning (ML) models is a significant challenge for the machine learning community. Real-world biases can be embedded in the data used to train models, and prior studies have shown that ML models can learn and even amplify these biases. This can result in unfair treatment of individuals based on their inherent characteristics or sensitive attributes such as gender, race, or age. Ensuring fairness is crucial with the increasing use of ML models in high-stakes scenarios and has gained significant attention from researchers in recent years. However, the challenge of ensuring fairness becomes much greater when the assumption of full access to sensitive attributes does not hold. The settings where the hypothesis does not hold include cases where (1) only limited or noisy demographic information is available or (2) demographic information is entirely unobserved due to privacy restrictions. This survey reviews recent research efforts to enforce fairness when sensitive attributes are missing. We propose a taxonomy of existing works and, more importantly, highlight current challenges and future research directions to stimulate research in ML fairness in the setting of missing sensitive attributes.

URL: https://openreview.net/forum?id=3HE4vPNIfX

---

Title: Reproducibility study of FairAC

Authors: Gijs de Jong, Macha J. Meijer, Derck W. E. Prinzhorn, Harold Ruiter

Abstract: This work aims to reproduce the findings of the paper "Fair Attribute Completion on Graph with Missing Attributes" written by Guo et al. (2023) by investigating the claims made in the paper. This paper suggests that the results of the original paper are reproducible and thus, the claims hold. However, the claim that FairAC is a generic framework for many downstream tasks is very broad and could therefore only be partially tested. Moreover, we show that FairAC is generalizable to various datasets and sensitive attributes and show evidence that the improvement in group fairness of the FairAC framework does not come at the expense of individual fairness. Lastly, the codebase of FairAC has been refactored and is now easily applicable for various datasets and models.

URL: https://openreview.net/forum?id=ccDi5jtSF7

---

New submissions
===============

Title: Threshold Moving for Online Class Imbalance Learning with Dynamic Evolutionary Cost Vector

Abstract: Existing online class imbalance learning methods fail to achieve optimal performance because their assumptions about enhancing minority classes are hardcoded in model parameters. To learn the model for the performance measure directly instead of using heuristics, we introduce a novel framework based on a dynamic evolutionary algorithm called Online Evolutionary Cost Vector (OECV). By bringing the threshold moving method from the cost-sensitive learning paradigm and viewing the cost vector as a hyperparameter, our method transforms the online class imbalance issue into a bi-level optimization problem. The first layer utilizes a base online classifier for rough prediction, and the second layer refines the prediction using a threshold moving cost vector learned via a dynamic evolutionary algorithm (EA). OECV benefits from both the efficiency of online learning methods and the high performance of EA, as demonstrated in empirical studies against four state-of-the-art methods on 30 datasets. Additionally, we show the effectiveness of the EA component in the ablation study by comparing OECV to its two variants, OECV-n and OECV-ea, respectively. This work reveals the superiority of incorporating EA into online imbalance classification tasks, while its potential extends beyond the scope of the class imbalance setting and warrants future research attention. We release our code1 for future research.

URL: https://openreview.net/forum?id=EIPnUofed9

---

Title: Ask Your Distribution Shift if Pre-Training is Right for You

Abstract: Pre-training is a widely used approach to develop models that are robust to distribution shifts. However, in practice, its effectiveness varies: fine-tuning a pre-trained model improves robustness significantly in some cases but *not at all* in others (compared to training from scratch). In this work, we seek to characterize the failure modes that pre-training *can* and *cannot* address. In particular, we focus on two possible failure modes of models under distribution shift: poor extrapolation (e.g., they cannot generalize to a different domain) and biases in the training data (e.g., they rely on spurious features). Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases. After providing theoretical motivation and empirical evidence for this finding, we explore two of its implications for developing robust models: (1) pre-training and interventions designed to prevent exploiting biases have complementary robustness benefits, and (2) fine-tuning on a (very) small, non-diverse but *de-biased* dataset can result in significantly more robust models than fine-tuning on a large and diverse but biased dataset.

URL: https://openreview.net/forum?id=edULLIVnoc

---

Title: Learning Hierarchical Relational Representations through Relational Convolutions

Abstract: A maturing area of research in deep learning is the study of architectures and inductive biases for learning representations of relational features. In this paper, we focus on the problem of learning representations of hierarchical relations, proposing an architectural framework we call "relational convolutional networks". The key to the framework is a novel operation that captures the relational patterns in groups of objects by convolving graphlet filters—learnable templates of relational patterns—against subsets of the input. Composing relational convolutions gives rise to a deep architecture that learns representations of higher-order, hierarchical relations. We present the motivation and details of the architecture, together with a set of experiments to demonstrate how relational convolutional networks can provide an effective framework for modeling relational tasks that have hierarchical structure.

URL: https://openreview.net/forum?id=vNZlnznmV2

---

Title: Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models

Abstract: Fine-tuning the pre-trained model with active learning holds promise for reducing annotation costs. However, this combination introduces significant computational costs, particularly with the growing scale of pre-trained models. Recent research has proposed proxy-based active learning, which pre-computes features to reduce computational costs. Yet, this approach often incurs a significant loss in active learning performance, sometimes outweighing the computational cost savings. This paper demonstrates that not all sample selection differences result in performance degradation. Furthermore, we show that suitable training methods can mitigate the decline of active learning performance caused by certain selection discrepancies. Building upon detailed analysis, we propose a novel method, aligned selection via proxy, which improves proxy-based active learning performance by updating pre-computed features and selecting a proper training method. Extensive experiments validate that our method improves the total cost of efficient active learning while maintaining computational efficiency.

URL: https://openreview.net/forum?id=PNcgJMJcdl

---

Title: PASS: Pruning Attention Heads with Almost-sure Sparsity Targets

Abstract: Transformer models have been widely used to obtain high accuracy values in multiple fields including natural language processing (NLP), computer vision, and more. This superior performance typically comes at the expense of substantial computational overhead. Multi-head attention is the key factor in the success of Transformer models that has been found to be computationally expensive. Significant research effort has been devoted to improving attention compute efficiency by pruning redundant attention heads. A widely adopted paradigm is to jointly learn a set of gate variables and apply thresholds on gate values to prune heads. Previous work shows a high level of sensitivity to threshold tuning which can limit subnetwork performance and prevent them from wider adoption in practice. We propose the notion of almost-sure sparsity to overcome this limitation and develop a generic framework for Pruning with Almost-Sure Sparsity (PASS) targets over attention heads. To further boost efficiency, we design a novel technique, concentrator, based on which we develop PASSCONC (PASS with CONCentrator). We also present a simple-yet-effective strategy to further improve subnetwork performance by clipping and selectively reopening learned gates. We investigate PASS and PASSCONC on two widely studied architectures: encoder-decoder (ED) Transformer and encoder-only Transformer (e.g., BERT). Experiments on IWSLT14 German-to-English translation and GLUE benchmark tasks demonstrate that our approaches outperform the SOTA by achieving up to 1.33 higher BLEU scores, 1.44% higher accuracy, and 60% higher attention speedups.

URL: https://openreview.net/forum?id=S4duStTKGL

---

Title: From Persona to Personalization: A Survey on Role-Playing Language Agents

Abstract: Recent advancements in large language models (LLMs) have significantly boosted the rise of Role-Playing Language Agents (RPLAs), i.e., specialized AI systems designed to simulate assigned personas. By harnessing multiple advanced abilities of LLMs, including in-context learning, instruction following, and social intelligence, RPLAs achieve a remarkable sense of human likeness and vivid role-playing performance. RPLAs can mimic a wide range of personas, ranging from historical figures and fictional characters to real-life individuals. Consequently, they have catalyzed numerous AI applications, such as emotional companions, interactive video games, personalized assistants and copilots, and digital clones. In this paper, we conduct a comprehensive survey of this field, illustrating the evolution and recent progress in RPLAs integrating with cutting-edge LLM technologies. We categorize personas into three types: 1) Demographic Persona, which leverages statistical stereotypes; 2) Character Persona, focused on well-established figures; and 3) Individualized Persona, customized through ongoing user interactions for personalized services. We begin by presenting a comprehensive overview of current methodologies for RPLAs, followed by the details for each persona type, covering corresponding data sourcing, agent construction, and evaluation. Afterward, we discuss the fundamental risks, existing limitations, and prospects of RPLAs. Additionally, we provide a brief review of RPLAs in AI products in the market, which reflects practical user demands that shape and drive RPLA research. Through this survey, we aim to establish a clear taxonomy of RPLA research and applications, facilitate future research in this critical and ever-evolving field, and pave the way for a future where humans and RPLAs coexist in harmony.

URL: https://openreview.net/forum?id=xrO70E8UIZ

---

Title: How Far Are We From AGI?

Abstract: The evolution of artificial intelligence (AI) has profoundly impacted human society, driving significant advancements in multiple sectors. Yet, the escalating demands on AI have highlighted the limitations of AI's current offerings, catalyzing a movement towards Artificial General Intelligence (AGI). AGI, distinguished by its ability to execute diverse real-world tasks with efficiency and effectiveness comparable to human intelligence, reflects a paramount milestone in AI evolution. While existing works have summarized specific recent advancements of AI, they lack a comprehensive discussion of AGI's definitions, goals, and developmental trajectories. Different from existing survey papers, this paper delves into the pivotal questions of our proximity to AGI and the strategies necessary for its realization through extensive surveys, discussions, and original perspectives. We start by articulating the requisite capability frameworks for AGI, integrating the internal, interface, and system dimensions. As the realization of AGI requires more advanced capabilities and adherence to stringent constraints, we further discuss necessary AGI alignment technologies to harmonize these factors. Notably, we emphasize the importance of approaching AGI responsibly by first defining the key levels of AGI progression, followed by the evaluation framework that situates the status-quo, and finally giving our roadmap of how to reach the pinnacle of AGI. Moreover, to give tangible insights into the ubiquitous impact of the integration of AI, we outline existing challenges and potential pathways toward AGI in multiple domains. In sum, serving as a pioneering exploration into the current state and future trajectory of AGI, this paper aims to foster a collective comprehension and catalyze broader public discussions among researchers and practitioners on AGI.

URL: https://openreview.net/forum?id=H2ZKqfNd0U

---

Title: Incorporating Interventional Independence Improves Robustness against Interventional Distribution Shift

Abstract: We consider learning discriminative representations of variables related to each other via a causal graph. To learn representations that are robust against interventional distribution shifts, the training dataset is augmented with interventional data in addition to existing observational data. However, even when the underlying causal model is known, existing approaches treat interventional data like observational data, ignoring the independence relations resulting from these interventions. This leads to representations that exhibit large disparities in predictive performance on observational and interventional data. The performance disparity worsens when the quantity of interventional data available for training is limited. In this paper, (1) we first identify a strong correlation between this performance disparity and adherence of the representations to the statistical independence conditions induced by the underlying causal model during interventions. (2) For linear models, we derive sufficient conditions on the proportion of interventional data during training, for which enforcing statistical independence between representations corresponding to the intervened node and its non-descendants during interventions can lower the test-time error on interventional data. Following these insights, we propose RepLIn, an algorithm to explicitly enforce this statistical independence during interventions. We demonstrate the utility of RepLIn on synthetic and real face image datasets. Our experiments show that RepLIn is scalable with the number of nodes in the causal graph and is suitable to improve the robustness of representations against interventional distribution shifts of both continuous and discrete latent variables compared to the ERM baselines.

URL: https://openreview.net/forum?id=pZRanZlab4

---

Title: Deciphering Attention Mechanisms: Optimization and Fenchel Dual Solutions

Abstract: Attention has been widely adopted in many state-of-the-art deep learning models. While the significant performance improvements it brings have attracted great interest, the theoretical understanding of attention remains limited. This paper presents a new perspective on understanding attention by showing that it can be seen as a solver of a family of estimation problems. Specifically, we explore a convex optimization problem central to many estimation tasks prevalent in the development of deep learning architectures. Instead of solving this problem directly, we address its Fenchel dual and derive a closed-form approximation of the optimal solution. This approach results in a generalized attention framework, with the popular dot-product attention used in transformer networks being a special case. We show that T5 transformer has implicitly adopted the general form of the solution by demonstrating that this expression unifies the word mask and the positional encoding functions. Finally, we discuss how these new attention structures can be practically applied in model design and argue that the underlying convex optimization problem offers a principled justification for the architectural choices in attention mechanisms.

URL: https://openreview.net/forum?id=lHnbmXtJXf

---

Title: Reducing Variance in Meta-Learning via Laplace Approximation for Regression Tasks

Abstract: Given a finite set of sample points, meta-learning algorithms aim to learn an optimal adaptation strategy for new, unseen tasks. Often, this data can be ambiguous as it might belong to different tasks concurrently. This is particularly the case in meta-regression tasks. In such cases, the estimated adaptation strategy is subject to high variance due to the limited amount of support data for each task, which often leads to sub-optimal generalization performance. In this work, we address the problem of variance reduction in gradient-based meta-learning and formalize the class of problems prone to this, a condition we refer to as \emph{task overlap}. Specifically, we propose a novel approach that reduces the variance of the gradient estimate by weighing each support point individually by the variance of its posterior over the parameters. To estimate the posterior, we utilize the Laplace approximation, which allows us to express the variance in terms of the curvature of the loss landscape of our meta-learner. Experimental results demonstrate the effectiveness of the proposed method and highlight the importance of variance reduction in meta-learning.

URL: https://openreview.net/forum?id=Uc2mqNPkEq

---

Title: Improved Variational Bayesian Phylogenetic Inference using Mixtures

Abstract: We introduce VBPI-Mixtures, an innovative algorithm aimed at improving the precision of phylogenetic posterior distributions, with a focus on accurately approximating tree-topologies and branch lengths. Although Variational Bayesian Phylogenetic Inference (VBPI) a state-of-the-art black-box variational inference (BBVI) framework has achieved significant success in approximating these distributions, it faces challenges in dealing with the multimodal nature of tree-topology posteriors. While advanced deep learning techniques like normalizing flows and graph neural networks have enhanced VBPI's approximations of branch-length posteriors, there has been a gap in improving its tree-topology posterior approximations. Our novel VBPI-Mixtures algorithm addresses this gap by leveraging recent advancements in mixture learning within the BBVI domain. Consequently, VBPI-Mixtures can capture distributions over tree-topologies that other VBPI algorithms cannot model. We demonstrate superior performance on challenging density estimation tasks across various real phylogenetic datasets.

URL: https://openreview.net/forum?id=yuhG3VHK5f

---

Title: An Attentive Approach for Building Partial Reasoning Agents from Pixels

Abstract: We study the problem of building reasoning agents that are able to generalize in an effective manner. Towards this goal, we propose an end-to-end approach for building model-based reinforcement learning agents that dynamically focus their reasoning to the relevant aspects of the environment: after automatically identifying the distinct aspects of the environment, these agents dynamically filter out the relevant ones and then pass them to their simulator to perform partial reasoning. Unlike existing approaches, our approach works with pixel-based inputs and it allows for interpreting the focal points of the agent. Our quantitative analyses show that the proposed approach allows for effective generalization in high-dimensional domains with raw observational inputs. We also perform ablation analyses to validate of design choices. Finally, we demonstrate through qualitative analyses that our approach actually allows for building agents that focus their reasoning on the relevant aspects of the environment.

URL: https://openreview.net/forum?id=S3FUKFMRw8

---

Title: In-Context Feature Adaptation for Bongard Problems

Abstract: Current machine learning methods struggle to solve Bongard problems, which are a type of IQ test that requires deriving an abstract “concept” from a set of positive and negative “support” images, and then classifying whether or not a new query image depicts the key concept. On Bongard-HOI, a benchmark for natural-image Bongard problems, most existing methods have reached at best 69% accuracy (where chance is 50%). Low accuracy is often attributed to neural nets’ lack of ability to find human-like symbolic rules. In this work, we point out that many existing methods are forfeiting accuracy due to a much simpler problem: they do not adapt image features given information contained in the support set as a whole, and rely instead on information extracted from individual supports. This is a critical issue, because the “key concept” in a typical Bongard problem can often only be distinguished using multiple positives and multiple negatives. We explore simple methods to incorporate this context and show substantial gains over prior works, leading to new state-of-the-art accuracy on Bongard-LOGO (75.3%) and Bongard-HOI (76.4%) compared to methods with equivalent vision backbone architectures and strong performance on the original Bongard problem set (60.8%).

URL: https://openreview.net/forum?id=kUuPUIPvJ6

---

Title: Biased Dueling Bandits with Stochastic Delayed Feedback

Abstract: The dueling bandit problem, an essential variation of the traditional multi-armed bandit problem, has become significantly prominent recently due to its broad applications in online advertising, recommendation systems, information retrieval, and more. However, in many real-world applications, the feedback for actions is often subject to unavoidable delays and is not immediately available to the agent. This partially observable issue poses a significant challenge to existing dueling bandit literature, as it significantly affects how quickly and accurately the agent can update their policy on the fly. In this paper, we introduce and examine the dueling bandit problem with stochastic delayed feedback, revealing that this new practical problem will delve into a more realistic and intriguing scenario involving a preference bias between the selections. We present two algorithms designed to handle situations involving delay. Our first algorithm, requiring complete delay distribution information, achieves the optimal regret bound for the dueling bandit problem when there is no delay. The second algorithm is tailored for situations where the distribution is unknown, but only the expected value of delay is available. We provide a comprehensive regret analysis for the two proposed algorithms and then evaluate their empirical performance on both synthetic and real datasets.

URL: https://openreview.net/forum?id=HwAZDVxkLX

---

Title: IM-Context: In-Context Learning for Imbalanced Regression Tasks

Abstract: Regression models often fail to generalize effectively in regions characterized by highly imbalanced label distributions. Previous methods for deep imbalanced regression rely on gradient-based weight updates, which tend to overfit in underrepresented regions. This paper proposes a paradigm shift towards in-context learning as an effective alternative to conventional in-weight learning methods, particularly for addressing imbalanced regression. In-context learning refers to the ability of a model to condition itself, given a prompt sequence composed of in-context samples (input-label pairs) alongside a new query input to generate predictions, without requiring any parameter updates. In this paper, we study the impact of the prompt sequence on the model performance from both theoretical and empirical perspectives. We emphasize the importance of localized context in reducing bias within regions of high imbalance. Empirical evaluations across a variety of real-world datasets demonstrate that in-context learning substantially outperforms existing in-weight learning methods in scenarios with high levels of imbalance.

URL: https://openreview.net/forum?id=p4Y844vJWG

---

Title: Efficient Deep Learning with Decorrelated Backpropagation

Abstract: The backpropagation algorithm remains the dominant and most successful method for
training deep neural networks (DNNs). At the same time, training DNNs at scale comes at
a significant computational cost and therefore a high carbon footprint. Converging evidence
suggests that input decorrelation may speed up deep learning. However, to date, this has not
yet translated into substantial improvements in training efficiency in large-scale DNNs. This
is mainly caused by the challenge of enforcing fast and stable network-wide decorrelation.
Here, we show for the first time that much more efficient training of very deep neural networks
using decorrelated backpropagation is feasible. To achieve this goal we made use of a novel
algorithm which induces network-wide input decorrelation using minimal computational
overhead. By combining this algorithm with careful optimizations, we achieve a more than
two-fold speed-up and higher test accuracy compared to backpropagation when training
a 18-layer deep residual network. This demonstrates that decorrelation provides exciting
prospects for efficient deep learning at scale.

URL: https://openreview.net/forum?id=HUJ4rZLcju

---

Title: RLHF Workflow: From Reward Modeling to Online RLHF

Abstract: We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report, which is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature. However, existing open-source RLHF projects are still largely confined to the offline learning setting. In this technical report, we aim to fill in this gap and provide a detailed recipe that is easy to reproduce for online iterative RLHF. In particular, since online human feedback is usually infeasible for open-source communities with limited resources, we start by constructing preference models using a diverse set of open-source datasets and use the constructed proxy preference model to approximate human feedback. Then, we discuss the theoretical insights and algorithmic principles behind online iterative RLHF, followed by a detailed practical implementation. Our trained LLM achieves impressive performance on LLM chatbot benchmarks, including AlpacaEval-2, Arena-Hard, and MT-Bench, as well as other academic benchmarks such as HumanEval and TruthfulQA. We have shown that supervised fine-tuning (SFT) and iterative RLHF can obtain state-of-the-art performance with fully open-source datasets. Further, we have made our models, curated datasets, and comprehensive step-by-step code guidebooks publicly available.

URL: https://openreview.net/forum?id=a13aYUU9eU

---

Title: Evaluating MEDIRL: A Replication and Ablation Study of Maximum Entropy Deep Inverse Reinforcement Learning for Human Social Navigation

Abstract: In this study, we enhance the Maximum Entropy Deep Inverse Reinforcement Learning (MEDIRL) framework, targeting its application in human-robot interaction (HRI) for modeling pedestrian behavior in crowded environments. Our work is grounded in the pioneering research by Fahad, Chen, and Guo, and aims to elevate MEDIRL’s efficacy in real-world HRI settings. We replicated the original MEDIRL model and conducted detailed ablation studies, focusing on key model components like learning rates, state dimensions, and network layers. Our findings reveal the effectiveness of a two-dimensional state representation over a three-dimensional approach, significantly improving model accuracy for pedestrian behavior prediction in HRI scenarios. These results not only demonstrate MEDIRL’s enhanced performance but also offer valuable insights for future HRI system development, emphasizing the importance of model customization to specific environmental contexts. Our research contributes to advancing the field of socially intelligent navigation systems, promoting more
intuitive and safer human-robot interactions.

URL: https://openreview.net/forum?id=ueNSmjdhyY

---

Title: Amortizing Bayesian Posterior Inference in Tractable Likelihood Models

Abstract: Bayesian inference provides a natural way of incorporating prior beliefs and assigning a probability measure to the space of hypotheses. However, it is often infeasible in practice as it requires expensive iterative routines like MCMC to approximate the posterior distribution. Not only are these methods computationally expensive, but they must also be re-run whenever new observations are available, making them impractical or of limited use. To alleviate such difficulties, we amortize the posterior parameter inference for probabilistic models through permutation invariant architectures. While this paradigm is briefly explored in Simulation Based Inference (SBI), Neural Processes (NPs) and Gaussian Process (GP) kernel estimation, a more general treatment of amortized Bayesian inference in known likelihood models has been largely unexplored. We additionally utilize a simple but strong approach to further amortize on the dimensionality of observations, allowing a single system to infer variable dimensional parameters. In particular, we rely on the reverse-KL based amortized Variational Inference (VI) approach to train inference systems and compare them with forward-KL based SBI approaches across different architectural setups. We conduct thorough experiments to demonstrate the effectiveness of our proposed approach, especially in real-world and model misspecification settings.

URL: https://openreview.net/forum?id=zZmDfmcFUi

---

Title: Constraining Generative Models for Engineering Design with Negative Data

Abstract: Generative models have recently achieved remarkable success and widespread adoption in society, yet they still often struggle to generate realistic and accurate outputs. This challenge extends beyond language and vision into fields like engineering design, where safety-critical engineering standards and non-negotiable physical laws tightly constrain what outputs are considered acceptable.
In this work, we introduce two approaches to guide models toward constraint-satisfying outputs using `negative data' -- examples of what to avoid. Our negative data generative models (NDGMs) outperform state-of-the-art NDGMs by 4x in constraint satisfaction and easily outperform classic generative models using 8x less data in certain problems. To demonstrate this, we rigorously benchmark our NDGMs against 14 baseline models across numerous synthetic and real engineering problems, such as ship hulls with hydrodynamic constraints and vehicle design with impact safety constraints. Our benchmarks showcase both the best-in-class performance of our new NDGM models and the widespread dominance of NDGMs over classic generative models in general. In doing so, we advocate for the more widespread use of NDGMs in engineering design tasks.

URL: https://openreview.net/forum?id=FNBv2vweBI

---

Title: Single-Shot Plug-and-Play Methods for Inverse Problems

Abstract: The utilisation of Plug-and-Play (PnP) priors in inverse problems has become increasingly prominent in recent years. This preference is based on the mathematical equivalence between the general proximal operator and the regularised denoiser, facilitating the adaptation of various off-the-shelf denoiser priors to a wide range of inverse problems. However, existing PnP models predominantly rely on pre-trained denoisers using large datasets. In this work, we introduce Single-Shot PnP methods (SS-PnP), shifting the focus to solving inverse problems with minimal data. First, we integrate Single-Shot proximal denoisers into iterative methods, enabling training with single instances. Second, we propose implicit neural priors based on a novel function that preserves relevant frequencies to capture fine details while avoiding the issue of vanishing gradients. We demonstrate, through extensive numerical and visual experiments, that our method leads to better approximations.

URL: https://openreview.net/forum?id=vXevE43NxF

---

Title: Learning-Based Link Anomaly Detection in Continuous-Time Dynamic Graphs

Abstract: Anomaly detection in continuous-time dynamic graphs is an emerging field yet under-explored in the context of learning algorithms. In this paper, we pioneer structured analyses of link-level anomalies and graph representation learning for identifying categorically anomalous links in these graphs. First, we introduce a fine-grained taxonomy for edge-level anomalies leveraging structural, temporal, and contextual graph properties. Based on these properties, we introduce a method for generating and injecting typed anomalies into graphs. Next, we introduce a novel method to generate continuous-time dynamic graphs featuring consistencies across either or combinations of time, structure, and context. To improve the capabilities of temporal graph methods in learning to detect specific types of anomalous links rather than the bare existence of a link, we extend the generic link prediction setting by: (1) conditioning link existence on contextual edge attributes; and (2) refining the training regime to accommodate diverse perturbations in the negative edge sampler. Building on this, we benchmark methods for typed anomaly detection. Comprehensive experiments on synthetic and real-world datasets -- featuring synthetic and labeled organic anomalies and employing six state-of-the-art learning methods -- validate our taxonomy and generation processes for anomalies and benign graphs, as well as our approach to adapting link prediction methods for anomaly detection. Our results further reveal that different learning methods excel in capturing different aspects of graph normality and detecting different types of anomalies. We conclude with a comprehensive list of findings highlighting opportunities for future research. The code is available at https://anonymous.4open.science/r/TGB-link-anomaly-detection-anonymous-CBF1.

URL: https://openreview.net/forum?id=8imVCizVEw

---

Title: SASSL: Enhancing Self-Supervised Learning via Neural Style Transfer

Abstract: Existing data augmentation in self-supervised learning, while diverse, fails to preserve the inherent structure of natural images. This results in distorted augmented samples with compromised semantic information, ultimately impacting downstream performance. To overcome this limitation, we propose SASSL: Style Augmentations for Self Supervised Learning, a novel data augmentation technique based on Neural Style Transfer. SASSL decouples semantic and stylistic attributes in images and applies transformations exclusively to their style while preserving content, generating diverse samples that better retain semantic information. Our augmentation technique boosts top-1 image classification accuracy on ImageNet by up to 2% compared to established self-supervised methods like MoCo, SimCLR, and BYOL, while achieving superior transfer learning performance across various datasets.

URL: https://openreview.net/forum?id=NxhXtkPYsk

---

Title: FMU: Fair Machine Unlearning via Distribution Correction

Abstract: Machine unlearning, a technique used to remove the influence of specific data points from a trained model, is often applied in high-stakes scenarios. While most current machine unlearning methods aim to maintain the performance of the model after removing requested data traces, they may inadvertently introduce biases during the unlearning process. This raises the question: Does machine unlearning actually introduce bias? To address this question, we evaluate the fairness of model predictions before and after applying existing machine unlearning approaches. Interestingly, our findings reveal that the model after unlearning can exhibit a greater bias. To mitigate the bias induced by unlearning, we developed a novel framework, Fair Machine Unlearning (FMU), which ensures group fairness during the unlearning process. Specifically, for privacy preservation, FMU first withdraws the model updates of the batches containing the unlearning requests. For debiasing, it then deletes the model updates of sampled batches that have reversed sensitive attributes associated with the unlearning requests. To validate the effectiveness of FMU, we compare it with standard machine unlearning baselines and one existing fair machine unlearning approach. FMU demonstrates superior fairness in predictions while maintaining privacy and comparable prediction accuracy to retraining the model. Furthermore, we illustrate the advantages of FMU in scenarios involving diverse unlearning requests, encompassing various data distributions of the original dataset. Our framework is orthogonal to specific machine unlearning approaches and debiasing techniques, making it flexible for various applications. This work represents a pioneering effort, serving as a foundation for more advanced techniques in fair machine unlearning.

URL: https://openreview.net/forum?id=E9BELz8Mod

---

Title: Adjacency Search Embeddings

Abstract: In this study, we propose two novel Adjacency Search Embeddings that are inspired by the theory of identifying s-t minimum cuts: Maximum Adjacency Search (MAS) and Threshold-based Adjacency Search (TAS), which leverage both the node and a subset of its neighborhood to discern a set of nodes well-integrated into higher-order network structures. This serves as context for generating higher-order representations. Our approaches, when used in conjunction with the skip-gram model, exhibit superior effectiveness in comparison to other shallow embedding techniques in tasks such as link prediction and node classification. By incorporating our mechanisms as a preprocessing technique, we show substantial improvements in node classification performance across GNNs like GCN, GraphSage, and Gatv2 on both attributed and non-attributed networks. Furthermore, we substantiate the applicability of our approaches, shedding light on their aptness for specific graph scenarios. Our source code can be accessed through "https://anonymous.4open.science/r/adjacency-embeddings-DC6B".

URL: https://openreview.net/forum?id=GDN5cFTNaL

---

Title: Normality-Guided Distributional Reinforcement Learning for Continuous Control

Abstract: Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. Distributional reinforcement learning (DRL) has been shown to improve performance by modeling the value distribution, not just the mean. We study the value distribution in several continuous control tasks and find that the learned value distribution is empirical quite close to normal. We design a method that exploits this property, employ variances predicted from a variance network, along with returns, to analytically compute target quantile bars representing a normal for our distributional value function. In addition, we propose a policy update strategy based on the correctness as measured by structural characteristics of the value distribution not present in the standard value function. The approach we outline is compatible with many DRL structures. We use two representative on-policy algorithms, PPO and TRPO, as testbeds. Our method yields statistically significant improvements in 10 out of 16 continuous task settings, while utilizing a reduced number of weights and achieving faster training time compared to an ensemble-based method for quantifying value distribution uncertainty.

URL: https://openreview.net/forum?id=z27hb0rmLT

---

Title: No Need for Ad-hoc Substitutes: The Expected Cost is a Principled All-purpose Classification Metric

Abstract: The expected cost (EC) is one of the main classification metrics introduced in statistical and machine learning books. It is based on the assumption that, for a given application of interest, each decision made by the system has a corresponding cost which depends on the true class of the sample. An evaluation metric can then be defined by taking the expectation of the cost over the data. Two special cases of the EC are widely used in the machine learning literature: the error rate (one minus the accuracy) and the balanced error rate (one minus the balanced accuracy or unweighted average recall). Other instances of the EC can be useful for applications in which some types of errors are more severe than others, or when the prior probabilities of the classes differ between the evaluation data and the use-case scenario. Surprisingly, the general form for the EC is rarely used in the machine learning literature. Instead, alternative ad-hoc metrics like the F-beta score and the Matthews correlation coefficient (MCC) are used for many applications. In this work, we argue that the EC is superior to these alternative metrics, being more general, interpretable, and adaptable to any application scenario. We provide both theoretically-motivated discussions as well as examples to illustrate the behavior of the different metrics.

URL: https://openreview.net/forum?id=5PPbvCExZs

---

Title: Attacking the Spike: On the Security of Spiking Neural Networks to Adversarial Examples

Abstract: Spiking neural networks (SNNs) have attracted much attention for their high energy efficiency and for recent advances in their classification performance. However, unlike traditional deep learning approaches, the analysis and study of the robustness of SNNs to adversarial examples remain relatively underdeveloped. In this work, we focus on advancing the adversarial attack side of SNNs and make three major contributions. First, we show that successful white-box adversarial attacks on SNNs are highly dependent on the underlying surrogate gradient estimation technique, even in the case of adversarially trained SNNs. Second, using the best single surrogate gradient estimation technique, we analyze the transferability of adversarial attacks on SNNs and other state-of-the-art architectures like Vision Transformers (ViTs), as well as CNNs. Our analyzes reveal two key areas where SNN adversarial attacks can be enhanced: no white-box attack effectively exploits the use of multiple surrogate gradient estimators for SNNs, and no single model attack is effective at generating adversarial examples misclassified by both SNNs and non-SNN models simultaneously.

For our third contribution, we develop a new attack, the Mixed Dynamic Spiking Estimation (MDSE) attack to address these issues. MDSE utilizes a dynamic gradient estimation scheme to fully exploit multiple surrogate gradient estimator functions. In addition, our novel attack generates adversarial examples capable of fooling both SNN and non-SNN models simultaneously. The MDSE attack is as much as $91.4\%$ more effective on SNN/ViT model ensembles and provides a $3\times$ boost in attack effectiveness on adversarially trained SNN ensembles, compared to conventional white-box attacks like Auto-PGD. Our experiments are broad and rigorous, covering three datasets (CIFAR-10, CIFAR-100 and ImageNet) and nineteen classifier models (seven for each CIFAR dataset and five models for ImageNet). We will release a full publicly available code repository for the models and attacks upon publication.

URL: https://openreview.net/forum?id=2TBTuRZxhm

---

Reply all

Reply to author

Forward

0 new messages