Weekly TMLR digest for Jan 07, 2024


TMLR

Jan 6, 2024, 7:00:13 PM
to tmlr-annou...@googlegroups.com


New certifications
==================

Expert Certification: Neural Implicit Manifold Learning for Topology-Aware Density Estimation

Brendan Leigh Ross, Gabriel Loaiza-Ganem, Anthony L. Caterini, Jesse C. Cresswell

https://openreview.net/forum?id=lTOku838Zv

---


Featured Certification: Pathologies of Predictive Diversity in Deep Ensembles

Taiga Abe, E. Kelly Buchanan, Geoff Pleiss, John Patrick Cunningham

https://openreview.net/forum?id=TQfQUksaC8

---


Featured Certification: DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity

Melissa Hall, Candace Ross, Adina Williams, Nicolas Carion, Michal Drozdzal, Adriana Romero-Soriano

https://openreview.net/forum?id=FDt2UGM1Nz

---


Accepted papers
===============


Title: Synaptic Interaction Penalty: Appropriate Penalty Term for Energy-Efficient Spiking Neural Networks

Authors: Kazuma Suetake, Takuya Ushimaru, Ryuji Saiin, Yoshihide Sawada

Abstract: Spiking neural networks (SNNs) are energy-efficient neural networks because of their spiking nature. However, as the spike firing rate of an SNN increases, so does its energy consumption, and the advantage of SNNs diminishes. Here, we tackle this problem by introducing a novel penalty term for spiking activity into the objective function in the training phase. Our method is designed to optimize the energy consumption metric directly, without modifying the network architecture. Therefore, the proposed method can reduce energy consumption more than other methods while maintaining accuracy. We conducted experiments on image classification tasks, and the results indicate the effectiveness of the proposed method, which mitigates the energy-accuracy trade-off.
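
A minimal sketch of the penalty-term idea, assuming a PyTorch-style training loop and a hypothetical spike-count energy proxy (the paper's actual penalty is designed to match a specific energy consumption metric):

    import torch.nn.functional as F

    def energy_penalized_loss(logits, targets, spike_counts, lam=1e-4):
        # Task loss plus a penalty on spiking activity. spike_counts is a
        # list of per-layer spike tensors recorded during the forward pass;
        # their total serves as a simple per-sample proxy for energy.
        task_loss = F.cross_entropy(logits, targets)
        energy_proxy = sum(s.sum() for s in spike_counts) / logits.shape[0]
        return task_loss + lam * energy_proxy

Because only the loss changes, the network architecture is untouched, matching the design goal stated above.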

URL: https://openreview.net/forum?id=42BKnT2qW3

---

Title: Neural Implicit Manifold Learning for Topology-Aware Density Estimation

Authors: Brendan Leigh Ross, Gabriel Loaiza-Ganem, Anthony L. Caterini, Jesse C. Cresswell

Abstract: Natural data observed in $\mathbb{R}^n$ is often constrained to an $m$-dimensional manifold $\mathcal{M}$, where $m < n$. This work focuses on the task of building theoretically principled generative models for such data. Current generative models learn $\mathcal{M}$ by mapping an $m$-dimensional latent variable through a neural network $f_\theta: \mathbb{R}^m \to \mathbb{R}^n$. These procedures, which we call pushforward models, incur a straightforward limitation: manifolds cannot in general be represented with a single parameterization, meaning that attempts to do so will incur either computational instability or the inability to learn probability densities within the manifold. To remedy this problem, we propose to model $\mathcal{M}$ as a neural implicit manifold: the set of zeros of a neural network. We then learn the probability density within $\mathcal{M}$ with a constrained energy-based model, which employs a constrained variant of Langevin dynamics to train and sample from the learned manifold. In experiments on synthetic and natural data, we show that our model can learn manifold-supported distributions with complex topologies more accurately than pushforward models.
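
A rough sketch of the sampling idea, under stated assumptions: an ambient Langevin step on the energy, followed by an approximate projection back onto the implicit manifold {x : f(x) = 0}; the authors' constrained Langevin variant may differ in detail:

    import torch

    def constrained_langevin_step(x, energy, f, step=1e-3, proj_iters=20, proj_lr=0.1):
        # Ambient (unconstrained) Langevin step on the energy-based model.
        x = x.detach().requires_grad_(True)
        g = torch.autograd.grad(energy(x).sum(), x)[0]
        x = (x - step * g + (2 * step) ** 0.5 * torch.randn_like(x)).detach()
        # Approximate projection back onto {x : f(x) = 0} by descending ||f(x)||^2.
        for _ in range(proj_iters):
            x.requires_grad_(True)
            c = (f(x) ** 2).sum()
            x = (x - proj_lr * torch.autograd.grad(c, x)[0]).detach()
        return x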

URL: https://openreview.net/forum?id=lTOku838Zv

---

Title: Exploring Format Consistency for Instruction Tuning

Authors: Shihao Liang, Runchu Tian, Kunlun Zhu, Yujia Qin, Huadong Wang, Xin Cong, Zhiyuan Liu, Xiaojiang Liu, Maosong Sun

Abstract: Instruction tuning has emerged as a promising approach to enhancing large language models in following human instructions. It has been shown that increasing the diversity and number of instructions in the training data consistently enhances generalization performance, which has facilitated a recent endeavor to collect various instructions and integrate existing instruction tuning datasets into larger collections. However, different users have their unique ways of expressing instructions, and there often exist variations across different datasets in the instruction styles and formats, i.e., format inconsistency. In this work, we propose a framework named Unified Instruction Tuning (UIT), which calls OpenAI APIs for automatic format transfer among different instruction tuning datasets such as PromptSource, FLAN and CrossFit. With the framework, we (1) demonstrate the necessity of maintaining format consistency in instruction tuning; (2) improve the generalization performance on unseen instructions on T5-LM-xl; and (3) propose a novel perplexity-based denoising method that reduces the noise of automatic format transfer, making the UIT framework more practical, along with a smaller offline model based on GPT-J that achieves format transfer capability comparable to the OpenAI APIs at lower cost. Further analysis of variations in target formats and other effects is also planned. The code and trained models will soon be available.

URL: https://openreview.net/forum?id=n8fZ6mY6PB

---

Title: Variational Classification: A Probabilistic Generalization of the Softmax Classifier

Authors: Shehzaad Zuzar Dhuliawala, Mrinmaya Sachan, Carl Allen

Abstract: We present a latent variable model for classification that provides a novel probabilistic interpretation of neural network softmax classifiers. We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders, that generalises the cross-entropy loss used to train classification models. Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency between their anticipated distribution, required for outputting accurate label predictions, and the empirical distribution found in practice. We augment the variational objective to mitigate such inconsistency and encourage a chosen latent distribution, instead of the implicit assumption in off-the-shelf softmax classifiers. Overall, we provide new theoretical insight into the inner workings of widely-used softmax classification. Empirical evaluation on image and text classification datasets demonstrates that our proposed approach, variational classification, maintains classification accuracy while the reshaped latent space improves other desirable properties of a classifier, such as calibration, adversarial robustness, robustness to distribution shift, and sample efficiency, which is useful in low-data settings.

URL: https://openreview.net/forum?id=EWv9XGOpB3

---

Title: Pathologies of Predictive Diversity in Deep Ensembles

Authors: Taiga Abe, E. Kelly Buchanan, Geoff Pleiss, John Patrick Cunningham

Abstract: Classic results establish that encouraging predictive diversity improves performance in ensembles of low-capacity models, e.g. through bagging or boosting. Here we demonstrate that these intuitions do not apply to high-capacity neural network ensembles (deep ensembles), and in fact the opposite is often true. In a large scale study of nearly 600 neural network classification ensembles, we examine a variety of interventions that trade off component model performance for predictive diversity. While such interventions can improve the performance of small neural network ensembles (in line with standard intuitions), they harm the performance of the large neural network ensembles most often used in practice. Surprisingly, we also find that discouraging predictive diversity is often benign in large-network ensembles, fully inverting standard intuitions. Even when diversity-promoting interventions do not sacrifice component model performance (e.g. using heterogeneous architectures and training paradigms), we observe an opportunity cost associated with pursuing increased predictive diversity. Examining over 1000 ensembles, we observe that the performance benefits of diverse architectures/training procedures are easily dwarfed by the benefits of simply using higher-capacity models, despite the fact that such higher capacity models often yield significantly less predictive diversity. Overall, our findings demonstrate that standard intuitions around predictive diversity, originally developed for low-capacity ensembles, do not directly apply to modern high-capacity deep ensembles. This work clarifies fundamental challenges to the goal of improving deep ensembles by making them more diverse, while suggesting an alternative path: simply forming ensembles from ever more powerful (and less diverse) component models.

URL: https://openreview.net/forum?id=TQfQUksaC8

---

Title: DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity

Authors: Melissa Hall, Candace Ross, Adina Williams, Nicolas Carion, Michal Drozdzal, Adriana Romero-Soriano

Abstract: The unprecedented photorealistic results achieved by recent text-to-image generative systems and their increasing use as plug-and-play content creation solutions make it crucial to understand their potential biases. In this work, we introduce three indicators to evaluate the realism, diversity and prompt-generation consistency of text-to-image generative systems when prompted to generate objects from across the world. Our indicators complement qualitative analysis of the broader impact of such systems by enabling automatic and efficient benchmarking of geographic disparities, an important step towards building responsible visual content creation systems. We use our proposed indicators to analyze potential geographic biases in state-of-the-art visual content creation systems and find that: (1) models have less realism and diversity of generations when prompting for Africa and West Asia than for Europe, (2) prompting with geographic information comes at a cost to prompt-consistency and diversity of generated images, and (3) models exhibit more region-level disparities for some objects than others. Perhaps most interestingly, our indicators suggest that progress in image generation quality has come at the cost of real-world geographic representation. Our comprehensive evaluation constitutes a crucial step towards ensuring a positive experience of visual content creation for everyone. Code is available at https://github.com/facebookresearch/DIG-In/.

URL: https://openreview.net/forum?id=FDt2UGM1Nz

---


New submissions
===============


Title: Beyond Regrets: Geometric Metrics for Bayesian Optimization

Abstract: Bayesian optimization is a principled optimization strategy for a black-box objective function. It shows its effectiveness in a wide variety of real-world applications such as scientific discovery and experimental design. In general, the performance of Bayesian optimization is assessed by regret-based metrics such as instantaneous, simple, and cumulative regrets. These metrics rely only on function evaluations, so they consider neither geometric relationships between query points and global solutions, nor relationships among the query points themselves. Notably, they cannot discriminate whether multiple global solutions have been successfully found. Moreover, they do not evaluate Bayesian optimization's abilities to exploit and explore a given search space. To tackle these issues, we propose four new geometric metrics: precision, recall, average degree, and average distance. These metrics allow us to compare Bayesian optimization algorithms while considering the geometry of both query points and global optima, or of the query points alone. However, they are accompanied by an extra parameter, which needs to be carefully determined. We therefore devise parameter-free forms of the respective metrics by integrating out the additional parameter. Finally, we empirically validate that our proposed metrics can provide a more convincing interpretation and understanding of Bayesian optimization algorithms from distinct perspectives, compared to the conventional metrics.
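
The thresholded metrics admit a very small implementation; a sketch of precision and recall with the extra parameter eps made explicit (the parameter-free forms integrate it out, as described above):

    import numpy as np

    def precision_recall(queries, optima, eps):
        # precision: fraction of query points within eps of some global optimum.
        # recall:    fraction of global optima within eps of some query point.
        d = np.linalg.norm(queries[:, None, :] - optima[None, :, :], axis=-1)
        return float(np.mean(d.min(axis=1) <= eps)), float(np.mean(d.min(axis=0) <= eps))

    # Two optima, only one of which was discovered by the queries:
    queries = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]])
    optima = np.array([[0.0, 0.0], [5.0, 5.0]])
    print(precision_recall(queries, optima, eps=0.5))  # (~0.667, 0.5)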

URL: https://openreview.net/forum?id=NSMMwekIiO

---

Title: HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes

Abstract: Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for making high-fidelity reconstructions. However, such hierarchical extensions of VQ-VAE often suffer from the codebook/layer collapse issue, where the codebook is not efficiently used to express the data, which degrades reconstruction accuracy. To mitigate this problem, we propose a novel unified framework to stochastically learn hierarchical discrete representations on the basis of the variational Bayes framework, called the hierarchically quantized variational autoencoder (HQ-VAE). HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and residual-quantized VAE (RQ-VAE), and provides them with a Bayesian training scheme. Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance. We also validate HQ-VAE in terms of its applicability to a different modality, using an audio dataset.

URL: https://openreview.net/forum?id=xqAVkqrLjx

---

Title: LeanVec: Search your vectors faster by making them fit

Abstract: Modern deep learning models have the ability to generate high-dimensional vectors whose similarity reflects semantic resemblance.
Thus, similarity search, i.e., the operation of retrieving those vectors in a large collection that are similar to a given query, has become a critical component of a wide range of applications that demand highly accurate and timely answers. In this setting, the high vector dimensionality puts similarity search systems under compute and memory pressure, leading to subpar performance.
Additionally, cross-modal retrieval tasks have become increasingly common, e.g., where a user inputs a text query to find the most relevant images for that query. However, these queries often have different distributions than the database embeddings, making it challenging to achieve high accuracy.
In this work, we present LeanVec, a framework that combines linear dimensionality reduction with vector quantization to accelerate similarity search on high-dimensional vectors while maintaining accuracy.
We present LeanVec variants for in-distribution (ID) and out-of-distribution (OOD) queries. LeanVec-ID yields accuracies on par with those from recently introduced deep learning alternatives whose computational overhead precludes their usage in practice. LeanVec-OOD uses a novel technique for dimensionality reduction that considers the query and database distributions to simultaneously boost the accuracy and the performance of the framework even further (even presenting competitive results when the query and database distributions match). All in all, our extensive and varied experimental results show that LeanVec produces state-of-the-art results, with up to 3.7x improvement in search throughput and up to 4.9x faster index build time over the state of the art.
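
As a rough illustration of the two ingredients (not the authors' actual implementation), PCA can stand in for the learned linear reduction and int8 scalar quantization for the vector quantization step:

    import numpy as np

    rng = np.random.default_rng(0)
    db = rng.normal(size=(10_000, 768)).astype(np.float32)  # database vectors

    # 1) Linear dimensionality reduction onto the top-k principal directions.
    k = 128
    _, _, vt = np.linalg.svd(db - db.mean(0), full_matrices=False)
    P = vt[:k].T                        # (768, 128) projection matrix
    db_small = db @ P

    # 2) Crude scalar quantization of the reduced vectors.
    scale = np.abs(db_small).max() / 127.0
    db_q = np.round(db_small / scale).astype(np.int8)

    def search(query, topn=10):
        q = (query @ P) / scale         # project the query into the same space
        scores = db_q.astype(np.float32) @ q
        return np.argsort(-scores)[:topn]

LeanVec-OOD additionally learns the projection from both the query and database distributions rather than from the database alone.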

URL: https://openreview.net/forum?id=wczqrpOrIc

---

Title: SARI: Simplistic Average and Robust Identification based Noisy Partial Label Learning

Abstract: Partial label learning (PLL) is a weakly-supervised learning paradigm where each training instance is paired with a set of candidate labels (partial label), one of which is the true label. Noisy PLL (NPLL) relaxes this constraint by allowing some partial labels to not contain the true label, enhancing the practicality of the problem. Our work centers on NPLL and presents a minimalistic framework called SARI that initially assigns pseudo-labels to images by exploiting the noisy partial labels through a weighted nearest neighbour algorithm. These pseudo-label and image pairs are then used to train a deep neural network classifier with label smoothing and standard regularization techniques. The classifier's features and predictions are subsequently employed to refine and enhance the accuracy of pseudo-labels. SARI combines the strengths of Average Based Strategies (in pseudo labelling) and Identification Based Strategies (in classifier training) from the literature. We perform thorough experiments on seven datasets and compare SARI against nine NPLL and PLL methods from the prior art. SARI achieves state-of-the-art results in almost all studied settings, obtaining substantial gains in fine-grained classification and extreme noise settings.
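
A hypothetical reading of the first SARI stage as a small sketch (the actual weighting scheme may differ): pseudo-labels come from a distance-weighted nearest-neighbour vote restricted to each sample's candidate set:

    import numpy as np

    def knn_pseudo_labels(feats, candidate_sets, k=10):
        # Dense pairwise distances; fine for a small-n illustration.
        d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        pseudo = np.empty(len(feats), dtype=int)
        for i in range(len(feats)):
            nbrs = np.argsort(d[i])[:k]
            votes = {c: 0.0 for c in candidate_sets[i]}
            for j in nbrs:
                w = 1.0 / (d[i, j] + 1e-8)           # closer neighbours weigh more
                for c in candidate_sets[j]:
                    if c in votes:                    # only i's own candidates count
                        votes[c] += w / len(candidate_sets[j])
            pseudo[i] = max(votes, key=votes.get)
        return pseudo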

URL: https://openreview.net/forum?id=IHLl2908H4

---

Title: Amortized Bayesian Decision Making for simulation-based models

Abstract: Simulation-based inference (SBI) provides a powerful framework for inferring posterior distributions of stochastic simulators in a wide range of domains. In many settings, however, the posterior distribution is not the end goal itself---rather, the derived parameter values and their uncertainties are used as a basis for deciding what actions to take. Unfortunately, because posterior distributions provided by SBI are (potentially crude) approximations of the true posterior, the resulting decisions can be suboptimal. Here, we address the question of how to perform Bayesian decision making on stochastic simulators, and how one can circumvent the need to compute an explicit approximation to the posterior. Our method trains a neural network on simulated data to predict the expected cost given any data and action, and can thus be used directly to infer the action with the lowest cost. We apply our method to several benchmark problems and demonstrate that it incurs costs similar to those of the true posterior distribution. We then apply the method to infer optimal actions in a real-world simulator in the medical neurosciences, the Bayesian Virtual Epileptic Patient, and demonstrate that it allows inferring actions associated with low cost after only a few simulations.
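
The core amortization step is simple to sketch (names and shapes are assumptions, not the paper's interface): a network regresses the expected cost of (data, action) pairs from simulations, and decisions are made by minimizing its prediction over candidate actions:

    import torch
    import torch.nn as nn

    cost_net = nn.Sequential(nn.Linear(8 + 2, 64), nn.ReLU(), nn.Linear(64, 1))
    # ... train cost_net by regressing costs of simulated (data, action) pairs ...

    def best_action(x, candidate_actions):
        # Pick the candidate action with the lowest predicted expected cost.
        xs = x.expand(len(candidate_actions), -1)
        costs = cost_net(torch.cat([xs, candidate_actions], dim=-1))
        return candidate_actions[costs.argmin()]

    x = torch.randn(1, 8)              # observed data (stand-in)
    actions = torch.randn(100, 2)      # candidate actions
    print(best_action(x, actions))

No explicit posterior is ever formed; the cost network plays its role directly.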

URL: https://openreview.net/forum?id=BQE4MTAfCE

---

Title: Normed Spaces for Graph Embedding

Abstract: Theoretical results from discrete geometry suggest that normed spaces can abstractly embed finite metric spaces with surprisingly low theoretical bounds on distortion in low dimensions.
In this paper, inspired by this theoretical insight, we highlight normed spaces as a more flexible and computationally efficient alternative to several popular Riemannian manifolds for learning graph embeddings.
Normed space embeddings significantly outperform several popular manifolds on a large range of synthetic and real-world graph reconstruction benchmark datasets while requiring significantly fewer computational resources.
We also empirically verify the superiority of normed space embeddings on growing families of graphs associated with negative, zero, and positive curvature, further reinforcing the flexibility of normed spaces in capturing diverse graph structures as graph sizes increase.
Lastly, we demonstrate the utility of normed space embeddings on two applied graph embedding tasks, namely, link prediction and recommender systems.
Our work highlights the potential of normed spaces for geometric graph representation learning, raises new research questions, and offers a valuable tool for experimental mathematics in the field of finite metric space embeddings. We make our code and data publicly available\footnote{\url{https://anonymous.4open.science/r/graphs-normed-spaces-90D3/}}.
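
A minimal sketch of the core optimization, assuming an L-infinity norm and a relative-stress loss (both choices are illustrative, not necessarily the paper's):

    import itertools
    import torch

    n, d = 8, 4
    # Toy graph: a path, so the shortest-path distance is |i - j|.
    pairs = torch.tensor(list(itertools.combinations(range(n), 2)))
    d_graph = (pairs[:, 0] - pairs[:, 1]).abs().float()

    X = torch.randn(n, d, requires_grad=True)
    opt = torch.optim.Adam([X], lr=0.05)
    for _ in range(500):
        diff = X[pairs[:, 0]] - X[pairs[:, 1]]
        d_emb = diff.abs().max(dim=-1).values             # L-infinity distance
        loss = ((d_emb - d_graph) ** 2 / d_graph).mean()  # relative stress
        opt.zero_grad()
        loss.backward()
        opt.step()

Every operation here is a cheap vector-space primitive, consistent with the computational savings reported above.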

URL: https://openreview.net/forum?id=4E2XLydJiv

---

Title: Variational Inference with Gaussian Mixture by Entropy Approximation

Abstract: Variational inference is a technique for approximating intractable posterior distributions in order to quantify the uncertainty of machine learning.
Although the unimodal Gaussian distribution is usually chosen as the variational family, it can hardly approximate multimodal posteriors. In this paper, we employ the Gaussian mixture distribution as the variational family. The main difficulty of variational inference with a Gaussian mixture is approximating the entropy of the mixture. We approximate the entropy of the Gaussian mixture as the sum of the entropies of its unimodal Gaussian components, which can be calculated analytically. In addition, we theoretically analyze the approximation error between the true entropy and the approximated one in order to reveal when our approximation works well. Specifically, the approximation error is controlled by the ratios of the distances between the means to the sums of the variances of the Gaussian mixture, and it converges to zero as these ratios go to infinity. This situation seems more likely to occur in higher-dimensional parameter spaces because of the curse of dimensionality. Therefore, our result guarantees that our approximation works well, for example, in neural networks with a large number of weights.
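
In code, the approximated entropy is cheap because each component entropy is closed-form. A sketch, with one assumption: consistent with the stated well-separated limit, the mixing entropy is included alongside the component entropies (the paper's exact form may differ):

    import numpy as np

    def gaussian_entropy(cov):
        d = cov.shape[0]
        return 0.5 * (d * (1 + np.log(2 * np.pi)) + np.linalg.slogdet(cov)[1])

    def mixture_entropy_approx(weights, covs):
        # Sum of component entropies plus the entropy of the mixing weights;
        # exact when the components are far apart relative to their spreads.
        comp = sum(w * gaussian_entropy(c) for w, c in zip(weights, covs))
        mix = -sum(w * np.log(w) for w in weights)
        return comp + mix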

URL: https://openreview.net/forum?id=4QCCBMvA1T

---

Title: Towards Understanding Dual BN In Hybrid Adversarial Training

Abstract: There is a growing concern about applying batch normalization (BN) in adversarial training (AT), especially when the model is trained on both \textit{adversarial} samples and \textit{clean} samples (termed Hybrid-AT). With the assumption that \textit{adversarial} and \textit{clean} samples are from two different domains, a common practice in prior works is to adopt Dual BN, where BN$_{adv}$ and BN$_{clean}$ are used for adversarial and clean branches, respectively. A popular belief for motivating Dual BN is that estimating normalization statistics of this mixture distribution is challenging and thus disentangling it for normalization achieves stronger robustness. In contrast to this belief, we reveal that what makes Dual BN effective mainly lies in its two sets of affine parameters. Moreover, we demonstrate that the domain gap between adversarial and clean samples is not very large, which is counter-intuitive considering the significant influence of adversarial perturbation on the model. We further propose a two-task hypothesis for a better understanding and improvement of Hybrid-AT. Overall, our work sheds new light on understanding the mechanism of Dual BN in Hybrid-AT and its underlying justification.

URL: https://openreview.net/forum?id=bQKHMSE4SH

---

Title: Scalable Hierarchical Self-Attention with Learnable Hierarchy for Long-Range Interactions

Abstract: Self-attention models have made great strides toward accurately modeling a wide array of data modalities, including, more recently, graph-structured data. This paper demonstrates that adaptive hierarchical attention can go a long way toward successfully applying transformers to graphs. Our proposed model, Sequoia, provides a powerful inductive bias towards long-range interaction modeling, leading to better generalization. We propose an end-to-end mechanism for a data-dependent construction of a hierarchy which in turn guides the self-attention mechanism. Using an adaptive hierarchy provides a natural pathway toward sparse attention by constraining node-to-node interactions to the immediate family of each node in the hierarchy (e.g., parent, children, and siblings). This in turn dramatically reduces the computational complexity of a self-attention layer from quadratic to log-linear in terms of the input size while maintaining or sometimes even surpassing the standard transformer's ability to model long-range dependencies across the entire input. Experimentally, we report state-of-the-art performance on long-range graph benchmarks while remaining computationally efficient. Moving beyond graphs, we also display competitive performance on long-range sequence modeling, point-cloud classification, and segmentation when using a fixed hierarchy.

URL: https://openreview.net/forum?id=qH4YFMyhce

---

Title: Prototypical Self-Explainable Models Without Re-training

Abstract: Explainable AI (XAI) has unfolded in two distinct research directions with, on the one hand, post-hoc methods that explain the predictions of a pre-trained black-box model and, on the other hand, self-explainable models (SEMs) which are trained directly to provide explanations alongside their predictions. While the latter is preferred in safety-critical scenarios, post-hoc approaches have received the majority of attention until now, owing to their simplicity and ability to explain base models without retraining. Current SEMs instead, require complex architectures and heavily regularized loss functions, thus necessitating specific and costly training. To address this shortcoming and facilitate wider use of SEMs, we propose a simple yet efficient universal method called KMEx (K-Means Explainer), which can convert any existing pre-trained model into a prototypical SEM. The motivation behind KMEx is to push towards more transparent deep learning-based decision-making via class-prototype-based explanations that are guaranteed to be diverse and trustworthy without retraining the base model. We compare models obtained from KMEx to state-of-the-art SEMs using an extensive qualitative evaluation to highlight the strengths and weaknesses of each model, further paving the way toward a more reliable and objective evaluation of SEMs\footnote{The code will be made available on Github upon acceptance.}.
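
The abstract's description suggests a very small recipe; a sketch under assumptions (per-class k-means on frozen features, nearest-prototype prediction):

    import numpy as np
    from sklearn.cluster import KMeans

    def kmex_prototypes(feats, labels, protos_per_class=3):
        # Cluster each class's (frozen) encoder features into prototypes;
        # assumes at least protos_per_class samples per class.
        protos, proto_labels = [], []
        for c in np.unique(labels):
            km = KMeans(n_clusters=protos_per_class, n_init=10)
            km.fit(feats[labels == c])
            protos.append(km.cluster_centers_)
            proto_labels += [c] * protos_per_class
        return np.vstack(protos), np.array(proto_labels)

    def predict(x, protos, proto_labels):
        # The nearest prototype supplies both the label and the explanation.
        i = np.linalg.norm(protos - x, axis=1).argmin()
        return proto_labels[i], i

The base model is never retrained; only its feature space is summarized.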

URL: https://openreview.net/forum?id=HU5DOUp6Sa

---

Title: Harnessing the Power of Federated Learning in Federated Contextual Bandits

Abstract: Federated learning (FL) has demonstrated great potential in revolutionizing distributed machine learning, and tremendous efforts have been made to extend it beyond the original focus on supervised learning. Among many directions, federated contextual bandits (FCB), a pivotal integration of FL and sequential decision-making, has garnered significant attention in recent years. Despite substantial progress, existing FCB approaches have largely employed their tailored FL components, often deviating from the canonical FL framework. Consequently, even renowned algorithms like FedAvg remain under-utilized in FCB, let alone other FL advancements. Motivated by this disconnection, this work takes one step towards building a tighter relationship between the canonical FL study and the investigations on FCB. In particular, a novel FCB design, termed FedIGW, is proposed to leverage a regression-based CB algorithm, i.e., inverse gap weighting. Compared with existing FCB approaches, the proposed FedIGW design can better harness the entire spectrum of FL innovations, which is concretely reflected as (1) flexible incorporation of (both existing and forthcoming) FL protocols; (2) modularized plug-in of FL analyses in performance guarantees; (3) seamless integration of FL appendages (such as personalization, robustness, and privacy). We substantiate these claims through rigorous theoretical analyses and empirical evaluations.

URL: https://openreview.net/forum?id=Z8wcREe9qV

---

Title: Contextual Policies Enable Efficient and Interpretable Inverse Reinforcement Learning for Populations

Abstract: Inverse reinforcement learning (IRL) methods learn a reward function from expert demonstrations such as human behavior, offering a practical solution for crafting reward functions for complex environments. However, IRL is computationally expensive when applied to large populations of demonstrators, as existing IRL algorithms require solving a separate reinforcement learning (RL) problem for each individual. We propose a new IRL approach that relies on contextual RL, where an optimal policy is learned for multiple contexts.
We first learn a contextual policy that provides the RL solution directly for a parametric family of reward functions, and then re-use it for IRL on each individual within the population. We motivate our method within the scenario of AI-driven playtesting of videogames, and focus on an interpretable family of reward functions. We evaluate the method on a navigation task and the battle arena game Derk, where it successfully recovers distinct player reward preferences from a simulated population and provides substantial time savings compared to a solid baseline of adversarial IRL.

URL: https://openreview.net/forum?id=4CUkCG6ITe

---

Title: A note on regularised NTK dynamics with an application to PAC-Bayesian training

Abstract: We establish explicit dynamics for neural networks whose training objective has a regularising term that constrains the parameters to remain close to their initial value. This keeps the network in a lazy training regime, where the dynamics can be linearised around the initialisation. The standard neural tangent kernel (NTK) governs the evolution during training in the infinite-width limit, although the regularisation causes an additional term to appear in the differential equation describing the dynamics. This setting provides an appropriate framework to study the evolution of wide networks trained to optimise generalisation objectives such as PAC-Bayes bounds, and hence potentially contributes to a deeper theoretical understanding of such networks.

URL: https://openreview.net/forum?id=2la55BeWwy

---

Title: EmoAI Smart Classroom: The Development of a Student Emotional and Behavioral Engagement Recognition System

Abstract: Emotions significantly influence the learning environment, impacting student engagement and the overall educational process. With the rise of challenges in maintaining student engagement in large offline classrooms due to various factors, there is an increasing need to detect classroom emotions. This study aims to detect and analyze emotions evoked in classrooms to enhance the educational experience for both educators and learners. Utilizing the DAiSEE dataset, which captures various affective states in offline classroom settings, an engagement detection system was developed using the state-of-the-art YOLOv8 model and deployed on Roboflow.
The system's framework was based on capturing facial cues and was augmented with an interactive interface for lecturers.
Despite its advanced capabilities, the model achieved a precision of 74.5%, a recall of 60.6%, and a mean average precision (mAP) of 65.3%. The findings suggest that while the model offers significant insights, there is potential for further refinement, particularly given the limited number of frames used for training. The study's interactive interface offers real-time feedback for lecturers, underscoring the intertwined relationship between emotions and learning. Future directions include real-time engagement detection and alert systems, emphasizing the potential to revolutionize classroom dynamics through emotionally attuned educational environments.

URL: https://openreview.net/forum?id=9iNSrUsFuN

---

Title: G-TRACER: Expected Sharpness Optimization

Abstract: We propose a new regularization scheme for the optimization of deep learning architectures, G-TRACER ("Geometric TRACE Ratio"), which promotes generalization by seeking flat minima and has a sound theoretical basis as an approximation to a natural-gradient-descent-based optimization of a generalized Bayes objective. By augmenting the loss function with a TRACER, curvature-regularized optimizers (e.g., SGD-TRACER and Adam-TRACER) are simple to implement as modifications to existing optimizers and do not require extensive tuning. We show that the method converges to a neighborhood (depending on the regularization strength) of a local minimum of the unregularized objective, and demonstrate competitive performance on a number of benchmark computer vision and NLP datasets, with a particular focus on challenging low signal-to-noise ratio problems.

URL: https://openreview.net/forum?id=OBijPYcL9u

---

Title: Nuisances via Negativa: Adjusting for Spurious Correlations via Data Augmentation

Abstract: In prediction tasks, there exist features that are related to the label in the same way across different settings for that task; these are semantic features or semantics. Features with varying relationships to the label are nuisances. For example, in detecting cows from natural images, the shape of the head is semantic, but because images of cows often, but not always, have grass backgrounds, the background is a nuisance. Models that exploit nuisance-label relationships face performance degradation when these relationships change. Building models robust to such changes requires additional knowledge beyond samples of the features and labels. For example, existing work uses annotations of nuisances or assumes ERM-trained models depend on nuisances. Approaches to integrate new kinds of additional knowledge enlarge the settings where robust models can be built. We develop an approach to use knowledge about the semantics via data augmentations. These data augmentations corrupt semantic information to produce models that identify and adjust for where nuisances drive predictions. We study semantic corruptions in powering different spurious-correlation-avoiding methods on multiple out-of-distribution (OOD) tasks like classifying waterbirds, natural language inference (NLI), and detecting cardiomegaly in chest X-rays.

URL: https://openreview.net/forum?id=RIFJsSzwKY

---

Title: Navigating Noise: A Study of How Noise Influences Generalisation and Calibration of Neural Networks

Abstract: Enhancing the generalisation abilities of neural networks (NNs) by integrating noise such as MixUp or Dropout during training has emerged as a powerful and adaptable technique. Despite the proven efficacy of noise in NN training, there is no consensus regarding which noise sources, types and placements yield maximal benefits in generalisation and confidence calibration. This study thoroughly explores diverse noise modalities to evaluate their impacts on NNs' generalisation and calibration under in-distribution or out-of-distribution settings, paired with experiments investigating the metric landscapes of the learnt representations, across a spectrum of NN architectures, tasks, and datasets. Our study shows that AugMix and weak augmentation exhibit cross-task effectiveness in computer vision, emphasising the need to tailor noise to specific domains. Our findings emphasise the efficacy of combining noises and successful hyperparameter transfer within a single domain, but also the difficulty of transferring the benefits to other domains. Furthermore, the study underscores the complexity of simultaneously optimising for both generalisation and calibration, emphasising the need for practitioners to carefully consider noise combinations and hyperparameter tuning for optimal performance in specific tasks and datasets.

URL: https://openreview.net/forum?id=zn3fB4VVF0

---

Title: Revisiting Random Weight Perturbation for Efficiently Improving Generalization

Abstract: Improving the generalization ability of modern deep neural networks (DNNs) is a fundamental challenge in machine learning. Two branches of methods have been proposed to seek flat minima and improve generalization: one, led by sharpness-aware minimization (SAM), minimizes the worst-case neighborhood loss through adversarial weight perturbation (AWP); the other minimizes the expected Bayes objective with random weight perturbation (RWP). While RWP offers advantages in computation and is closely linked to AWP on a mathematical basis, its empirical performance has consistently lagged behind that of AWP. In this paper, we revisit the use of RWP for improving generalization and propose improvements from two perspectives: convergence and perturbation generation. Through extensive experimental evaluations, we demonstrate that our enhanced RWP methods achieve greater efficiency in enhancing generalization, particularly in large-scale problems, while also offering comparable or even superior performance to SAM through AWP. The code will be released.
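
For contrast with SAM's inner maximization, a plain RWP step fits in a few lines (a sketch of the baseline scheme, not the paper's enhanced variants):

    import torch

    def rwp_step(model, loss_fn, batch, opt, sigma=0.01):
        # Evaluate the loss at randomly perturbed weights, backpropagate,
        # then restore the weights before applying the optimizer step.
        eps = []
        with torch.no_grad():
            for p in model.parameters():
                e = sigma * torch.randn_like(p)
                p.add_(e)
                eps.append(e)
        inputs, targets = batch
        loss = loss_fn(model(inputs), targets)
        opt.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p, e in zip(model.parameters(), eps):
                p.sub_(e)
        opt.step()

One forward-backward pass per update is exactly the computational advantage over AWP, which needs an extra pass to compute the adversarial perturbation.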

URL: https://openreview.net/forum?id=WbbgOHpoPX

---

Title: World Models via Policy-Guided Trajectory Diffusion

Abstract: World models are a powerful tool for developing intelligent agents. By predicting the outcome of a sequence of actions, world models enable policies to be optimised via on-policy reinforcement learning (RL) using synthetic data, i.e., "in imagination". Existing world models are autoregressive in that they interleave predicting the next state with sampling the next action from the policy. Prediction error inevitably compounds as the trajectory length grows. In this work, we propose a novel world modelling approach that is not autoregressive and generates entire on-policy trajectories in a single pass through a diffusion model. Our approach, Policy-Guided Trajectory Diffusion (PolyGRAD), leverages a denoising model in addition to the gradient of the action distribution of the policy to diffuse a trajectory of initially random states and actions into an on-policy synthetic trajectory. We analyse the connections between PolyGRAD, score-based generative models, and classifier-guided diffusion models. Our results demonstrate that PolyGRAD outperforms state-of-the-art baselines in terms of trajectory prediction error for moderate-length trajectories, with the exception of autoregressive diffusion. At short horizons, PolyGRAD obtains comparable errors to autoregressive diffusion, but with significantly lower computational requirements. Our experiments also demonstrate that PolyGRAD enables performant policies to be trained via on-policy RL in imagination for MuJoCo continuous control domains. Thus, PolyGRAD introduces a new paradigm for scalable and non-autoregressive on-policy world modelling.

URL: https://openreview.net/forum?id=9CcgO0LhKG

---

Title: Data Pruning Can Do More: A Comprehensive Data Pruning Approach for Object Re-identification

Abstract: Previous studies have demonstrated that not every sample in a dataset is of equal importance during training. Data pruning aims to remove less important or informative samples while still achieving results comparable to training on the full dataset, thereby reducing storage and training costs. We present the first data pruning approach in an object re-identification (ReID) setting. By fully leveraging the logit history during training, our approach offers a more accurate and comprehensive metric for quantifying sample importance, as well as correcting mislabeled samples and recognizing outliers. Furthermore, our approach is highly efficient, reducing the importance score estimation cost by a factor of 10 compared to existing methods. Our approach is a plug-and-play, architecture-agnostic framework that can eliminate/reduce 35\%, 30\%, and 5\% of samples/training time on the standard VeRi, MSMT17 and Market1501 datasets, respectively, with negligible loss in accuracy ($<$ 0.1\%). The lists of important, mislabeled, and outlier samples from these ReID datasets will be released upon acceptance.

URL: https://openreview.net/forum?id=vxxi7xzzn7

---

Title: Fooling Contrastive Language-Image Pre-Trained Models with CLIPMasterPrints

Abstract: Models leveraging both visual and textual data, such as Contrastive Language-Image Pre-training (CLIP), are the backbone of many recent advances in artificial intelligence. In this work, we show that despite their versatility, such models are vulnerable to what we refer to as fooling master images. Fooling master images are capable of maximizing the confidence score of a CLIP model for a significant number of widely varying prompts, while being either unrecognizable or unrelated to the attacked prompts for humans. We demonstrate how fooling master images can be mined using stochastic gradient descent, projected gradient descent, or gradient-free optimisation. Contrary to many common adversarial attacks, the gradient-free optimisation approach allows us to mine fooling examples even when the weights of the model are not accessible. We investigate the properties of the mined fooling master images, and find that images mined using a small number of image captions potentially generalize to a much larger number of semantically related captions. Finally, we evaluate possible mitigation strategies and find that vulnerability to fooling master examples appears to be closely related to a modality gap in contrastive pre-trained multi-modal networks.
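
The gradient-based mining loop is straightforward to sketch. Stand-in encoders keep the example self-contained; in practice they would be a real CLIP image tower and precomputed text embeddings:

    import torch
    import torch.nn.functional as F

    img_enc = torch.nn.Sequential(torch.nn.Flatten(),
                                  torch.nn.Linear(3 * 64 * 64, 512))
    text_emb = F.normalize(torch.randn(25, 512), dim=-1)  # attacked prompts

    x = torch.zeros(1, 3, 64, 64, requires_grad=True)     # the master image
    opt = torch.optim.Adam([x], lr=0.01)
    for _ in range(200):
        z = F.normalize(img_enc(x), dim=-1)
        loss = -(z @ text_emb.T).mean()   # raise similarity to all prompts
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0, 1)                # keep a valid image

The gradient-free variant mentioned above replaces this loop with a black-box optimizer over pixels, needing no access to model weights.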

URL: https://openreview.net/forum?id=ZFZnvGXXMm

---

Title: On AI-centered Retrosynthetic Planning: a Survey

Abstract: Retrosynthetic planning is one of the most challenging problems in organic chemistry to date. It involves designing a synthesis route for a target molecule by recursively decomposing it into simpler molecules through a series of backward chemical reaction steps. Typically, these simpler molecules are either commercially available or easy to synthesize. The development of new computer-aided retrosynthetic methods can advance the development of target molecules in drug design, materials science and agrochemicals. This paper presents the retrosynthetic planning problem and the current AI-based methods. We conduct an in-depth review of the application of AI-based methods to retrosynthetic planning while also presenting the current challenges and potential future research directions.

URL: https://openreview.net/forum?id=Q2m6H1N2M0

---

Title: Smooth Pseudo-Labeling

Abstract: Semi-Supervised Learning (SSL) seeks to leverage large amounts of non-annotated data along with the smallest possible amount of annotated data in order to achieve the same level of performance as if all data were annotated. A fruitful method in SSL is Pseudo-Labeling (PL), which, however, suffers from the important drawback that the associated loss function has discontinuities in its derivatives, causing instabilities in performance when labels are very scarce. In the present work, we address this drawback by introducing a Smooth Pseudo-Labeling (SPL) loss function. In our experiments, we apply our improvements to FixMatch and show that they significantly improve performance in the regime of scarce labels, without adding any modules, hyperparameters, or computational overhead. Robustness with respect to variation of hyperparameters and training parameters is also significantly improved. We also introduce a new benchmark, where labeled images are selected randomly from the whole dataset, without imposing representation of each class proportional to its frequency in the dataset. We see that the smooth version of FixMatch does appear to perform better than the original, non-smooth implementation. More importantly, however, we see that both implementations do not necessarily see their performance improve when labeled images are added, an important issue in the design of SSL algorithms that should be addressed so that Active Learning algorithms become more reliable and explainable.
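
The discontinuity at issue, and one smooth stand-in for it, can be sketched in a few lines (the paper's exact SPL function may differ):

    import torch

    def pl_weight(conf, tau=0.95, hard=False, width=0.05):
        # Weight applied to each pseudo-label loss term given the model's
        # confidence. The hard version is FixMatch's indicator, whose
        # derivative jumps at tau; the smooth version ramps continuously.
        if hard:
            return (conf >= tau).float()
        return torch.clamp((conf - (tau - width)) / width, 0.0, 1.0)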

URL: https://openreview.net/forum?id=49k4PhQQ6E

---

Title: A Sample Efficient Evolutionary Strategy for Reinforcement Learning

Abstract: We present a simple, sample-efficient algorithm for introducing large but directed learning steps in reinforcement learning (RL), through the use of evolutionary operators. The methodology uses a population of RL agents training with a common experience buffer, with occasional crossovers and mutations of the agents in order to search efficiently through the policy space. Unlike prior literature on combining evolutionary search (ES) with RL, this work does not generate a distribution of agents from a common mean and covariance matrix. It also does not require the evaluation of the entire population of policies at every time step. Instead, we focus on gradient-based training throughout the life of every policy (individual), with a sparse amount of evolutionary exploration. The resulting algorithm is shown to be robust to hyperparameter variations. As a surprising corollary, we show that simply initialising and training multiple RL agents with a common memory (with no further evolutionary updates) outperforms several standard RL baselines.

URL: https://openreview.net/forum?id=3yXgaJO5BC

---

Title: New Evaluation Metrics Capture Quality Degradation due to LLM Watermarking

Abstract: With the increasing use of large language models (LLMs) like ChatGPT, watermarking has emerged as a promising approach for tracing machine-generated content. However, research on LLM watermarking often relies on simple perplexity- or diversity-based measures to assess the quality of watermarked text, which can mask important limitations in watermarking. Here we introduce two new easy-to-use methods for evaluating watermarking algorithms for LLMs: 1) evaluation by an LLM judger with specific guidelines; and 2) binary classification on text embeddings to distinguish between watermarked and unwatermarked text. We apply these methods to characterize the effectiveness of current watermarking techniques. Our experiments, conducted across various datasets, reveal that current watermarking methods are detectable by even simple classifiers, challenging the notion of watermarking subtlety. We also find, through the LLM judger, that watermarking impacts text quality, especially by degrading the coherence and depth of responses. Our findings underscore the trade-off between watermark robustness and text quality and highlight the importance of having more informative metrics to assess watermarking quality.
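
The second evaluation method is essentially a detectability probe; a sketch, assuming embeddings are already computed:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def watermark_detectability(emb_plain, emb_marked):
        # Cross-validated accuracy of a linear probe separating watermarked
        # from unwatermarked text; ~0.5 would mean the watermark is subtle.
        X = np.vstack([emb_plain, emb_marked])
        y = np.r_[np.zeros(len(emb_plain)), np.ones(len(emb_marked))]
        clf = LogisticRegression(max_iter=1000)
        return cross_val_score(clf, X, y, cv=5).mean()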

URL: https://openreview.net/forum?id=PuhF0hyDq1

---

Title: Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA

Abstract: Recent works demonstrate a remarkable ability to customize text-to-image diffusion models while only providing a few example images. What happens if you try to customize such models using multiple, fine-grained concepts in a sequential (i.e., continual) manner? In our work, we show that recent state-of-the-art customization of text-to-image models suffers from catastrophic forgetting when new concepts arrive sequentially. Specifically, when adding a new concept, the ability to generate high-quality images of past, similar concepts degrades. To circumvent this forgetting, we propose a new method, C-LoRA, composed of a continually self-regularized low-rank adaptation in the cross-attention layers of the popular Stable Diffusion model. Furthermore, we use customization prompts which do not include the word of the customized object (i.e., "person" for a human face dataset) and are initialized as completely random embeddings. Importantly, our method induces only marginal additional parameter costs and requires no storage of user data for replay. We show that C-LoRA not only outperforms several baselines for our proposed setting of text-to-image continual customization, which we refer to as Continual Diffusion, but also achieves a new state of the art in the well-established rehearsal-free continual learning setting for image classification. The strong performance of C-LoRA in two separate domains positions it as a compelling solution for a wide range of applications, and we believe it has significant potential for practical impact.

URL: https://openreview.net/forum?id=TZdEgwZ6f3

---

Title: The Missing U for Efficient Diffusion Models

Abstract: Diffusion Probabilistic Models stand as a critical tool in generative modelling, enabling the generation of complex data distributions. This family of generative models yields record-breaking performance in tasks such as image synthesis, video generation, and molecule design. Despite their capabilities, their efficiency, especially in the reverse process, remains a challenge due to slow convergence rates and high computational costs. In this paper, we introduce an approach that leverages continuous dynamical systems to design a novel denoising network for diffusion models that is more parameter-efficient, exhibits faster convergence, and demonstrates increased noise robustness. Experimenting with Denoising Diffusion Probabilistic Models (DDPMs), our framework operates with approximately a quarter of the parameters, and $\sim$ 30\% of the Floating Point Operations (FLOPs) compared to standard U-Nets in DDPMs. Furthermore, our model is notably faster in inference than the baseline when measured in fair and equal conditions. We also provide a mathematical intuition as to why our proposed reverse process is faster as well as a mathematical discussion of the empirical tradeoffs in the denoising downstream task. Finally, we argue that our method is compatible with existing performance enhancement techniques, enabling further improvements in efficiency, quality, and speed.

URL: https://openreview.net/forum?id=Y4YWzBiTEV

---

Title: Adaptive Conformal Regression with Split-Jackknife+ Scores

Abstract: We introduce an extension of conformal prediction (CP), based on a combination of split-CP and the Jackknife+ procedure, that enables tuning score functions on calibration data and is designed to produce dynamically-sized prediction intervals in regression settings.
We motivate this method with theoretical results on distribution-dependent conditional coverage guarantees for split-CP and Jackknife+ prediction sets, which are determined by the statistical dependence between input data and prediction scores.
This dependence can be reduced by adapting the score function to the data distribution, thereby improving the conditional validity of conformal prediction sets.
As an illustration, we construct a variant of the MADSplit conformal regression procedure where conditional mean estimates are computed in-distribution and show through empirical validation that our method is more robust to overfitting effects than the original method, while being more sample-efficient than modern ECDF-based methods.

URL: https://openreview.net/forum?id=1fbTGC3BUD

---

Title: RoboArm-NMP: a Learning Environment for Neural Motion Planning

Abstract: We present RoboArm-NMP, a learning and evaluation environment that allows simple and thorough evaluations of Neural Motion Planning (NMP) algorithms, focused on robotic manipulators. Our Python-based environment provides baseline implementations for learning control policies (either supervised or reinforcement learning based), a simulator based on PyBullet, data of solved instances using a classical motion planning solver, various representation learning methods for encoding the obstacles, and a clean interface between the learning and planning frameworks. Using RoboArm-NMP, we compare several prominent NMP design points, and demonstrate that the best methods mostly succeed in generalizing to unseen goals in a scene with fixed obstacles, but have difficulty in generalizing to unseen obstacle configurations, suggesting focus points for future research.

URL: https://openreview.net/forum?id=QTWJVtY4kd

---

Title: Group Fairness in Reinforcement Learning via Multi-Objective Rewards

Abstract: Recent works extend classification group fairness measures to sequential decision processes such as reinforcement learning (RL) by measuring fairness as the difference in decision-maker utility (e.g. accuracy) of each group. This approach suffers when decision-maker utility is not perfectly aligned with group utility, such as in repeat loan applications where a false positive (loan default) impacts the groups (applicants) and decision-maker (lender) by different magnitudes. Some works remedy this by measuring fairness in terms of group utility, typically referred to as their "qualification", but few works offer solutions that yield group qualification equality. Those that do are prone to violating the "no-harm" principle where one or more groups' qualifications are lowered in order to achieve equality. In this work, we characterize this problem space as having three implicit objectives: maximizing decision-maker utility, maximizing group qualification, and minimizing the difference in qualification between groups. We provide a RL policy learning technique that optimizes for these objectives directly by constructing a multi-objective reward function that encodes these objectives as distinct reward signals. Under suitable parameterizations our approach is guaranteed to respect the "no-harm" principle.
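
One simple scalarization consistent with the three objectives reads as follows (weights and functional form are assumptions; the paper encodes the objectives as distinct reward signals):

    def fairness_reward(dm_utility, qualifications, w=(1.0, 1.0, 1.0)):
        # Reward decision-maker utility and average group qualification,
        # penalize the qualification gap between the best- and worst-off
        # groups (the quantity the "no-harm" principle constrains).
        gap = max(qualifications) - min(qualifications)
        avg_qual = sum(qualifications) / len(qualifications)
        return w[0] * dm_utility + w[1] * avg_qual - w[2] * gap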

URL: https://openreview.net/forum?id=cueEUSG7lE

---

Title: Adaptive Multiple Optimal Learning Factors for Neural Network Training

Abstract: This paper presents the Adapt-MOLF algorithm, which merges the strengths of second-order algorithms while addressing their limitations. Adapt-MOLF dynamically adjusts the number of weight groups per hidden unit to maximize the error change per multiplication, optimizing computational efficiency. Leveraging curvature-based grouping and Gauss-Newton updates, it efficiently interpolates the Hessian and negative gradients for computation. The two-stage algorithm alternately determines output weights and employs multiple learning factors to train input weights in a Multi-Layer Perceptron. This adaptive adjustment of learning factors maximizes the error decrease per multiplication, showcasing superior performance over OWO-MOLF and Levenberg-Marquardt (LM) across diverse datasets. Extensive experiments demonstrate competitive or superior results compared to state-of-the-art algorithms, particularly excelling in reducing testing errors. This research represents a promising advancement in second-order optimization methods for neural network training, offering scalability, efficiency, and strong performance across heterogeneous datasets.

URL: https://openreview.net/forum?id=NQyuYUa3So

---

Title: Task-Relevant Feature Selection with Prediction Focused Mixture Models

Abstract: Probabilistic models, such as mixture models, can encode latent structures that both explain the data and aid specific downstream tasks.
We focus on a constrained setting where we want to learn a model with relatively few components (e.g. for interpretability).
Simultaneously, we ensure that the components are useful for downstream predictions by introducing \emph{prediction-focused} modeling for mixtures, which automatically selects data features relevant to a prediction task.
Our approach identifies task-relevant input features, outperforms models that are not prediction-focused, and is easy to optimize; most importantly, we also characterize \emph{when} prediction-focused modeling can be expected to work.

URL: https://openreview.net/forum?id=voHKJOdCNw

---

Title: Guided Safe Shooting: model based reinforcement learning with safety constraints

Abstract: In the last decade, reinforcement learning has successfully solved complex control tasks and decision-making problems, such as the Go board game. Yet, there have been few success stories in deploying these algorithms to real-world scenarios. One of the reasons is the lack of guarantees when dealing with and avoiding unsafe states, a fundamental requirement in critical control engineering systems. In this paper, we introduce Guided Safe Shooting (GuSS), a model-based reinforcement learning approach that can learn to control systems with minimal violations of the safety constraints through a MAP-Elites based planner.
Experiments show that the new planner helps the agent avoid unsafe situations while maximally exploring the state space, a necessary aspect when learning an accurate model of the system.

URL: https://openreview.net/forum?id=5pDGecWGeN

---

Title: Test-time recalibration of conformal predictors under distribution shift based on unlabeled examples

Abstract: Modern image classifiers are very accurate, but the predictions come without uncertainty estimates. Conformal predictors provide uncertainty estimates by computing a set of classes containing the correct class with a user-specified probability based on the classifier's probability estimates. To provide such sets, conformal predictors often estimate a cutoff threshold for the probability estimates based on a calibration set. Conformal predictors guarantee reliability only when the calibration set is from the same distribution as the test set. Therefore, conformal predictors need to be recalibrated for new distributions. However, in practice, labeled data from new distributions is rarely available, making calibration infeasible. In this work, we consider the problem of predicting the cutoff threshold for a new distribution based on unlabeled examples. While it is impossible in general to guarantee reliability when calibrating based on unlabeled examples, we propose a method that provides excellent uncertainty estimates under natural distribution shifts, and provably works for a specific model of a distribution shift.

URL: https://openreview.net/forum?id=krQIuCCQsW

---

Title: Loc-FACMAC: Locality Based Factorized Multi-Agent Actor- Critic Algorithm for Cooperative Tasks

Abstract: In this work, we present a novel cooperative multi-agent reinforcement learning method called Locality based Factorized Multi-Agent Actor-Critic (Loc-FACMAC). Existing state-of-the-art algorithms, such as FACMAC, rely on global reward information for critic training. However, in a distributed multi-agent system, the global reward is overgeneralized: it cannot accurately reflect the influence of individual agents' actions, resulting in the mixer's poor performance in assigning credit. We introduce the idea of locality into critic learning by connecting strongly related agents into partitions, so that the impact of agents in the same partition is largely retained within the partition itself. Agents learning from the local reward can thus provide a more precise evaluation of the policy. This technique prevents an agent from using information from unrelated agents and also helps to deal with the curse of dimensionality caused by multiple agents. Loc-FACMAC further improves learning efficiency by introducing locality into the actor update as well. We evaluate the performance of Loc-FACMAC on three environments: Multi-cartpole, the StarCraft Multi-Agent Challenge, and Bounded-Cooperative-Navigation. We explore the impact of partition sizes on performance and compare the results with baseline MARL algorithms such as LOMAQ, FACMAC, and QMIX. The experiments reveal that, if the locality structure is defined properly, Loc-FACMAC outperforms these baseline algorithms by up to 45%, indicating that exploiting the locality structure in the actor-critic framework improves MARL performance.

URL: https://openreview.net/forum?id=w44hcJssCe

---

Title: Vision-Language Instruction Tuning: A Review and Analysis

Abstract: Instruction tuning is a crucial supervised training phase in Large Language Models (LLMs), aiming to enhance the LLM's ability to generalize instruction execution and adapt to user preferences. With the increasing integration of multi-modal data into LLMs, there is growing interest in Vision-Language Instruction Tuning (VLIT), which presents more complex characteristics compared to pure text instruction tuning. In this paper, we systematically review the latest VLIT settings and corresponding datasets in multi-modal LLMs and provide insights into the intrinsic motivations behind their design. For the first time, we offer a detailed multi-perspective categorization for existing VLIT datasets and identify the characteristics that high-quality VLIT data should possess. By incorporating these characteristics as guiding principles into the existing VLIT data construction process, we conduct extensive experiments and verify their positive impact on the performance of tuned multi-modal LLMs. Furthermore, we discuss the current challenges and future research directions of VLIT, providing insights for the continuous development of this field. The code and dataset related to this paper have been open-sourced at URL\footnote{Anonymous during the review stage.}.

URL: https://openreview.net/forum?id=ul2tbUPtIQ

---
