Weekly TMLR digest for Jul 21, 2024

TMLR

Jul 21, 2024, 12:00:10 AM
to tmlr-annou...@googlegroups.com


New certifications
==================

Survey Certification: Vision-Language Instruction Tuning: A Review and Analysis

Chen Li, Yixiao Ge, Dian Li, Ying Shan

https://openreview.net/forum?id=ul2tbUPtIQ

---


Accepted papers
===============


Title: Augment then Smooth: Reconciling Differential Privacy with Certified Robustness

Authors: Jiapeng Wu, Atiyeh Ashari Ghomi, David Glukhov, Jesse C. Cresswell, Franziska Boenisch, Nicolas Papernot

Abstract: Machine learning models are susceptible to a variety of attacks that can erode trust, including attacks against the privacy of training data, and adversarial examples that jeopardize model accuracy. Differential privacy and certified robustness are effective frameworks for combating these two threats respectively, as they each provide future-proof guarantees. However, we show that standard differentially private model training is insufficient for providing strong certified robustness guarantees. Indeed, combining differential privacy and certified robustness in a single system is non-trivial, leading previous works to introduce complex training schemes that lack flexibility. In this work, we present DP-CERT, a simple and effective method that achieves both privacy and robustness guarantees simultaneously by integrating randomized smoothing into standard differentially private model training. Compared to the leading prior work, DP-CERT gives up to a 2.5x increase in certified accuracy for the same differential privacy guarantee on CIFAR10. Through in-depth per-sample metric analysis, we find that larger certifiable radii correlate with smaller local Lipschitz constants, and show that DP-CERT effectively reduces Lipschitz constants compared to other differentially private training methods. Code is available at github.com/layer6ai-labs/dp-cert.
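
For illustration, a minimal sketch of the "augment then smooth" recipe: average each example's loss over Gaussian-noise augmentations so that the DP-trained model is compatible with randomized smoothing at test time. The function name and hyperparameters are assumptions, and the DP-SGD clipping/noising step and the smoothing-based certification are omitted; this is not the authors' DP-CERT code.

```python
import torch

def augmented_per_example_loss(model, loss_fn, x, y, sigma=0.25, num_aug=4):
    # Average the classification loss over Gaussian-noise augmentations of a
    # single input x (with integer label y). The resulting per-example loss is
    # meant to be fed into an ordinary DP-SGD step (per-example gradient
    # clipping and noise addition, not shown); at inference, predictions would
    # be certified with standard randomized smoothing.
    noisy = x.unsqueeze(0) + sigma * torch.randn(num_aug, *x.shape)
    targets = torch.full((num_aug,), y, dtype=torch.long)
    return loss_fn(model(noisy), targets)
```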

URL: https://openreview.net/forum?id=YN0IcnXqsr

---

Title: Deep Unlearning: Fast and Efficient Gradient-free Class Forgetting

Authors: Sangamesh Kodge, Gobinda Saha, Kaushik Roy

Abstract: Machine unlearning is a prominent and challenging field, driven by regulatory demands for user data deletion and heightened privacy awareness. Existing approaches involve retraining the model or multiple fine-tuning steps for each deletion request, often constrained by computational limits and restricted data access. In this work, we introduce a novel class unlearning algorithm designed to strategically eliminate specific classes from the learned model. Our algorithm first estimates the Retain and the Forget Spaces using Singular Value Decomposition on the layerwise activations for a small subset of samples from the retain and unlearn classes, respectively. We then compute the shared information between these spaces and remove it from the forget space to isolate the class-discriminatory feature space. Finally, we obtain the unlearned model by updating the weights to suppress the class-discriminatory features from the activation spaces. We demonstrate our algorithm's efficacy on ImageNet using a Vision Transformer with only a ~1.5% drop in retain accuracy compared to the original model while maintaining under 1% accuracy on the unlearned class samples. Furthermore, our algorithm exhibits competitive unlearning performance and resilience against Membership Inference Attacks (MIA). Compared to baselines, it achieves an average accuracy improvement of 1.38% on the ImageNet dataset while requiring up to 10x fewer samples for unlearning. Additionally, under stronger MIA attacks on the CIFAR-100 dataset using a ResNet18 architecture, our approach outperforms the best baseline by 1.8%.
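
A rough, illustrative sketch of the SVD-based projection idea described above; the rank k, the scaling alpha, and the shared-information step are simplified assumptions rather than the authors' exact algorithm.

```python
import torch

def unlearn_layer(W, acts_retain, acts_forget, k=10, alpha=1.0):
    # W: (out_features, in_features) weights of one layer.
    # acts_*: (num_samples, in_features) activations entering that layer for a
    # small subset of retain-class / forget-class samples.
    Ur = torch.linalg.svd(acts_retain, full_matrices=False).Vh[:k].T  # retain space basis
    Uf = torch.linalg.svd(acts_forget, full_matrices=False).Vh[:k].T  # forget space basis
    I = torch.eye(W.shape[1], device=W.device)
    P_retain = Ur @ Ur.T
    # Keep only the part of the forget space not shared with the retain space,
    # i.e. the class-discriminatory directions.
    P_disc = (Uf @ Uf.T) @ (I - P_retain)
    # Suppress those directions in the inputs this layer responds to.
    return W @ (I - alpha * P_disc)
```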

URL: https://openreview.net/forum?id=BmI5p6wBi0

---

Title: Dual-windowed Vision Transformer with Angular Self-Attention

Authors: Weili Shi, Sheng Li

Abstract: Following the great success in natural language processing, transformer-based models have emerged as competitive alternatives to convolutional neural networks in computer vision. Vision transformer (ViT) and its subsequent variants have exhibited promising performance in tasks such as image classification, object detection and semantic segmentation. The core of vision transformers is the self-attention mechanism, which models the long-range dependency of different tokens. Conventionally, the attention matrix in self-attention is calculated by the scaled dot-product of \textit{query} (Q) and \textit{key} (K). In this case, the attention weight depends on the norms of Q and K as well as the angle between them. In this paper, we propose a new attention mechanism named angular self-attention, which replaces the scaled dot-product operation with an angular function in order to effectively model the relationship between tokens. In particular, we propose two forms of functions: quadratic and cosine functions, for our angular self-attention. Based on angular self-attention, we design a new vision transformer architecture called dual-windowed angular vision transformer (\textbf{DWAViT}). DWAViT is a hierarchical-structured model characterized by the angular self-attention and a new local window mechanism. We evaluate DWAViT on multiple computer vision benchmarks, including image classification on ImageNet-1K, object detection on COCO, and semantic segmentation on ADE20K. Our experimental results also suggest that our model can achieve promising performance on the tasks while maintaining comparable computational cost with that of the baseline models (e.g., Swin Transformer).
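
As a quick illustration, a cosine-flavoured variant of angular self-attention, where attention weights come from the angle between queries and keys rather than their scaled dot product; the paper's exact quadratic/cosine forms and the dual-window scheme are not reproduced here.

```python
import torch
import torch.nn.functional as F

def angular_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, heads, tokens, head_dim)
    qn = F.normalize(q, dim=-1, eps=eps)
    kn = F.normalize(k, dim=-1, eps=eps)
    cos_theta = qn @ kn.transpose(-2, -1)      # cosine of the query/key angle, in [-1, 1]
    attn = torch.softmax(cos_theta, dim=-1)    # weights depend on the angle only, not on norms
    return attn @ v
```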

URL: https://openreview.net/forum?id=7jgu4oXsGM

---

Title: On the Importance of Uncertainty in Decision-Making with Large Language Models

Authors: Nicolò Felicioni, Lucas Maystre, Sina Ghiassian, Kamil Ciosek

Abstract: We investigate the role of uncertainty in decision-making problems with natural language as input. For such tasks, using Large Language Models as agents has become the norm. However, none of the recent approaches employ any additional phase for estimating the uncertainty the agent has about the world during the decision-making task. We focus on a fundamental decision-making framework with natural language as input, which is the one of contextual bandits, where the context information consists of text. As a representative of the approaches with no uncertainty estimation, we consider an LLM agent with a greedy policy, which picks the action corresponding to the largest predicted reward. We compare this baseline to LLM agents that make active use of uncertainty estimation by integrating the uncertainty in a Thompson Sampling policy. We employ different techniques for uncertainty estimation, such as Laplace Approximation, Dropout, and Epinets. We empirically show on real-world data that the greedy policy performs worse than the Thompson Sampling policies. These findings suggest that, while overlooked in the LLM literature, uncertainty improves performance on bandit tasks with LLM agents.
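
Schematically, the two agent types compared above differ only in whether a posterior over reward models is used; class and function names below are hypothetical, not from the paper.

```python
import numpy as np

class GreedyLLMAgent:
    """Greedy baseline: pick the action with the largest predicted reward."""
    def __init__(self, reward_model):
        self.reward_model = reward_model            # maps an LLM context embedding to a reward
    def act(self, action_embeddings):
        return int(np.argmax([self.reward_model(e) for e in action_embeddings]))

class ThompsonLLMAgent:
    """Thompson Sampling: draw one reward model from an approximate posterior
    (e.g. Laplace approximation, dropout, or an epinet), then act greedily on it."""
    def __init__(self, posterior_sampler):
        self.posterior_sampler = posterior_sampler
    def act(self, action_embeddings):
        sampled = self.posterior_sampler()
        return int(np.argmax([sampled(e) for e in action_embeddings]))
```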

URL: https://openreview.net/forum?id=YfPzUX6DdO

---

Title: ITEM: Improving Training and Evaluation of Message-Passing based GNNs for top-k recommendation

Authors: Yannis Karmim, Elias Ramzi, Raphael Fournier-S'niehotta, Nicolas THOME

Abstract: Graph Neural Networks (GNNs), especially message-passing-based models, have become prominent in top-k recommendation tasks, outperforming matrix factorization models due to their ability to efficiently aggregate information from a broader context. Although GNNs are evaluated with ranking-based metrics, e.g., NDCG@k and Recall@k, they remain largely trained with proxy losses, e.g., the BPR loss. In this work, we explore the use of ranking loss functions to directly optimize the evaluation metrics, an area not extensively investigated in the GNN community for collaborative filtering. We take advantage of smooth approximations of the rank to facilitate end-to-end training of GNNs and propose a Personalized PageRank-based negative sampling strategy tailored for ranking loss functions. Moreover, we extend the evaluation of GNN models for top-k recommendation tasks with an inductive user-centric protocol, providing a more accurate reflection of real-world applications. Our proposed method significantly outperforms the standard BPR loss and more advanced losses across four datasets and four recent GNN architectures while also exhibiting faster training, demonstrating the potential of ranking loss functions for improving GNN training in collaborative filtering tasks.
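
For intuition, one generic sigmoid-based smooth approximation of the rank, of the kind that makes ranking metrics differentiable; the specific surrogate and the PPR-based negative sampling used by ITEM may differ.

```python
import torch

def smooth_rank(scores, pos_idx, temperature=0.1):
    # Differentiable surrogate for the rank of the positive item:
    # rank(pos) ~ 1 + sum_{j != pos} sigmoid((s_j - s_pos) / t).
    diffs = (scores - scores[pos_idx]) / temperature
    mask = torch.ones_like(scores, dtype=torch.bool)
    mask[pos_idx] = False
    return 1.0 + torch.sigmoid(diffs[mask]).sum()
```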

URL: https://openreview.net/forum?id=9B6LM2uoEs

---

Title: Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition

Authors: Roisin Luo, James McDermott, Colm O'Riordan

Abstract: Perturbation robustness evaluates the vulnerabilities of models, arising from a variety of perturbations, such as data corruptions and adversarial attacks. Understanding the mechanisms of perturbation robustness is critical for global interpretability. We present a model-agnostic, global mechanistic interpretability method to interpret the perturbation robustness of image models. This research is motivated by two key aspects. First, previous global interpretability works, in tandem with robustness benchmarks, *e.g.,* mean corruption error (mCE), are not designed to directly interpret the mechanisms of perturbation robustness within image models. Second, we notice that the spectral signal-to-noise ratios (SNR) of perturbed natural images decay exponentially with frequency. This power-law-like decay implies that: Low-frequency signals are generally more robust than high-frequency signals -- yet high classification accuracy cannot be achieved by low-frequency signals alone. By applying Shapley value theory, our method axiomatically quantifies the predictive powers of robust features and non-robust features within an information theory framework. Our method, dubbed **I-ASIDE** (**I**mage **A**xiomatic **S**pectral **I**mportance **D**ecomposition **E**xplanation), provides a unique insight into model robustness mechanisms. We conduct extensive experiments over a variety of vision models pre-trained on ImageNet, including both convolutional neural networks (*e.g.,* *AlexNet*, *VGG*, *GoogLeNet/Inception-v1*, *Inception-v3*, *ResNet*, *SqueezeNet*, *RegNet*, *MnasNet*, *MobileNet*, *EfficientNet*, *etc.*) and vision transformers (*e.g.,* *ViT*, *Swin Transformer*, and *MaxViT*), to show that **I-ASIDE** can not only **measure** the perturbation robustness but also **provide interpretations** of its mechanisms.

URL: https://openreview.net/forum?id=uQYomAuo7M

---

Title: DeepReShape: Redesigning Neural Networks for Efficient Private Inference

Authors: Nandan Kumar Jha, Brandon Reagen

Abstract: Prior work on Private Inference (PI)---inferences performed directly on encrypted input---has focused on minimizing a network's ReLUs, which have been assumed to dominate PI latency rather than FLOPs. Recent work has shown that FLOPs for PI can no longer be ignored and incur high latency penalties. In this paper, we develop DeepReShape, a technique that optimizes neural network architectures under PI's constraints, optimizing for both ReLUs {\em and} FLOPs for the first time. The key insight is that strategically allocating channels to position the network's ReLUs in order of their criticality to network accuracy simultaneously optimizes ReLU and FLOP efficiency. DeepReShape automates network development with an efficient process, and we call the generated networks HybReNets. We evaluate DeepReShape using standard PI benchmarks and demonstrate a 2.1% accuracy gain with a 5.2$\times$ runtime improvement at iso-ReLU on CIFAR-100 and an 8.7$\times$ runtime improvement at iso-accuracy on TinyImageNet. Furthermore, we investigate the significance of network selection in prior ReLU optimizations and shed light on the key network attributes for superior PI performance.

URL: https://openreview.net/forum?id=iwCBWULItx

---

Title: Exploiting Edge Features in Graph-based Learning with Fused Network Gromov-Wasserstein Distance

Authors: Junjie Yang, Matthieu Labeau, Florence d'Alché-Buc

Abstract: Pairwise comparison of graphs is key to many applications in Machine Learning, ranging from clustering and kernel-based classification/regression to, more recently, supervised graph prediction. Distances between graphs usually rely on informative representations of these structured objects, such as bags of substructures or other graph embeddings. A recently popular solution consists in representing graphs as metric measure spaces, allowing one to successfully leverage Optimal Transport, which provides meaningful distances for comparing them, namely the Gromov-Wasserstein distance and its variant, the Fused Gromov-Wasserstein distance, which applies to node-attributed graphs. However, this family of distances overlooks edge attributes, which are essential for many structured objects. In this work, we introduce an extension of the Fused Gromov-Wasserstein distance for comparing graphs in which both nodes and edges have features. We propose novel algorithms for distance and barycenter computation. We present a range of studies that illustrate the properties of the proposed distance and empirically demonstrate its effectiveness in supervised graph prediction tasks.

URL: https://openreview.net/forum?id=8uCNtJ2Fmo

---

Title: Mini-Batch Optimization of Contrastive Loss

Authors: Jaewoong Cho, Kartik Sreenivasan, Keon Lee, Kyunghoo Mun, Soheun Yi, Jeong-Gwan Lee, Anna Lee, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee

Abstract: Contrastive learning has gained significant attention as a pre-training method for self-supervised learning due to its ability to leverage large amounts of unlabeled data. A contrastive loss function ensures that embeddings of positive sample pairs (e.g., from the same class or different views of the same data) are similar, while embeddings of negative pairs are dissimilar. However, practical constraints such as large memory requirements make it infeasible to consider all possible positive and negative pairs, leading to the use of mini-batches. In this paper, we investigate the theoretical aspects of mini-batch optimization in contrastive learning with the InfoNCE loss. We show that mini-batch optimization is equivalent to full-batch optimization if and only if all $\binom{N}{B}$ mini-batches are selected, while sub-optimality may arise when examining only a subset. We then demonstrate that utilizing high-loss mini-batches can speed up SGD convergence and propose a spectral clustering-based approach for identifying these high-loss mini-batches. Our experimental results validate our theoretical findings and demonstrate that our proposed algorithm outperforms vanilla SGD, providing a better understanding of mini-batch optimization in contrastive learning.
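
For reference, the mini-batch InfoNCE loss analysed in the paper, with B positive pairs and the other in-batch samples acting as negatives; the proposed spectral-clustering batch selection is not shown.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    # z1, z2: (B, d) embeddings of two views of the same B samples.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                 # (B, B); diagonal entries are positives
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```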

URL: https://openreview.net/forum?id=Nux7OVXpJ9

---

Title: Improved motif-scaffolding with SE(3) flow matching

Authors: Jason Yim, Andrew Campbell, Emile Mathieu, Andrew Y. K. Foong, Michael Gastegger, Jose Jimenez-Luna, Sarah Lewis, Victor Garcia Satorras, Bastiaan S. Veeling, Frank Noe, Regina Barzilay, Tommi Jaakkola

Abstract: Protein design often begins with knowledge of a desired function encoded in a motif, around which motif-scaffolding aims to construct a functional protein. Recently, generative models have achieved breakthrough success in designing scaffolds for a range of motifs. However, generated scaffolds tend to lack structural diversity, which can hinder success in wet-lab validation. In this work, we extend FrameFlow, an SE(3) flow matching model for protein backbone generation, to perform motif-scaffolding with two complementary approaches. The first is motif amortization, in which FrameFlow is trained with the motif as input using a data augmentation strategy. The second is motif guidance, which performs scaffolding using an estimate of the conditional score from FrameFlow without additional training. On a benchmark of 24 biologically meaningful motifs, we show our method achieves 2.5 times more designable and unique motif-scaffolds compared to the state of the art. Code: https://github.com/microsoft/protein-frame-flow

URL: https://openreview.net/forum?id=fa1ne8xDGn

---

Title: XAudit: A Learning-Theoretic Look at Auditing with Explanations

Authors: Chhavi Yadav, Michal Moshkovitz, Kamalika Chaudhuri

Abstract: Responsible use of machine learning requires models to be audited for undesirable properties. While a body of work has proposed using explanations for auditing, how to do so and why has remained relatively ill-understood. This work formalizes the role of explanations in auditing, drawing inspiration from active learning, and investigates if and how model explanations can help audits. As an instantiation of our framework, we look at `feature sensitivity' and propose explanation-based algorithms for auditing linear classifiers and decision trees for this property. Our results illustrate that Counterfactual explanations are extremely helpful for auditing feature sensitivity, even in the worst case. While Anchor explanations and decision paths may not be as beneficial in the worst case, our experiments demonstrate that they aid significantly in the average case.

URL: https://openreview.net/forum?id=gPtjyzXskg

---

Title: Fair Representation in Submodular Subset Selection: A Pareto Optimization Approach

Authors: Adriano Fazzone, Yanhao Wang, Francesco Bonchi

Abstract: Many machine learning applications, such as feature selection, recommendation, and social advertising, require the joint optimization of the global utility and the representativeness for different groups of items or users. To meet such requirements, we propose a novel multi-objective combinatorial optimization problem called Submodular Maximization with Fair Representation (SMFR), which selects subsets from a ground set, subject to a knapsack or matroid constraint, to maximize a submodular (utility) function $f$ as well as a set of $d$ submodular (representativeness) functions $g_1, \dots, g_d$. We show that the maximization of $f$ might conflict with the maximization of $g_1, \dots, g_d$, so that no single solution can optimize all these objectives at the same time. Therefore, we propose a Pareto optimization approach to SMFR, which finds a set of solutions to approximate all Pareto-optimal solutions with different trade-offs between the objectives. Our method converts an instance of SMFR into several submodular cover instances by adjusting the weights of the objective functions; then it computes a set of solutions by running the greedy algorithm on each submodular cover instance. We prove that our method provides approximation guarantees for SMFR under knapsack or matroid constraints. Finally, we demonstrate the effectiveness of SMFR and our proposed approach in two real-world problems: maximum coverage and recommendation.

URL: https://openreview.net/forum?id=0Hm01Vc8zT

---

Title: Representation Learning Dynamics of Self-Supervised Models

Authors: Pascal Esser, Satyaki Mukherjee, Debarghya Ghoshdastidar

Abstract: Self-Supervised Learning (SSL) is an important paradigm for learning representations from unlabelled data, and SSL with neural networks has been highly successful in practice. However, current theoretical analysis of SSL is mostly restricted to generalisation error bounds. In contrast, learning dynamics often provide a precise characterisation of the behaviour of neural-network-based models but, so far, are mainly known in supervised settings. In this paper, we study the learning dynamics of SSL models, specifically representations obtained by minimising contrastive and non-contrastive losses. We show that a naive extension of the dynamics of multivariate regression to SSL leads to learning trivial scalar representations, demonstrating dimension collapse in SSL. Consequently, we formulate SSL objectives with orthogonality constraints on the weights, and derive the exact (network width independent) learning dynamics of the SSL models trained using gradient descent on the Grassmannian manifold. We also argue that the infinite width approximation of SSL models deviates significantly from the neural tangent kernel approximations of supervised models. We numerically illustrate the validity of our theoretical findings, and discuss how the presented results provide a framework for further theoretical analysis of contrastive and non-contrastive SSL.

URL: https://openreview.net/forum?id=QXLKnrymE1

---

Title: Learning Counterfactually Invariant Predictors

Authors: Francesco Quinzan, Cecilia Casolo, Krikamol Muandet, Yucen Luo, Niki Kilbertus

Abstract: Notions of counterfactual invariance (CI) have proven essential for predictors that are fair, robust, and generalizable in the real world. We propose graphical criteria that yield a sufficient condition for a predictor to be counterfactually invariant in terms of a conditional independence in the observational distribution. In order to learn such predictors, we propose a model-agnostic framework, called Counterfactually Invariant Prediction (CIP), building on the Hilbert-Schmidt Conditional Independence Criterion (HSCIC), a kernel-based conditional dependence measure. Our experimental results demonstrate the effectiveness of CIP in enforcing counterfactual invariance across various simulated and real-world datasets including scalar and multi-variate settings.

URL: https://openreview.net/forum?id=pRt1Vw1DPs

---

Title: A Note on the Convergence of Denoising Diffusion Probabilistic Models

Authors: Sokhna Diarra Mbacke, Omar Rivasplata

Abstract: Diffusion models are one of the most important families of deep generative models. In this note, we derive a quantitative upper bound on the Wasserstein distance between the target distribution and the distribution learned by a diffusion model. Unlike previous works on this topic, our result does not make assumptions on the learned score function. Moreover, our result holds for arbitrary data-generating distributions on bounded instance spaces, even those without a density with respect to Lebesgue measure, and the upper bound does not suffer from exponential dependencies on the ambient space dimension. Our main result builds upon the recent work of Mbacke et al. (2023) and our proofs are elementary.

URL: https://openreview.net/forum?id=wLe1bG93yc

---

Title: Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation

Authors: Tong Xie, Haoyu Li, Andrew Bai, Cho-Jui Hsieh

Abstract: Data attribution methods trace model behavior back to its training dataset, offering an effective approach to better understand ``black-box'' neural networks. While prior research established quantifiable links between model output and training data in diverse settings, interpreting diffusion model outputs in relation to training samples remains underexplored. In particular, diffusion models operate over a sequence of timesteps instead of instantaneous input-output relationships in previous contexts, posing a significant challenge to extend existing frameworks to diffusion models directly. Notably, we present Diffusion-TracIn that incorporates this temporal dynamics and observe that samples' loss gradient norms are highly dependent on timestep. This trend leads to a prominent bias in influence estimation, and is particularly severe for samples trained on large-norm-inducing timesteps, causing them to be generally influential. To mitigate this bias, we introduce Diffusion-ReTrac as a re-normalized adaptation that retrieves training samples targeted to the test sample of interest, enabling a localized measurement of influence and considerably more intuitive visualization. We demonstrate the efficacy of our approach through various evaluation metrics and auxiliary tasks, outperforming in terms of specificity of attribution by over $60\%$.
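
As background, a first-order TracIn-style influence score (the sum over training checkpoints of train/test gradient dot products). Diffusion-TracIn additionally accounts for diffusion timesteps and Diffusion-ReTrac re-normalizes to remove the timestep-induced gradient-norm bias, neither of which is shown; the loss-function signature below is a placeholder.

```python
import torch

def tracin_influence(model, loss_fn, train_example, test_example, checkpoints):
    # checkpoints: list of model state_dicts saved during training.
    score = 0.0
    for state in checkpoints:
        model.load_state_dict(state)
        params = [p for p in model.parameters() if p.requires_grad]
        g_test = torch.autograd.grad(loss_fn(model, *test_example), params)
        g_train = torch.autograd.grad(loss_fn(model, *train_example), params)
        score += sum((gt * gr).sum() for gt, gr in zip(g_test, g_train)).item()
    return score
```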

URL: https://openreview.net/forum?id=P3Lyun7CZs

---

Title: Language Models Are Better Than Humans at Next-token Prediction

Authors: Buck Shlegeris, Fabien Roger, Lawrence Chan, Euan McLean

Abstract: Current language models are considered to have sub-human capabilities at natural language tasks like question-answering or writing code. However, causal language models are not trained to perform well at these tasks; they are trained to accurately predict the next token given previous tokens in tokenized text. It is not clear whether language models are better or worse than humans at next-token prediction. To try to answer this question, we performed two distinct experiments to directly compare humans and language models on this front: one measuring top-1 accuracy and the other measuring perplexity on OpenWebText. In both experiments, we find humans to be consistently \emph{worse} than relatively small language models like GPT-Neo-1.3B or GPT-2-large at next-token prediction.
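
A sketch of how the model side of the top-1 accuracy comparison can be measured with Hugging Face Transformers; the human-comparison protocol and the OpenWebText perplexity experiment are not reproduced here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def top1_next_token_accuracy(text, model_name="gpt2-large"):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    preds = logits[:, :-1].argmax(dim=-1)   # prediction at position t is for token t+1
    targets = ids[:, 1:]
    return (preds == targets).float().mean().item()
```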

URL: https://openreview.net/forum?id=RNsnSLdmV7

---

Title: Vision-Language Instruction Tuning: A Review and Analysis

Authors: Chen Li, Yixiao Ge, Dian Li, Ying Shan

Abstract: Instruction tuning is a crucial supervised training phase in Large Language Models (LLMs), aiming to enhance the LLM's ability to generalize instruction execution and adapt to user preferences. With the increasing integration of multi-modal data into LLMs, there is growing interest in Vision-Language Instruction Tuning (VLIT), which presents more complex characteristics compared to pure text instruction tuning. In this paper, we systematically review the latest VLIT settings and corresponding datasets in multi-modal LLMs and provide insights into the intrinsic motivations behind their design. For the first time, we offer a detailed multi-perspective categorization for existing VLIT datasets and identify the characteristics that high-quality VLIT data should possess. By incorporating these characteristics as guiding principles into the existing VLIT data construction process, we conduct extensive experiments and verify their positive impact on the performance of tuned multi-modal LLMs. Furthermore, we discuss the current challenges and future research directions of VLIT, providing insights for the continuous development of this field. The code and dataset related to this paper have been open-sourced at \url{https://github.com/palchenli/VL-Instruction-Tuning}.

URL: https://openreview.net/forum?id=ul2tbUPtIQ

---

Title: Harnessing the Power of Federated Learning in Federated Contextual Bandits

Authors: Chengshuai Shi, Ruida Zhou, Kun Yang, Cong Shen

Abstract: Federated learning (FL) has demonstrated great potential in revolutionizing distributed machine learning, and tremendous efforts have been made to extend it beyond the original focus on supervised learning. Among many directions, federated contextual bandits (FCB), a pivotal integration of FL and sequential decision-making, has garnered significant attention in recent years. Despite substantial progress, existing FCB approaches have largely employed their tailored FL components, often deviating from the canonical FL framework. Consequently, even renowned algorithms like FedAvg remain under-utilized in FCB, let alone other FL advancements. Motivated by this disconnection, this work takes one step towards building a tighter relationship between the canonical FL study and the investigations on FCB. In particular, a novel FCB design, termed FedIGW, is proposed to leverage a regression-based CB algorithm, i.e., inverse gap weighting. Compared with existing FCB approaches, the proposed FedIGW design can better harness the entire spectrum of FL innovations, which is concretely reflected as (1) flexible incorporation of (both existing and forthcoming) FL protocols; (2) modularized plug-in of FL analyses in performance guarantees; (3) seamless integration of FL appendages (such as personalization, robustness, and privacy). We substantiate these claims through rigorous theoretical analyses and empirical evaluations.
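
For reference, the standard inverse gap weighting rule that FedIGW builds on: an action's probability is inversely proportional to its predicted-reward gap to the greedy action. The federated aggregation of the regression oracle is not shown.

```python
import numpy as np

def igw_policy(predicted_rewards, gamma):
    r = np.asarray(predicted_rewards, dtype=float)
    K = len(r)
    best = int(np.argmax(r))
    probs = np.zeros(K)
    for a in range(K):
        if a != best:
            probs[a] = 1.0 / (K + gamma * (r[best] - r[a]))
    probs[best] = 1.0 - probs.sum()   # remaining mass goes to the greedy action
    return probs
```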

URL: https://openreview.net/forum?id=Z8wcREe9qV

---

Title: Diversity-Preserving $K$--Armed Bandits, Revisited

Authors: Hedi Hadiji, Sébastien Gerchinovitz, Jean-Michel Loubes, Gilles Stoltz

Abstract: We consider the bandit-based framework for diversity-preserving recommendations introduced by Celis et al. (2019), who approached it in the case of a polytope mainly by a reduction to the setting of linear bandits. We design a UCB algorithm using the specific structure of the setting and show that it enjoys a bounded distribution-dependent regret in the natural cases when the optimal mixed actions put some probability mass on all actions (i.e., when diversity is desirable). The regret lower bounds provided show that otherwise, at least when the model is mean-unbounded, a $\ln T$ regret is suffered. We also discuss an example beyond the special case of polytopes.

URL: https://openreview.net/forum?id=Viz7KBqO4A

---

Title: Masked multi-prediction for multi-aspect anomaly detection

Authors: Yassine Naji, Romaric Audigier, Aleksandr Setkov, Angelique Loesch, Michèle Gouiffès

Abstract: In this paper, we address the anomaly detection problem in the context of heterogeneous normal observations and propose an approach that accounts for this heterogeneity. Although prediction-based methods are common to learn normality, the vast majority of previous work predicts a single outcome, which is generally not sufficient to account for the multiplicity of possible normal observations. To address this issue, we introduce a new masked multi-prediction (MMP) approach that produces multiple likely normal outcomes, and show both theoretically and experimentally that it improves normality learning and leads to a better anomaly detection performance. In addition, we observed that normality can be characterized from multiple aspects, depending on the types of anomalies to be detected. Therefore, we propose an adaptation (MMP-AMS) of our approach to cover multiple aspects of normality such as appearance, motion, semantics and location. Since we model each aspect separately, our approach has the advantage of being interpretable and modular, as we can select only a subset of normality aspects. The experiments conducted on several benchmarks show the effectiveness of the proposed approach.

URL: https://openreview.net/forum?id=7wybYcK1pw

---


New submissions
===============


Title: Learning to Cooperate under Private Rewards

Abstract: We address a critical challenge in multi-agent reinforcement learning (MARL): maximizing team rewards in scenarios where agents only have access to their individual, private rewards. This setting presents unique challenges, as agents must cooperate to optimize collective performance whilst having only local, potentially conflicting objectives. Existing MARL methods often tackle this by sharing rewards, values, or full policies, but these approaches raise concerns about privacy and computational overhead. We introduce Anticipation Sharing (AS), a novel MARL method that achieves team-level coordination through the exchange of anticipated peer action distributions. Our key theoretical contribution is a proof that the deviation between the collective return and individual objectives can be identified through these anticipations. This allows AS to align agent behaviours towards team objectives without compromising individual privacy or incurring the prohibitive costs of full policy sharing. Experimental results demonstrate that AS is competitive with baseline algorithms that share values or policy parameters, whilst offering significant advantages in privacy preservation and computational efficiency. Our work presents a promising direction for privacy-preserving cooperative MARL in scenarios where agents must maximize team performance using only their private, individual rewards.

URL: https://openreview.net/forum?id=EdGIYb2rtU

---

Title: The Foundation Model Transparency Index v1.1: May 2024

Abstract: Foundation models are increasingly consequential yet extremely opaque. To characterize the status quo, the Foundation Model Transparency Index was launched in October 2023 to measure the transparency of leading foundation model developers. The October 2023 Index (v1.0) assessed 10 major foundation model developers (e.g. OpenAI, Google) on 100 transparency indicators (e.g. does the developer disclose the wages it pays for data labor?). At the time, developers publicly disclosed very limited information with the average score being 37 out of 100. To understand how the status quo has changed, we conduct a follow-up study (v1.1) after 6 months: we score 14 developers against the same 100 indicators. While in v1.0 we searched for publicly available information, in v1.1 developers submit reports on the 100 transparency indicators, potentially including information that was not previously public. We find that developers now score 58 out of 100 on average, a 21 point improvement over v1.0. Much of this increase is driven by developers disclosing information during the v1.1 process: on average, developers disclosed information related to 16.6 indicators that was not previously public. We observe regions of sustained (i.e. across v1.0 and v1.1) and systemic (i.e. across most or all developers) opacity such as on copyright status, data access, data labor, and downstream impact. We publish transparency reports for each developer that consolidate information disclosures: these reports are based on the information disclosed to us via developers. Our findings demonstrate that transparency can be improved in this nascent ecosystem, the Foundation Model Transparency Index likely contributes to these improvements, and policymakers should consider interventions in areas where transparency has not improved.

URL: https://openreview.net/forum?id=38cwP8xVxD

---

Title: LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations

Abstract: Contrastive instance discrimination approaches outperform supervised learning in downstream tasks like image classification and object detection. However, these approaches heavily rely on data augmentation during representation learning, which may result in inferior results if not properly implemented. Random cropping followed by resizing is a common form of data augmentation used in contrastive learning, but it can lead to degraded representation learning if the two random crops contain distinct semantic content. To address this issue, this paper introduces LeOCLR (Leveraging Original Images for Contrastive Learning of Visual Representations), a framework that employs a new instance discrimination approach and an adapted loss function to alleviate discarding semantic features caused by mapping different object parts during representation learning. The experimental results show that our approach consistently improves representation learning across different datasets compared to baseline models. For example, our approach outperforms MoCo-v2 by 5.1% on ImageNet-1K in linear evaluation and several other methods on transfer learning tasks.

URL: https://openreview.net/forum?id=y8qGOvUn1r

---

Title: Continual Learning in Open-vocabulary Classification with Complementary Memory Systems

Abstract: We introduce a method for flexible and efficient continual learning in open-vocabulary image classification, drawing inspiration from the complementary learning systems observed in human cognition. Specifically, we propose to combine predictions from a CLIP zero-shot model and an exemplar-based model, using the zero-shot estimated probability that a sample's class is within the exemplar classes. We also propose a ``tree probe'' method, an adaptation of lazy learning principles, which enables fast learning from new examples with competitive accuracy to batch-trained linear models. We test in data incremental, class incremental, and task incremental settings, as well as the ability to perform flexible inference on varying subsets of zero-shot and learned categories. Our proposed method achieves a good balance of learning speed, target task effectiveness, and zero-shot effectiveness.
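
Schematically, the combination rule described above reduces to a probability-weighted mixture; the variable names and the exact mixing rule here are assumptions, not the paper's implementation.

```python
import numpy as np

def combined_prediction(p_zero_shot, p_exemplar, p_class_in_exemplars):
    # p_zero_shot, p_exemplar: probability vectors over the full label set from the
    # CLIP zero-shot model and the exemplar-based model, respectively.
    # p_class_in_exemplars: zero-shot estimate that the sample's class is among the
    # classes for which exemplars have been stored.
    return p_class_in_exemplars * np.asarray(p_exemplar) + \
           (1.0 - p_class_in_exemplars) * np.asarray(p_zero_shot)
```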

URL: https://openreview.net/forum?id=6j5M75iK3a

---

Title: How Low Can You Go? Identifying Prototypical In-Distribution Samples for Unsupervised Anomaly Detection

Abstract: Unsupervised anomaly detection (UAD) alleviates large labeling efforts by training exclusively on unlabeled in-distribution data and detecting outliers as anomalies. Generally, the assumption prevails that large training datasets allow the training of higher-performing UAD models. However, in this work, we show that UAD with extremely few training samples can already match -- and in some cases even surpass -- the performance of training with the whole training dataset. Building upon this finding, we propose an unsupervised method to reliably identify prototypical samples to further boost UAD performance. We demonstrate the utility of our method on seven different established UAD benchmarks from computer vision, industrial defect detection, and medicine. With just 25 selected samples, we even exceed the performance of full training in $25/67$ categories in these benchmarks. Additionally, we show that the prototypical in-distribution samples identified by our proposed method generalize well across models and datasets and that observing their sample selection criteria allows for a successful manual selection of small subsets of high-performing samples. Our code is available at https://anonymous.4open.science/r/uad_prototypical_samples/

URL: https://openreview.net/forum?id=NS9rcQlcQ1

---

Title: Repositioning the Subject within Image

Abstract: Current image manipulation primarily centers on static manipulation, such as replacing specific regions within an image or altering its overall style. In this paper, we introduce an innovative dynamic manipulation task, subject repositioning. This task involves relocating a user-specified subject to a desired position while preserving the image's fidelity. Our research reveals that the fundamental sub-tasks of subject repositioning, which include filling the void left by the repositioned subject, reconstructing obscured portions of the subject and blending the subject to be consistent with surrounding areas, can be effectively reformulated as a unified, prompt-guided inpainting task. Consequently, we can employ a single diffusion generative model to address these sub-tasks using various task prompts learned through our proposed task inversion technique. Additionally, we integrate pre-processing and post-processing techniques to further enhance the quality of subject repositioning. These elements together form our SEgment-gEnerate-and-bLEnd (SEELE) framework. To assess SEELE's effectiveness in subject repositioning, we assemble a real-world subject repositioning dataset called ReS. Results of SEELE on ReS demonstrate its efficacy.

URL: https://openreview.net/forum?id=orHH4fCtR8

---

Title: Offline Reinforcement Learning with Bayesian Flow Networks

Abstract: This paper presents a novel approach to reinforcement learning (RL) utilizing Bayesian flow networks for sequence generation, enabling effective planning in both discrete and continuous domains by conditioning on returns and current states. We explore two conditioning strategies: state inpainting and a classifier-free method. Experimental results demonstrate the robustness of our method across various environments. It adeptly navigates gridworld environments in discrete settings without sacrificing performance in continuous tasks compared to the current state of the art. The results highlight our approach's ability to effectively capture spatial and temporal dependencies through a specialized neural network architecture combining 2D convolutions with a temporal U-Net.

URL: https://openreview.net/forum?id=egy21TeQHA

---

Title: tsGT: Time Series Generative Transformer

Abstract: Time series models are ubiquitous in fields of science that deal with temporally structured data. Recent advancements in time series analysis have seen a growing trend toward the popularity of tailor-made transformer neural networks with customized attention blocks and hand-crafted intricate design features. We show, perhaps surprisingly and against this current trend, that a simple time series generative transformer model, dubbed tsGT, based on a vanilla decoder-only architecture with the discretization of real values outperforms more sophisticated contemporary models on selected prediction tasks. We evaluate tsGT against eleven baselines and show that it surpasses its deterministic peers on MAE and RMSE, and the stochastic ones on QL and CRPS on four commonly used datasets: electricity, traffic, ETTm2, and weather. We use a well-known and theoretically justified rolling window evaluation protocol and provide a detailed analysis of tsGT's ability to model the data distribution and predict marginal quantile values. We provide an implementation of our method at https://github.com/ts-gt/tsgt.
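
One simple way to realise the "discretization of real values" that lets a vanilla decoder-only transformer model a series autoregressively is quantile binning; the exact scheme used by tsGT may differ (see the linked repository).

```python
import numpy as np

def discretize_series(values, num_bins=256):
    # Map real-valued observations to integer tokens via quantile bins fit on `values`.
    edges = np.quantile(values, np.linspace(0.0, 1.0, num_bins + 1)[1:-1])
    tokens = np.digitize(values, edges)      # tokens in [0, num_bins - 1]
    return tokens, edges
```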

URL: https://openreview.net/forum?id=jQwQwYvUb5

---

Title: Expressive Higher-Order Link Prediction through Hypergraph Symmetry Breaking

Abstract: A hypergraph consists of a set of nodes along with a collection of subsets of the nodes called hyperedges. Higher order link prediction is the task of predicting the existence of a missing hyperedge in a hypergraph. A hyperedge representation learned for higher order link prediction is fully expressive when it does not lose distinguishing power up to an isomorphism. Many existing hypergraph representation learners are bounded in expressive power by the Generalized Weisfeiler Lehman-1 (GWL-1) algorithm, a generalization of the Weisfeiler Lehman-1 (WL-1) algorithm. The WL-1 algorithm can approximately decide whether two graphs are isomorphic. However, GWL-1 has limited expressive power. In fact, GWL-1 can only view the hypergraph as a collection of trees rooted at each of the nodes in the hypergraph. Furthermore, message passing on hypergraphs can already be computationally expensive, particularly with limited GPU device memory. To address these limitations, we devise a preprocessing algorithm that can identify certain regular subhypergraphs exhibiting symmetry with respect to GWL-1. Our preprocessing algorithm runs once with time complexity linear in the size of the input hypergraph. During training, we randomly drop the hyperedges of the subhypergraphs identified by the algorithm and add covering hyperedges to break symmetry. We show that our method improves the expressivity of GWL-1. Our extensive experiments also demonstrate the effectiveness of our approach for higher-order link prediction on both graph and hypergraph datasets with negligible change in computation.

URL: https://openreview.net/forum?id=oG65SjZNIF

---

Title: Mitigating Relative Over-Generalization in Multi-Agent Reinforcement Learning

Abstract: In decentralized multi-agent reinforcement learning, agents learning in isolation can lead to relative over-generalization (RO), where optimal joint actions are undervalued in favor of suboptimal ones. This hinders effective coordination in cooperative tasks, as agents tend to choose actions that are individually rational but collectively suboptimal. To address this issue, we introduce MaxMax Q-Learning (MMQ), which employs an iterative process of sampling and evaluating potential next states, selecting those with maximal Q-values for learning. This approach refines approximations of ideal state transitions, aligning more closely with the optimal joint policy of collaborating agents. We provide theoretical analysis supporting MMQ's potential and present empirical evaluations across various environments susceptible to RO. Our results demonstrate that MMQ frequently outperforms existing baselines, exhibiting enhanced convergence and sample efficiency.

URL: https://openreview.net/forum?id=oAkSRhl3qU

---

Title: Certified Robustness against Sparse Adversarial Perturbations via Data Localization

Abstract: Recent work in adversarial robustness suggests that natural data distributions are localized, i.e., they place high probability in small volume regions of the input space, and that this property can be utilized for designing classifiers with improved robustness guarantees for $\ell_2$-bounded perturbations. Yet, it is still unclear if this observation holds true for more general metrics. In this work, we extend this theory to $\ell_0$-bounded adversarial perturbations, where the attacker can modify a few pixels of the image but is unrestricted in the magnitude of perturbation, and we show necessary and sufficient conditions for the existence of $\ell_0$-robust classifiers. Theoretical certification approaches in this regime essentially employ voting over a large ensemble of classifiers. Such procedures are combinatorial and expensive or require complicated certification techniques. In contrast, a simple classifier emerges from our theory, dubbed Box-NN, which naturally incorporates the geometry of the problem and improves upon the current state-of-the-art in certified robustness against sparse attacks for the MNIST and Fashion-MNIST datasets.

URL: https://openreview.net/forum?id=17Ld3davzF

---

Title: Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)

Abstract: Can we obtain insights about the brain using AI models? How is the information in deep learning models related to brain recordings? Can we improve AI models with the help of brain recordings? Such questions can be tackled by studying brain recordings like functional magnetic resonance imaging (fMRI). As a first step, the neuroscience community has contributed several large cognitive neuroscience datasets related to passive reading/listening/viewing of concept words, narratives, pictures, and movies. Encoding and decoding models using these datasets have also been proposed in the past two decades. These models serve as additional tools for basic cognitive science and neuroscience research. Encoding models aim at generating fMRI brain representations given a stimulus automatically. They have several practical applications in evaluating and diagnosing neurological conditions and thus may also help design therapies for brain damage. Decoding models solve the inverse problem of reconstructing the stimuli given the fMRI. They are useful for designing brain-machine or brain-computer interfaces. Inspired by the effectiveness of deep learning models for natural language processing, computer vision, and speech, several neural encoding and decoding models have been recently proposed. In this survey, we will first discuss popular representations of language, vision and speech stimuli, and present a summary of neuroscience datasets. Further, we will review popular deep learning based encoding and decoding architectures and note their benefits and limitations. Finally, we will conclude with a summary and discussion about future trends. Given the large amount of recently published work in the computational cognitive neuroscience (CCN) community, we believe that this survey enables an entry point for DNN researchers to diversify into CCN research.

URL: https://openreview.net/forum?id=YxKJihRcby

---

Title: Generative Models: What Do They Know? Do They Know Things? Let's Find Out!

Abstract: Generative models excel at creating images that closely mimic real scenes, suggesting they inherently encode scene representations. We introduce Intrinsic LoRA (I-LoRA), a general approach that uses Low-Rank Adaptation (LoRA) to discover scene intrinsics such as normals, depth, albedo, and shading from a wide array of generative models. I-LoRA is lightweight, adding minimally to the model's parameters and requiring very small datasets for this knowledge discovery. Our approach, applicable to Diffusion models, GANs, and Autoregressive models alike, generates intrinsics using the same output head as the original images. We show a correlation between the generative model's quality and the extracted intrinsics' accuracy through control experiments. Finally, scene intrinsics obtained by our method with just hundreds to thousands of labeled images, perform on par with those from supervised methods trained on millions of labeled examples.
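
For concreteness, a generic LoRA adapter on a linear layer of the kind I-LoRA attaches to a pretrained generator, where only the low-rank factors are trained on a small labelled set to produce an intrinsic such as depth or normals; the rank, scaling, and initialization below are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = Wx + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```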

URL: https://openreview.net/forum?id=BgdW7ZTUD9

---

Title: CFASL: Composite Factor-Aligned Symmetry Learning for Disentanglement in Variational AutoEncoder

Abstract: Symmetries of input and latent vectors have provided valuable insights for disentanglement learning in VAEs. However, only a few unsupervised methods have been proposed, and even these require known factor information about the training data. We propose a novel method, Composite Factor-Aligned Symmetry Learning (CFASL), which is integrated into VAEs for learning symmetry-based disentanglement in an unsupervised manner without any knowledge of the dataset's factor information. CFASL incorporates three novel features for learning symmetry-based disentanglement: 1) injecting an inductive bias to align latent vector dimensions to factor-aligned symmetries within an explicit, learnable symmetry codebook; 2) learning a composite symmetry to express an unknown change of factors between two random samples by learning factor-aligned symmetries within the codebook; 3) inducing a group-equivariant encoder and decoder when training VAEs under the two preceding conditions. In addition, we propose an extended evaluation metric for multi-factor changes, in comparison to standard disentanglement evaluation in VAEs. In quantitative and in-depth qualitative analyses, CFASL demonstrates a significant improvement of disentanglement under single-factor and multi-factor change conditions compared to state-of-the-art methods.

URL: https://openreview.net/forum?id=mDGvrH7lju

---

Title: Practical Synthesis of Mixed-Tailed Data with Normalizing Flows

Abstract: Capturing the correct tail behavior is difficult, yet essential for a faithful generative model. In this work, we provide an improved framework for training flow-based models with robust capabilities to capture the tail behavior of mixed-tail data. We propose a combination of a tail-flexible base distribution and a robust training algorithm to enable the flow to model heterogeneous tail behavior in the target distribution. We support our claim with extensive experiments on synthetic and real-world data.

URL: https://openreview.net/forum?id=uphsKDj0Uu

---

Title: A Theoretical Insight of Histogram Binning and Extending to Multi-label Classification

Abstract: Learning well-calibrated probabilistic predictions is crucial as neural networks and machine learning models are increasingly employed in critical tasks nowadays. While there exist several post-processing methods aimed at calibrating output probabilities, most lack proper theoretical justification; in other words, they have typically only been validated on limited datasets and models to report empirical results. This work is divided into two parts. In the first part, we analyze some post-processing calibration methods from a geometrical perspective and demonstrate that calibrated outcomes consistently reduce Expected Calibration Error (ECE) while increasing accuracy. In the second part, we present a previously unexplored framework for calibrating the outcomes of multi-label problems by addressing multiple binary calibration problems. To achieve this, we introduce a novel concept of ECE for multi-label problems and provide substantial theoretical rationale for our approach. Experimental results demonstrate the feasibility and efficacy of our method in practice.
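
Classic histogram binning for a single binary problem, which the multi-label framework described above would apply once per label; the bin count and the empty-bin fallback are illustrative choices.

```python
import numpy as np

def fit_histogram_binning(probs, labels, n_bins=15):
    # Replace each predicted probability by the empirical positive rate of its bin.
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    bin_means = np.array([
        labels[bin_ids == b].mean() if np.any(bin_ids == b) else edges[b]
        for b in range(n_bins)
    ])
    def calibrate(new_probs):
        ids = np.clip(np.digitize(np.asarray(new_probs, dtype=float), edges) - 1, 0, n_bins - 1)
        return bin_means[ids]
    return calibrate
```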

URL: https://openreview.net/forum?id=bAcQHIRSDf

---

Title: Offline Deep Reinforcement Learning for Visual Distractions via Domain Adversarial Training

Abstract: Recent advances in offline reinforcement learning (RL) have relied predominantly on learning from proprioceptive states. However, obtaining proprioceptive states for all objects may not always be feasible, particularly in offline settings. Therefore, RL agents must be capable of learning from raw sensor inputs such as images. However, recent studies have indicated that visual distractions can impair the performance of RL agents when observations in the evaluation environment differ significantly from those in the training environment. This issue is even more crucial in the visual offline RL paradigm, where the collected datasets can differ drastically from the testing environment. In this work, we investigate an adversarial-training-based algorithm to address the problem of visual distraction in offline RL settings. Our adversarial approach involves training agents to learn features that are more robust against visual distractions. Furthermore, we propose a complementary dataset that extends the V-D4RL distraction dataset to more locomotion tasks. We empirically demonstrate that our method surpasses state-of-the-art baselines on both the V-D4RL and the proposed dataset when evaluated under random visual distractions.

URL: https://openreview.net/forum?id=dce6ZGkJ1Z

---

Title: Learning Dual Text Embeddings by Synthesising Images Conditioned on Text

Abstract: Text-to-Image (T2I) synthesis is a challenging task that requires modelling complex interactions between two modalities (i.e., text and image). A common framework adopted in recent state-of-the-art approaches to achieving such multi-modal interactions is to bootstrap the learning process with pre-trained image-aligned text embeddings. These text embeddings are typically learned by training an independent network with a contrastive loss between text and image features. Such a scheme comes with the downside that these embeddings are learned to capture distinctive features and trained only to differentiate between instances. These learned text embeddings are unaware of the complementary nature of the generation process, which must capture intricate, complex variations of images, and the discrimination process, which must capture distinctive features, and this may hinder their usage in generative modelling.

To alleviate this downside, this paper explores a new direction to learn text embeddings in an end-to-end manner from text-to-image synthesis task that considers the complementary nature of generation and discrimination process. Specifically, a novel text-embedding learning scheme called "Dual Text Embedding" (DTE) is presented, in which one part of the embeddings is optimised to enhance the photo-realism of the generated images, and the other part seeks to capture text-to-image alignment. Through a comprehensive set of experiments on three text-to-image benchmark datasets (Oxford-102, Caltech-UCSD, and MS-COCO), models with dual text embeddings perform favourably in comparison with embeddings trained only to learn distinctive features.

URL: https://openreview.net/forum?id=XiOJtpqRUn

---

Title: Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Abstract: Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. In particular, their expansive scale and computational demands pose considerable challenges when customizing them for particular downstream tasks, especially on hardware platforms with constrained computational capabilities.
Parameter-Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adapting large models to various downstream tasks. In particular, PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task or domain while minimizing the number of additional parameters introduced or computational resources required. This approach is particularly important when dealing with large-scale language models with high parameter counts, as fine-tuning these models from scratch can be computationally expensive and resource-intensive, posing considerable challenges for the design of the supporting system platform.
In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate computation costs for PEFT. In addition to providing an extensive survey from an algorithmic standpoint, we also examine various real-world system designs to investigate the implementation costs associated with different PEFT approaches. This survey serves as an indispensable resource for researchers aiming to understand both the PEFT algorithms and their system implementation, offering detailed insights into recent advancements and practical applications.

URL: https://openreview.net/forum?id=lIsCS8b6zj

---

Title: A Generalization Bound for Nearly-Linear Networks

Abstract: We consider nonlinear networks as perturbations of linear ones. Based on this approach, we present novel generalization bounds that become non-vacuous for networks that are close to being linear. The main advantage over previous works that propose non-vacuous generalization bounds is that our bounds are a priori: performing the actual training is not required to evaluate them. To the best of our knowledge, they are the first non-vacuous generalization bounds for neural nets possessing this property.

URL: https://openreview.net/forum?id=tRpWaK3pWh

---

Title: Graph as a Feature: Improving Node Classification with Non-Neural Graph-Aware Logistic Regression

Abstract: Graph Neural Networks (GNNs), with their message passing framework that leverages both structural and feature information, have become a standard method for solving graph-based machine learning problems. However, these approaches still struggle to generalise well beyond datasets that exhibit strong homophily, where nodes of the same class tend to connect. This limitation has led to the development of complex neural architectures that pose challenges in terms of efficiency and scalability. In response to these limitations, we focus on simpler and more scalable approaches and introduce Graph-aware Logistic Regression (GLR), a non-neural model designed for node classification tasks. Unlike traditional graph algorithms that use only a fraction of the information accessible to GNNs, our proposed model simultaneously leverages both node features and the relationships between entities. However, instead of relying on message passing, our approach encodes each node's relationships as an additional feature vector, which is then combined with the node's own attributes. Extensive experimental results, conducted within a rigorous evaluation framework, show that our proposed GLR approach outperforms both foundational and sophisticated state-of-the-art GNN models on node classification tasks. Going beyond the traditional limited benchmarks, our experiments indicate that GLR increases generalisation ability while achieving gains in computation time of up to two orders of magnitude compared to its best neural competitor.
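
A minimal sketch of the general idea, assuming the relationship vector is simply the node's row of the row-normalised adjacency matrix (the paper's exact relational encoding may differ):

    # Illustrative graph-aware logistic regression: concatenate each node's attributes
    # with a vector describing its connections, then fit a plain logistic regression.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def graph_aware_features(X, A):
        # X: (n_nodes, n_feats) node attributes; A: (n_nodes, n_nodes) dense adjacency.
        deg = A.sum(axis=1, keepdims=True).clip(min=1)
        A_norm = A / deg                       # row-normalised neighbourhood indicator
        return np.concatenate([X, A_norm], axis=1)

    def fit_glr(X, A, y, train_idx):
        Z = graph_aware_features(X, A)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(Z[train_idx], y[train_idx])
        return clf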

URL: https://openreview.net/forum?id=U0WoKrZm4P

---

Title: Why do Fine-grained Labels in Pretraining Benefit Generalization?

Abstract: Recent literature shows that if a deep neural network is pretrained using fine-grained labeled data and then fine-tuned using coarse-labeled data for downstream tasks, its generalization performance is often better than when pretraining uses coarse-labeled data. While empirical evidence supporting this finding is abundant, theoretical justification remains an open problem. This paper addresses the problem by introducing a "hierarchical multi-view" structure to confine the input data distribution. Under this data assumption, we prove that 1) coarse-grained pretraining only allows a neural network to learn the common features well, while 2) fine-grained pretraining helps the network learn the rare features in addition to the common ones, thus improving its accuracy on hard downstream test samples.

URL: https://openreview.net/forum?id=FojAV72owK

---

Title: Exploring the potential of Direct Feedback Alignment for Continual Learning

Abstract: Real-world applications of machine learning require robustness to shifts in the data distribution over time. A critical limitation of standard artificial neural networks trained with backpropagation (BP) is their susceptibility to catastrophic forgetting: they “forget” prior knowledge when trained on a new task, while biological neural networks tend to be more robust to such forgetting. While various algorithmic ways of mitigating catastrophic forgetting have been proposed, developing an algorithm that is capable of learning continuously remains an open problem. Motivated by recent theoretical results, we explore whether a biologically inspired learning algorithm like Direct Feedback Alignment (DFA) can mitigate catastrophic forgetting in artificial neural networks. We train fully-connected networks on several continual learning benchmarks using DFA and compare its performance to vanilla backpropagation, random features, and other continual learning algorithms. We find that an inherent bias of DFA, called “degeneracy breaking”, leads to low average forgetting on common continual learning benchmarks in both the Domain-Incremental and the Task-Incremental learning scenarios. We show how to control the trade-off between learning and forgetting with DFA, and relate different modes of using DFA to other methods in the field.
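
For readers unfamiliar with DFA, here is a minimal sketch of one update for a small fully-connected network (illustrative only; the paper's architectures, losses, and benchmarks differ), showing how the output error reaches each hidden layer through fixed random feedback matrices rather than the transposed forward weights used by backpropagation:

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_h, d_out, lr = 784, 256, 10, 0.01
    W1 = rng.normal(0, 0.05, (d_h, d_in))
    W2 = rng.normal(0, 0.05, (d_h, d_h))
    W3 = rng.normal(0, 0.05, (d_out, d_h))
    B1 = rng.normal(0, 0.05, (d_h, d_out))    # fixed random feedback matrices
    B2 = rng.normal(0, 0.05, (d_h, d_out))    # (never updated during training)

    def relu(x):
        return np.maximum(x, 0)

    def dfa_step(x, y):
        # One DFA update on a single (input, one-hot target) pair with squared loss.
        global W1, W2, W3
        h1 = relu(W1 @ x)
        h2 = relu(W2 @ h1)
        out = W3 @ h2
        e = out - y                            # output error
        d2 = (B2 @ e) * (h2 > 0)               # error projected via fixed random matrix
        d1 = (B1 @ e) * (h1 > 0)
        W3 -= lr * np.outer(e, h2)
        W2 -= lr * np.outer(d2, h1)
        W1 -= lr * np.outer(d1, x)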

URL: https://openreview.net/forum?id=MRZQrn7JEG

---

Title: Audio-Visual Dataset Distillation

Abstract: In this article, we introduce \textit{audio-visual dataset distillation}, a task to construct a smaller yet representative synthetic audio-visual dataset that maintains the cross-modal semantic association between audio and visual modalities. While dataset distillation techniques have primarily focused on image classification, the growing capabilities of audio-visual models and the vast datasets required for their training necessitate the exploration of distillation methods beyond the visual modality. Our approach builds upon the foundation of Distribution Matching (DM), extending it to handle the unique challenges of audio-visual data. A key challenge is to jointly learn synthetic data that distills both the modality-wise information and natural alignment from real audio-visual data. We introduce a vanilla audio-visual distribution matching framework that separately trains visual-only and audio-only DM components, enabling us to investigate the effectiveness of audio-visual integration and various multimodal fusion methods. To address the limitations of unimodal distillation, we propose two novel matching losses: joint matching loss and modality gap matching loss. These losses work in conjunction with the vanilla unimodal distribution matching loss to enforce cross-modal alignment and enhance the audio-visual dataset distillation process. Extensive audio-visual classification and retrieval experiments on four audio-visual datasets, AVE, MUSIC-21, VGGSound, and VGGSound-10K, demonstrate the effectiveness of our proposed matching approaches and validate the benefits of audio-visual integration with condensed data. This work establishes a new frontier in audio-visual dataset distillation, paving the way for further advancements in this exciting field. \textit{Our source code and pre-trained models will be released}.
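
A rough sketch of the distribution-matching flavour of losses involved, assuming simple mean-embedding matching per batch (the paper's joint and modality-gap losses are more involved, and encoder details are omitted here):

    # Illustrative distribution-matching style losses for condensed audio-visual data.
    import torch
    import torch.nn.functional as F

    def dm_loss(real_feats, syn_feats):
        # Vanilla distribution matching: align mean embeddings of real and synthetic data.
        return F.mse_loss(syn_feats.mean(0), real_feats.mean(0))

    def joint_dm_loss(real_a, real_v, syn_a, syn_v):
        # Joint matching: align means of concatenated audio-visual features so the
        # synthetic set also preserves cross-modal structure.
        real_joint = torch.cat([real_a, real_v], dim=-1)
        syn_joint = torch.cat([syn_a, syn_v], dim=-1)
        return F.mse_loss(syn_joint.mean(0), real_joint.mean(0))

    # total = dm_loss(real_a, syn_a) + dm_loss(real_v, syn_v) + lam * joint_dm_loss(...)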

URL: https://openreview.net/forum?id=IJlbuSrXmk

---

Title: Learning to Predict Usage Options of Product Reviews with LLM-Generated Labels

Abstract: Annotating large datasets can be challenging; crowd-sourcing, the usual remedy, is often expensive and can lack quality, especially for non-trivial tasks. We propose a method of using LLMs as few-shot learners for annotating data in a complex natural language task, in which we learn a standalone model to predict usage options for products from customer reviews. Learning a custom model offers individual control over energy efficiency and privacy measures, compared to using the LLM directly for the sequence-to-sequence task. We compare this data annotation approach with other traditional methods and demonstrate how LLMs can enable considerable cost savings. We find that the quality of the resulting data exceeds the level attained by third-party vendor services, and that GPT-4-generated labels even reach the level of domain experts.
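
A minimal sketch of what such an annotation loop might look like (illustrative only; call_llm, the prompt format, and the few-shot example are hypothetical placeholders, not the paper's actual setup):

    # Hypothetical few-shot LLM annotation loop for usage-option extraction.
    FEW_SHOT = (
        'Extract the usage options mentioned in the review, separated by semicolons.\n'
        'Review: "Great for camping trips, also handy in power outages."\n'
        'Usage options: camping trips; power outages\n\n'
    )

    def annotate(reviews, call_llm):
        # call_llm is a placeholder for whatever LLM client is used.
        labels = []
        for review in reviews:
            prompt = FEW_SHOT + f'Review: "{review}"\nUsage options:'
            labels.append(call_llm(prompt).strip())
        return labels

    # The resulting (review, label) pairs then supervise a smaller standalone
    # sequence-to-sequence model, rather than calling the LLM at inference time.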

URL: https://openreview.net/forum?id=bgpcQXotjA

---

Title: Symbolic Regression on Probabilistic Flows for Network System Modeling

Abstract: Real-world complex systems often lack high-fidelity physical descriptions and are typically subject to partial observability. Learning the dynamics of such systems is a challenging and ubiquitous problem, encountered in diverse critical applications that require interpretability and qualitative guarantees. Our paper addresses this problem in the case of sparsely observed probability distribution flows governed by ODEs. Specifically, we devise a white-box approach, dubbed Symbolic Distribution Flow Learner (SDFL), leveraging symbolic search with a Wasserstein-based loss function, resulting in a robust model-recovery scheme that naturally lends itself to coping with partial observability.
Additionally, we furnish the proposed framework with theoretical guarantees on the number of snapshots required to achieve a given level of fidelity in model discovery.
We illustrate the performance of the proposed scheme on the prototypical problem of Kuramoto networks and on a standard benchmark of single-cell RNA sequence trajectory data. The numerical experiments demonstrate the competitive performance of SDFL in comparison to the state of the art.
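
A toy sketch of a Wasserstein-based model-recovery score for a one-dimensional state (illustrative only; the paper's loss, symbolic search procedure, and handling of higher dimensions are more sophisticated):

    # Score a candidate symbolic vector field by simulating particles forward and
    # comparing simulated and observed snapshots with a 1-D Wasserstein distance.
    import numpy as np
    from scipy.integrate import odeint
    from scipy.stats import wasserstein_distance

    def snapshot_loss(candidate_rhs, x0_particles, snapshot_times, observed_snapshots):
        # candidate_rhs(x, t) is the candidate symbolic vector field (1-D state here).
        sims = np.array([odeint(candidate_rhs, x0, snapshot_times)[:, 0]
                         for x0 in x0_particles])
        # Compare the simulated and observed empirical marginals at each snapshot time.
        return sum(wasserstein_distance(sims[:, t], obs)
                   for t, obs in enumerate(observed_snapshots))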

URL: https://openreview.net/forum?id=ZfPbCFZQbx

---

Title: Spectral Self-supervised Feature Selection

Abstract: Choosing a meaningful subset of features from high-dimensional observations in unsupervised settings can greatly enhance the accuracy of downstream analysis, such as clustering or dimensionality reduction, and provide valuable insights into the sources of heterogeneity in a given dataset. In this paper, we propose a self-supervised graph-based approach for unsupervised feature selection. Our method's core involves computing robust pseudo-labels by applying simple processing steps to the graph Laplacian's eigenvectors. The subset of eigenvectors used for computing pseudo-labels is chosen based on a model stability criterion. We then measure the importance of each feature by training a surrogate model to predict the pseudo-labels from the observations. Our approach is shown to be robust to challenging scenarios, such as the presence of outliers and complex substructures. We demonstrate the effectiveness of our method through experiments on real-world datasets, showing its robustness across multiple domains, particularly its effectiveness on biological datasets.
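
A minimal sketch of this style of pipeline, assuming a kNN graph, a single Laplacian eigenvector thresholded at its median as the pseudo-label, and a random forest as the surrogate (the paper's eigenvector selection and pseudo-label processing are more elaborate):

    import numpy as np
    from sklearn.neighbors import kneighbors_graph
    from sklearn.ensemble import RandomForestClassifier

    def spectral_feature_scores(X, n_neighbors=10):
        # Build a symmetric kNN graph and its unnormalised Laplacian.
        W = kneighbors_graph(X, n_neighbors, mode="connectivity", include_self=False)
        W = 0.5 * (W + W.T).toarray()
        L = np.diag(W.sum(1)) - W
        _, vecs = np.linalg.eigh(L)
        # Crude pseudo-labels from the first non-trivial eigenvector.
        pseudo = (vecs[:, 1] > np.median(vecs[:, 1])).astype(int)
        # Surrogate model predicts the pseudo-labels; its importances score the features.
        surrogate = RandomForestClassifier(n_estimators=200).fit(X, pseudo)
        return surrogate.feature_importances_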

URL: https://openreview.net/forum?id=t0EJiOd9Lg

---

Title: Multiple-Resolution Tokenization for Time Series Forecasting with an Application to Pricing

Abstract: We propose a transformer architecture for time series forecasting with a focus on time series tokenisation, and apply it to a real-world prediction problem from the pricing domain. Our architecture aims to learn effective representations at many scales across all available data simultaneously. The model contains a number of novel modules: a differentiated form of time series patching which employs multiple resolutions, a multiple-resolution module for time-varying known variables, a mixer-based module for capturing cross-series information, and a novel output head with favourable scaling to account for the increased number of tokens. We present an application of this model to a real-world prediction problem faced by the markdown team at a very large retailer. In the experiments conducted, our model outperforms in-house models and the selected existing deep learning architectures.
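
A minimal sketch of multiple-resolution patching (illustrative only; patch sizes, projections, and the rest of the architecture are assumptions, not the paper's design):

    # Cut the same series into patches of several lengths and project each patch into a
    # shared token dimension, producing one token sequence spanning all resolutions.
    import torch
    import torch.nn as nn

    class MultiResolutionPatcher(nn.Module):
        def __init__(self, patch_sizes=(8, 32, 128), d_model=128):
            super().__init__()
            self.patch_sizes = patch_sizes
            self.projs = nn.ModuleList([nn.Linear(p, d_model) for p in patch_sizes])

        def forward(self, x):                      # x: (batch, series_length)
            tokens = []
            for p, proj in zip(self.patch_sizes, self.projs):
                n = x.shape[1] // p
                patches = x[:, : n * p].reshape(x.shape[0], n, p)
                tokens.append(proj(patches))       # (batch, n_patches, d_model)
            return torch.cat(tokens, dim=1)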

URL: https://openreview.net/forum?id=dknvQtQNja

---

Title: Reward Collapse in Aligning Large Language Models

Abstract: The extraordinary capabilities of large language models (LLMs) such as ChatGPT and GPT-4 are in part unleashed by aligning them with reward models that are trained on human preferences represented as rankings of responses to prompts. In this paper, we document the phenomenon of \textit{reward collapse}, an empirical observation where the prevailing ranking-based approach results in an \textit{identical} reward distribution for diverse prompts during the terminal phase of training. This outcome is undesirable as open-ended prompts like ``write a short story about your best friend'' should yield a continuous range of rewards for their completions, while specific prompts like ``what is the capital city of New Zealand'' should generate either high or low rewards. Our theoretical investigation reveals that reward collapse is primarily due to the insufficiency of the ranking-based objective function to incorporate prompt-related information during optimization. This insight allows us to derive closed-form expressions for the reward distribution associated with a set of utility functions in an asymptotic setting. To overcome reward collapse, we introduce a prompt-aware optimization scheme that provably admits a prompt-dependent reward distribution within the interpolating regime. Our experimental results suggest that our proposed prompt-aware utility functions significantly alleviate reward collapse during the training of reward models.
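
For context, here is a sketch of the standard ranking-based reward-model objective in its common pairwise (Bradley-Terry style) form (illustrative; the paper analyses a broader family of utility functions). Roughly speaking, because the loss depends only on reward differences and not on the prompt, its optimum can assign the same reward distribution to every prompt, which is the collapse described above; the paper's remedy is to make the utility prompt-aware.

    import torch
    import torch.nn.functional as F

    def pairwise_ranking_loss(r_chosen, r_rejected):
        # r_chosen, r_rejected: reward-model scores for preferred / dispreferred responses.
        return -F.logsigmoid(r_chosen - r_rejected).mean()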

URL: https://openreview.net/forum?id=jKnrLNU4kh

---

Title: Distributed Multi-Agent Lifelong Learning

Abstract: Lifelong learning (LL) machines are designed to operate safely in dynamic environments by continually updating their knowledge. Conventional LL paradigms often assume that new data arrive labeled and that each LL machine has to learn independently from its environment. However, human labeling is expensive and impractical in remote conditions where automation is most desired. We introduce the Peer Parallel Lifelong Learning (PEEPLL) framework for distributed Multi-Agent Lifelong Learning, where agents continually learn online by actively requesting assistance from other agents instead of relying on the expensive environment to teach them. Unlike classical distributed AI, where communication scales poorly, lifelong learners need to communicate only about information they have not yet learned. Additionally, agents reply only if they are highly confident: our TRUE confidence score uses a compute-efficient application of a Variational Autoencoder to quantify confidence in a prediction without needing data reconstruction. TRUE outperforms traditional entropy-based confidence scores, reducing communication overhead by 18.05% on CIFAR-100 and 5.8% on MiniImageNet. To improve system resilience to low-quality or adversarial responses, our agents selectively accept a subset of received responses using the REFINE algorithm, which results in a 51.99% increase in the percentage of correct accepted responses on CIFAR-100 and 25.79% on MiniImageNet. Like traditional LL agents, PEEPLL agents store a subset of previously acquired knowledge as memory to learn alongside new information and prevent forgetting. We propose a Dynamic Memory-Update mechanism for PEEPLL agents that improves QA's classification performance by 44.17% on CIFAR-100 and 26.8% on MiniImageNet compared to the baseline Memory-Update mechanism. Our findings demonstrate that a PEEPLL agent can outperform an LL agent even if the latter has environmental supervision available, thus significantly reducing the need for labeling. PEEPLL provides a framework to facilitate research in distributed multi-agent LL, marking a substantial step towards practical, scalable lifelong learning technologies at the edge.

URL: https://openreview.net/forum?id=IIVr4Hu3Oi

---

Title: Learning Nonparametric Differential Equations via Multivariate Occupation Kernel Functions

Abstract: Learning a nonparametric system of ordinary differential equations from trajectories in a $d$-dimensional state space requires learning $d$ functions of $d$ variables. Explicit formulations often scale quadratically in $d$ unless additional knowledge about system properties, such as sparsity and symmetries, is available. In this work, we propose a linear approach, the occupation kernel method (OCK), using the implicit formulation provided by vector-valued Reproducing Kernel Hilbert Spaces. The solution for the vector field relies on multivariate occupation kernel functions associated with the trajectories and scales linearly with the dimension of the state space. We validate the approach through experiments on a variety of simulated and real datasets ranging from 2 to 1024 dimensions. The OCK method outperforms all other comparators on 3 of the 9 datasets for full trajectory prediction and on 4 of the 9 datasets for next-point prediction.
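
For intuition, the occupation kernel associated with a trajectory can be written as follows in the scalar-valued case (a standard formulation given here only for orientation; the paper works with vector-valued RKHSs and its precise construction may differ):

    % Occupation kernel of a trajectory \gamma : [0,T] \to \mathbb{R}^d in an RKHS H with kernel K
    % (scalar-valued case shown for intuition only).
    \Gamma_{\gamma} = \int_0^T K(\cdot, \gamma(t)) \, dt,
    \qquad
    \langle f, \Gamma_{\gamma} \rangle_{H} = \int_0^T f(\gamma(t)) \, dt
    \quad \text{for all } f \in H.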

URL: https://openreview.net/forum?id=zduPsND4Sy

---

Title: Generative Model for Change Point Detection in Dynamic Graphs

Abstract: This paper proposes a generative model to detect change points in time series of graphs. The proposed framework consists of learnable prior distributions for low-dimensional graph representations and of a decoder that can generate graphs from the latent representations. The informative prior distributions in the latent spaces are learned from the observed data in an empirical Bayes fashion, and the expressive power of the generative model is exploited to assist multiple change point detection. Specifically, the model parameters are learned via maximum approximate likelihood with a Group Fused Lasso regularization on the prior parameters. The optimization problem is then solved via the Alternating Direction Method of Multipliers (ADMM), and Langevin Dynamics are employed for posterior inference. Experiments on both simulated and real data demonstrate the ability of the generative model to support change point detection with good performance.
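
As a rough sketch of the regulariser's role, a generic Group Fused Lasso penalty on time-indexed prior parameters takes the form below (schematic only; the paper's actual objective pairs it with an approximate likelihood and its own parameterisation). Times at which the penalised differences are non-zero correspond to candidate change points.

    % Schematic objective: approximate negative log-likelihood of the observed graphs G_t
    % plus a Group Fused Lasso penalty encouraging piecewise-constant prior parameters.
    \min_{\theta_{1:T}} \; -\sum_{t=1}^{T} \log p_{\theta_t}(G_t)
    \;+\; \lambda \sum_{t=1}^{T-1} \big\| \theta_{t+1} - \theta_t \big\|_2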

URL: https://openreview.net/forum?id=FVygyCbzon

---

Title: Feature learning as alignment: a structural property of gradient descent in non-linear neural networks

Abstract: Understanding the mechanisms through which neural networks extract statistics from input-label pairs through feature learning is one of the most important unsolved problems in supervised learning. Prior works demonstrated that the Gram matrices of the weights (the neural feature matrices, NFM) and the average gradient outer products (AGOP) become correlated during training, in a statement known as the neural feature ansatz (NFA). Through the NFA, these works introduce mapping with the AGOP as a general mechanism for neural feature learning. However, they do not provide a theoretical explanation for this correlation or its origins. In this work, we further clarify the nature of this correlation and explain its emergence. We show that the correlation is equivalent to alignment between the left singular structure of the weight matrices and the newly defined pre-activation tangent features at each layer. We further establish that the alignment is driven by the interaction of weight changes induced by SGD with the pre-activation features, and analyze the resulting dynamics analytically at early times in terms of simple statistics of the inputs and labels. Finally, motivated by the observation that the NFA is driven by this centered correlation, we introduce a simple optimization rule that dramatically increases the NFA correlations at any given layer and improves the quality of the features learned.
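
A minimal sketch of the two quantities behind the NFA for the first layer of a toy network (illustrative only; this computes an uncentered correlation at initialisation, whereas the ansatz concerns how the correlation develops during training):

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))

    def first_layer_nfm(net):
        W1 = net[0].weight.detach()
        return W1.T @ W1                           # neural feature matrix of layer 1

    def input_agop(net, X):
        # Average gradient outer product of the scalar output w.r.t. the network input.
        G = torch.zeros(X.shape[1], X.shape[1])
        for x in X:
            x = x.clone().requires_grad_(True)
            (g,) = torch.autograd.grad(net(x).sum(), x)
            G += torch.outer(g, g)
        return G / len(X)

    def cosine_correlation(A, B):
        a, b = A.flatten(), B.flatten()
        return torch.dot(a, b) / (a.norm() * b.norm())

    X = torch.randn(256, 20)
    print(cosine_correlation(first_layer_nfm(net), input_agop(net, X)))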

URL: https://openreview.net/forum?id=JXCe2ZcUXr

---

Title: ExGRG: Explicitly-Generated Relation Graph for Self-Supervised Representation Learning

Abstract: Self-supervised Learning (SSL) has emerged as a powerful technique for pre-training deep learning models without relying on expensive annotated labels, instead leveraging embedded signals in unlabeled data. While SSL has shown remarkable success in computer vision tasks through intuitive data augmentation, its application to graph-structured data poses challenges due to the semantic-altering and counter-intuitive nature of graph augmentations. Addressing this limitation, this paper introduces a novel non-contrastive SSL approach that Explicitly Generates a compositional Relation Graph (ExGRG) instead of relying solely on the conventional augmentation-based implicit relation graph. ExGRG offers a framework for incorporating prior domain knowledge and online extracted information into the SSL invariance objective, drawing inspiration from Laplacian Eigenmaps and Expectation-Maximization (EM). Adopting an EM perspective on SSL, our E-step involves generating a relation graph to identify candidates that guide the SSL invariance objective, and the M-step updates the model parameters by integrating the derived relational information. Extensive experimentation on diverse node classification datasets demonstrates the superiority of our method over state-of-the-art techniques, affirming ExGRG as an effective adaptation of SSL for graph representation learning.

URL: https://openreview.net/forum?id=9eBNYnGTuU

---
