Daily TMLR digest for Jun 29, 2024


TMLR

Jun 29, 2024, 12:00:24 AM
to tmlr-anno...@googlegroups.com


New certifications
==================

Featured Certification: Fine-tuning can cripple your foundation model; preserving features may be the solution

Jishnu Mukhoti, Yarin Gal, Philip Torr, Puneet K. Dokania

https://openreview.net/forum?id=kfhoeZCeW7

---


Accepted papers
===============


Title: Fine-tuning can cripple your foundation model; preserving features may be the solution

Authors: Jishnu Mukhoti, Yarin Gal, Philip Torr, Puneet K. Dokania

Abstract: Pre-trained foundation models, due to their enormous capacity and exposure to vast amounts of data during pre-training, are known to have learned plenty of real-world concepts. An important step in making these pre-trained models effective on downstream tasks is to fine-tune them on related datasets. While various fine-tuning methods have been devised and shown to be highly effective, we observe that a fine-tuned model's ability to recognize concepts on tasks different from the downstream one is reduced significantly compared to its pre-trained counterpart. This is an undesirable effect of fine-tuning, as a substantial amount of resources was used to learn these pre-trained concepts in the first place. We call this phenomenon "concept forgetting" and show via experiments that most end-to-end fine-tuning approaches suffer heavily from this side effect. To address this, we propose a simple fix by designing a new fine-tuning method called LDIFS (short for $\ell_2$ distance in feature space) that, while learning new concepts related to the downstream task, allows a model to preserve its pre-trained knowledge as well. Through extensive experiments on 10 fine-tuning tasks, we show that LDIFS significantly reduces concept forgetting. Additionally, we show that LDIFS is also highly effective for continual fine-tuning on a sequence of tasks, in comparison with both fine-tuning and continual learning baselines.
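
The abstract does not spell out the training objective; a minimal sketch of feature-space regularization in this spirit might look as follows. The backbone/head split, the frozen pre-trained copy, and the weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ldifs_style_loss(backbone, head, frozen_backbone, x, y, lam=1.0):
    """Task loss plus an l2 penalty in feature space that keeps the
    fine-tuned backbone close to its frozen pre-trained copy."""
    feats = backbone(x)                 # features of the model being fine-tuned
    with torch.no_grad():
        feats_pre = frozen_backbone(x)  # features of the frozen pre-trained model
    task_loss = F.cross_entropy(head(feats), y)
    feat_dist = (feats - feats_pre).pow(2).sum(dim=-1).sqrt().mean()
    return task_loss + lam * feat_dist

# The frozen copy is made once before fine-tuning starts, e.g.:
# frozen_backbone = copy.deepcopy(backbone).eval().requires_grad_(False)
```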

URL: https://openreview.net/forum?id=kfhoeZCeW7

---

Title: Unmasking the Veil: An Investigation into Concept Ablation for Privacy and Copyright Protection in Images

Authors: Shivank Garg, Manyana Tiwari

Abstract: In this paper, we extend the study of concept ablation in pre-trained models as introduced in 'Ablating Concepts in Text-to-Image Diffusion Models' by Kumari et al. (2022). Our work focuses on reproducing, via the predefined metrics, the results achieved by the different proposed variants of concept ablation. We also introduce a novel variant of concept ablation: trademark ablation. This variant combines the principles of memorization and instance ablation to tackle the nuanced influence of proprietary or branded elements in model outputs. Further, our research contributions include an observational analysis of the model's limitations. Moreover, we investigate the model's behavior in response to ablation-leakage-inducing prompts, which aim to ablate concepts indirectly, revealing insights into the model's resilience and adaptability. We also observe the model's performance degradation on images generated from concepts far from its target ablation concept, which is documented in the appendix.

URL: https://openreview.net/forum?id=TYYApLzjaQ

---

Title: Improving Variational Autoencoder Estimation from Incomplete Data with Mixture Variational Families

Authors: Vaidotas Simkus, Michael U. Gutmann

Abstract: We consider the task of estimating variational autoencoders (VAEs) when the training data is incomplete. We show that missing data increases the complexity of the model’s posterior distribution over the latent variables compared to the fully-observed case. The increased complexity may adversely affect the fit of the model due to a mismatch between the variational and model posterior distributions. We introduce two strategies based on (i) finite variational-mixture and (ii) imputation-based variational-mixture distributions to address the increased posterior complexity. Through a comprehensive evaluation of the proposed approaches, we show that variational mixtures are effective at improving the accuracy of VAE estimation from incomplete data.
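
As a hedged illustration of strategy (ii), one could treat the variational posterior as an equal-weight mixture over K encodings of independently imputed inputs. The encoder interface, the crude noise imputation, and the value of K below are assumptions for the sketch, not the paper's construction.

```python
import math
import torch

def imputation_mixture_log_q(encoder, z, x, mask, K=5):
    """log q(z | x_obs) under an equal-weight mixture over K imputations.
    `encoder(x)` is assumed to return (mu, log_var) of a diagonal Gaussian."""
    log_probs = []
    for _ in range(K):
        # Crude placeholder imputation: fill missing entries with noise.
        x_imp = torch.where(mask.bool(), x, torch.randn_like(x))
        mu, log_var = encoder(x_imp)
        log_n = -0.5 * ((z - mu) ** 2 / log_var.exp()
                        + log_var + math.log(2 * math.pi))
        log_probs.append(log_n.sum(-1))       # diagonal Gaussian log-density
    # log (1/K) * sum_k N(z; mu_k, sigma_k^2)
    return torch.logsumexp(torch.stack(log_probs), dim=0) - math.log(K)
```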

URL: https://openreview.net/forum?id=lLVmIvZfry

---

Title: Conciliator steering: Imposing user preference in multi-objective reinforcement learning

Authors: Sara Pyykölä, Klavdiya Olegovna Bochenina, Laura Ruotsalainen

Abstract: Many real-world problems with multiple objectives require reinforcement learning solutions that can handle trade-offs in a user-preferred manner. In the multi-objective framework, a single algorithm adapting to different user preferences may be developed based on a pre-defined reward function and a subjectively defined scalarisation function. The scalarisation function can be approximated by fitting a meta-model with information gained from the interaction between the user and the environment or the agent. The interaction requires an exact formulation of constructive feedback that is also simple for the user to give. In this paper, we propose a novel algorithm, Conciliator steering, that leverages priority order and reward transfer to seek optimal user-preferred policies in multi-objective reinforcement learning under the expected scalarised returns criterion. We test Conciliator steering on the DeepSeaTreasure v1 benchmark problem and demonstrate that it can find user-preferred policies with effortless and simple user-agent interaction and negligible bias, which has not been possible before. Additionally, we show that on average Conciliator steering results in a fraction of the carbon dioxide emissions and total energy consumption compared to training a fully connected MNIST classifier, with both run on a personal laptop.

URL: https://openreview.net/forum?id=XAD2kcBS50

---

Title: Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why?

Authors: Jin Huang, Xingjian Zhang, Qiaozhu Mei, Jiaqi Ma

Abstract: Large language models (LLMs) are gaining increasing attention for their capability to process graphs with rich text attributes, especially in a zero-shot fashion. Recent studies demonstrate that LLMs obtain decent text classification performance on common text-rich graph benchmarks, and that the performance can be improved by appending encoded structural information as natural language to prompts. We aim to understand why the incorporation of structural information inherent in graph data can improve the prediction performance of LLMs. First, we rule out the concern of data leakage by curating a novel leakage-free dataset and conducting a comparative analysis alongside a previously widely used dataset. Second, as past work usually encodes the ego-graph by describing the graph structure in natural language, we ask the question: do LLMs understand the prompts as graph structures? Third, we investigate why LLMs can improve their performance after incorporating structural information. Our exploration of these questions reveals that (i) there is no substantial evidence that the performance of LLMs is significantly attributable to data leakage; (ii) instead of understanding prompts as graph structures, LLMs tend to process prompts more as contextual paragraphs; and (iii) the most effective elements of the local neighborhood included in the prompt are phrases pertinent to the node label, rather than the graph structure.
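
For concreteness, a common way to serialize an ego-graph into a natural-language prompt looks roughly as follows; the exact template and the example texts are assumptions, not necessarily those used in the paper.

```python
def ego_graph_prompt(node_text, neighbor_texts, question):
    """Serialize a node and its 1-hop neighbors into a plain-text prompt."""
    lines = [f"Target node: {node_text}", "Neighbors:"]
    lines += [f"- {t}" for t in neighbor_texts]
    lines.append(question)
    return "\n".join(lines)

prompt = ego_graph_prompt(
    "Title: Deep Residual Learning for Image Recognition ...",
    ["Title: ImageNet Classification with Deep CNNs ...",
     "Title: Very Deep Convolutional Networks ..."],
    "Which arXiv category does the target node belong to?",
)
```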

URL: https://openreview.net/forum?id=L2jRavXRxs

---


New submissions
===============


Title: Improving Text-to-Image Consistency via Automatic Prompt Optimization

Abstract: Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high-performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress, these models still struggle to produce images that are consistent with the input prompt, oftentimes failing to properly capture object quantities, relations, and attributes. Existing solutions to improve prompt-image consistency suffer from the following challenges: (1) they oftentimes require model fine-tuning, (2) they only focus on nearby prompt samples, and (3) they are affected by unfavorable trade-offs among image quality, representation diversity, and prompt-image consistency. In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models. Our framework starts from a user prompt and iteratively generates revised prompts with the goal of maximizing a consistency score. Our extensive validation on two datasets, MSCOCO and PartiPrompts, shows that OPT2I can boost the initial consistency score by up to 24.9% in terms of DSG score while preserving the FID and increasing the recall between generated and real data. Our work paves the way toward building more reliable and robust T2I systems by harnessing the power of LLMs.
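
The optimization-by-prompting loop can be sketched as below; `llm_revise`, `generate_image`, and `consistency_score` are hypothetical placeholders standing in for the LLM, the T2I model, and a consistency metric such as DSG, not the released API.

```python
def optimize_prompt(user_prompt, llm_revise, generate_image,
                    consistency_score, iters=10):
    """Iteratively ask an LLM for revised prompts, keeping the best scorer."""
    best = user_prompt
    best_score = consistency_score(generate_image(best), user_prompt)
    history = [(best, best_score)]
    for _ in range(iters):
        candidate = llm_revise(user_prompt, history)   # LLM proposes a revision
        score = consistency_score(generate_image(candidate), user_prompt)
        history.append((candidate, score))             # feedback for next round
        if score > best_score:
            best, best_score = candidate, score
    return best
```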

URL: https://openreview.net/forum?id=g12Gdl6aDL

---

Title: Towards Backwards-Compatible Data with Confounded Domain Adaptation

Abstract: Most current domain adaptation methods address either covariate shift or label shift, but are not applicable where they occur simultaneously and are confounded with each other. Domain adaptation approaches that do account for such confounding are designed to adapt covariates to optimally predict a particular label whose shift is confounded with covariate shift. In this paper, we instead seek to achieve general-purpose data backwards compatibility. This would allow the adapted covariates to be used for a variety of downstream problems, including on pre-existing prediction models and on data analytics tasks. To do this we consider a modification of generalized label shift (GLS), which we call confounded shift. We present a novel framework for this problem, based on minimizing the expected divergence between the source and target conditional distributions, conditioning on possible confounders. Within this framework, we provide concrete implementations using the Gaussian reverse Kullback-Leibler divergence and the maximum mean discrepancy. Finally, we demonstrate our approach on synthetic and real datasets.
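
For reference, the KL divergence between two multivariate Gaussians has a closed form, which is what makes the Gaussian instantiation tractable; a minimal NumPy sketch follows (which distribution plays which role in the 'reverse' direction is an assumption here).

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) ) for SPD covariances."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff
                  - k + logdet1 - logdet0)
```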

URL: https://openreview.net/forum?id=GSp2WC7q0r

---

Title: For Robust Worst-Group Accuracy, Ignore Group Annotations

Abstract: Existing methods for last layer retraining that aim to optimize worst-group accuracy (WGA) rely heavily on well-annotated groups in the training data. We show, both in theory and practice, that annotation-based data augmentations using either downsampling or upweighting for WGA are susceptible to domain annotation noise. The WGA gap is exacerbated in high-noise regimes for models trained with vanilla empirical risk minimization. To this end, we introduce Regularized Annotation of Domains (RAD) to train robust last layer classifiers without needing explicit domain annotations. Our results show that RAD is competitive with other recently proposed domain annotation-free techniques. Most importantly, RAD outperforms state-of-the-art annotation-reliant methods even with only 5% noise in the training data for several publicly available datasets.

URL: https://openreview.net/forum?id=l8E68fD6yp

---

Title: Dual Gauss-Newton Directions for Deep Learning

Abstract: Gauss-Newton (a.k.a. prox-linear) directions can be computed by solving an optimization subproblem that trades off between a partial linearization of the objective function and a proximity term. In this paper, we study the possibility of leveraging the convexity of this subproblem in order to instead solve the corresponding dual. As we show, the dual can be advantageous when the number of network outputs is smaller than the number of network parameters. We propose a conjugate gradient algorithm to solve the dual that integrates seamlessly with autodiff through the use of linear operators and handles dual constraints. We prove that this algorithm produces descent directions when run for any number of steps. Finally, we study empirically the advantages and current limitations of our approach compared to various popular deep learning solvers.
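
A toy, matrix-free illustration of why the dual helps when outputs are few: for a regularized least-squares subproblem, the primal direction d = (J'J + mu*I)^{-1} J'r equals J'(JJ' + mu*I)^{-1} r, so conjugate gradient can run in the (small) output space using only Jacobian-vector and vector-Jacobian products. This is a sketch of the general principle under those assumptions, and it omits the paper's dual constraints.

```python
import numpy as np

def dual_gn_direction(jvp, vjp, residual, mu=1e-2, cg_iters=50, tol=1e-10):
    """Gauss-Newton direction d = J^T (J J^T + mu I)^{-1} r, computed with
    conjugate gradient in output space using only J v (jvp) and J^T u (vjp)."""
    def A(u):                               # u -> (J J^T + mu I) u
        return jvp(vjp(u)) + mu * u
    alpha = np.zeros_like(residual)         # dual variable, lives in output space
    r = residual - A(alpha)
    p = r.copy()
    rs = r @ r
    for _ in range(cg_iters):
        Ap = A(p)
        step = rs / (p @ Ap)
        alpha += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return vjp(alpha)                       # map dual solution back to parameters

# Toy check against the dense primal solution (3 outputs, 10 parameters).
J = np.random.randn(3, 10)
res = np.random.randn(3)
d = dual_gn_direction(lambda v: J @ v, lambda u: J.T @ u, res)
d_dense = np.linalg.solve(J.T @ J + 1e-2 * np.eye(10), J.T @ res)
assert np.allclose(d, d_dense)
```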

URL: https://openreview.net/forum?id=Lce9dGQ1L9

---

Title: Adapt then Unlearn: Exploring Parameter Space Semantics for Unlearning in Generative Adversarial Networks

Abstract: Owing to growing concerns about privacy and regulatory compliance, it is desirable to regulate the output of generative models. To that end, the objective of this work is to prevent the generation of outputs containing undesired features from a pre-trained Generative Adversarial Network (GAN) where the underlying training data set is inaccessible. Our approach is inspired by the observation that the parameter space of GANs exhibits meaningful directions that can be leveraged to suppress specific undesired features. However, such directions usually result in the degradation of the quality of generated samples. Our proposed two-stage method, 'Adapt-then-Unlearn,' excels at unlearning such undesirable features while also maintaining the quality of generated samples. In the initial stage, we adapt a pre-trained GAN on a set of negative samples (containing undesired features) provided by the user. Subsequently, we train the original pre-trained GAN using positive samples, along with a repulsion regularizer. This regularizer encourages the learned model parameters to move away from the parameters of the adapted model (first stage) while not degrading the generation quality. We provide theoretical insights into the proposed method. To the best of our knowledge, our approach stands as the first method addressing unlearning within the realm of high-fidelity GANs (such as StyleGAN). We validate the effectiveness of our method through comprehensive experiments, encompassing both class-level unlearning on the MNIST and AFHQ datasets and feature-level unlearning tasks on the CelebA-HQ dataset. Our code and implementation are available at: https://anonymous.4open.science/r/Unlearning_GAN_Via_Few_Shot_Adaptation/.
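
The abstract does not give the repulsion regularizer's exact form; one plausible (assumed) choice is a kernel penalty on the parameter-space distance to the stage-one adapted weights, which is large near those weights and decays as the parameters move away.

```python
import torch

def repulsion_penalty(model, adapted_params, sigma=1.0):
    """Assumed kernel form: large when current parameters are close to the
    stage-one adapted (negative) parameters, decaying as they move away."""
    sq_dist = sum(((p - q) ** 2).sum()
                  for p, q in zip(model.parameters(), adapted_params))
    return torch.exp(-sq_dist / (2 * sigma ** 2))

# total_loss = gan_loss_positive + lam * repulsion_penalty(gen, adapted_params)
```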

URL: https://openreview.net/forum?id=jAHEBivObO

---

Title: On the effects of similarity metrics in decentralized deep learning under distribution shift

Abstract: Decentralized Learning (DL) enables privacy-preserving collaboration among organizations or users to enhance the performance of local deep learning models. However, model aggregation becomes challenging when client data is heterogeneous, and identifying compatible collaborators without direct data exchange remains a pressing issue. In this paper, we investigate the effectiveness of various similarity metrics in DL for identifying peers for model merging, conducting an empirical analysis across multiple datasets with distribution shifts. Our research provides insights into the performance of these metrics, examining their role in facilitating effective collaboration. By exploring the strengths and limitations of these metrics, we contribute to the development of robust DL methods.
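
As one concrete instance of such a metric, peers could be ranked by cosine similarity between flattened parameter vectors; the metric choice and interface here are illustrative assumptions, not the paper's specific protocol.

```python
import numpy as np

def cosine_similarity(w_a, w_b):
    """Cosine similarity between two flattened parameter vectors."""
    return w_a @ w_b / (np.linalg.norm(w_a) * np.linalg.norm(w_b) + 1e-12)

def select_peers(my_weights, peer_weights, k=3):
    """Rank candidate peers by similarity and keep the top-k for merging."""
    scored = [(pid, cosine_similarity(my_weights, w))
              for pid, w in peer_weights.items()]
    return [pid for pid, _ in sorted(scored, key=lambda s: -s[1])[:k]]
```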

URL: https://openreview.net/forum?id=WppTEs4Kkn

---
