Daily TMLR digest for Nov 22, 2022


TMLR

Nov 21, 2022, 7:00:07 PM
to tmlr-anno...@googlegroups.com


New submissions
===============


Title: Responsible Active Learning via Human-in-the-loop Peer Study

Abstract: Active learning has been proposed to reduce data annotation effort by manually labelling only representative data samples for training. Meanwhile, recent active learning applications have benefited greatly from cloud computing services, which offer not only sufficient computational resources but also crowdsourcing frameworks that include many humans in the active learning loop. However, previous active learning methods that always require passing large-scale unlabelled data to the cloud may raise significant data privacy issues. To mitigate this risk, we propose a responsible active learning method, Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability. Specifically, we first introduce a human-in-the-loop teacher-student architecture that isolates unlabelled data from the task learner (teacher) on the cloud side by maintaining an active learner (student) on the client side. During training, the task learner instructs the lightweight active learner, which then provides feedback on the active sampling criterion. To further enhance the active learner with large-scale unlabelled data, we introduce multiple peer students into the active learner, which is trained by a novel learning paradigm including In-Class Peer Study on labelled data and Out-of-Class Peer Study on unlabelled data. Lastly, we devise a discrepancy-based active sampling criterion, Peer Study Feedback, that exploits the variability of peer students to select the most informative data and thereby improve model stability. Extensive experiments demonstrate the superiority of the proposed PSL over a wide range of active learning methods in both standard and sensitive-protection settings.

URL: https://openreview.net/forum?id=vC7EyLp0c4
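
The abstract's Peer Study Feedback criterion selects samples on which the peer students disagree. A minimal sketch of one generic disagreement-based criterion (variance of peer predictions around their consensus); function names, shapes, and the exact score are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def peer_disagreement_scores(peer_probs):
    """Score unlabelled samples by prediction disagreement among peers.

    peer_probs: array of shape (num_peers, num_samples, num_classes)
    holding each peer student's softmax outputs.
    Returns one score per sample; higher means more disagreement.
    """
    mean_probs = peer_probs.mean(axis=0)  # consensus prediction per sample
    # Variance of each peer around the consensus, summed over classes.
    return ((peer_probs - mean_probs) ** 2).sum(axis=2).mean(axis=0)

def select_batch(peer_probs, budget):
    """Pick the `budget` most-disagreed-upon samples to send for labelling."""
    scores = peer_disagreement_scores(peer_probs)
    return np.argsort(scores)[::-1][:budget]
```

In this sketch the client-side peers score the data locally, so only the selected indices (not the raw unlabelled data) would need to leave the device.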

---

Title: Bagel: A Benchmark for Assessing Graph Neural Network Explanations

Abstract: Evaluating interpretability approaches for graph neural networks (GNNs) is known to be challenging due to the lack of a commonly accepted benchmark. Given a GNN model, several interpretability approaches exist to explain it, with diverse (sometimes conflicting) evaluation methodologies. In this paper, we propose Bagel, a benchmark for evaluating explainability approaches for GNNs. In Bagel, we first propose four diverse GNN explanation evaluation regimes -- 1) faithfulness, 2) sparsity, 3) correctness, and 4) plausibility. We reconcile multiple evaluation metrics from the existing literature and cover diverse notions for a holistic evaluation. Our graph datasets range from citation networks and document graphs to graphs of molecules and proteins. We conduct an extensive empirical study of four GNN models and nine post-hoc explanation approaches on node and graph classification tasks. We open both the benchmarks and reference implementations and make them available at https://anonymous.4open.science/r/Bagel-benchmark-F451/.

URL: https://openreview.net/forum?id=ZFPKeIuVvp
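
To make the sparsity regime concrete, a minimal sketch of one common way to score it: the fraction of graph edges an explanation leaves out of its selected subgraph. The function name, the thresholding of soft masks, and the exact formula are illustrative assumptions, not necessarily Bagel's definition:

```python
import numpy as np

def explanation_sparsity(edge_mask, threshold=0.5):
    """Fraction of edges the explanation excludes.

    edge_mask holds per-edge importance scores in [0, 1]; edges with a
    score >= threshold count as part of the explanation. An explanation
    that highlights only a few edges scores close to 1 (very sparse).
    """
    edge_mask = np.asarray(edge_mask, dtype=float)
    selected = (edge_mask >= threshold).sum()
    return 1.0 - selected / len(edge_mask)
```

Faithfulness-style metrics would additionally re-run the GNN with only the selected edges and compare predictions, which requires the model and is omitted here.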

---

Title: Estimating the Density Ratio between Distributions with High Discrepancy using Multinomial Logistic Regression

Abstract: Functions of the ratio of the densities $p/q$ are widely used in machine learning to quantify the discrepancy between the two distributions $p$ and $q$. For high-dimensional distributions, binary classification-based density ratio estimators have shown great promise. However, when the densities are well separated, estimating the density ratio with a binary classifier is challenging. In this work, we show that state-of-the-art density ratio estimators perform poorly on well-separated cases and demonstrate that this is due to distribution shifts between training and evaluation time. We present an alternative method that leverages multi-class classification for density ratio estimation and does not suffer from distribution shift issues. The method uses a set of auxiliary densities $\{m_k\}_{k=1}^K$ and trains a multi-class logistic regression to classify the samples from $p$, $q$, and $\{m_k\}_{k=1}^K$ into $K+2$ classes. We prove that if these auxiliary densities are constructed so that they overlap with $p$ and $q$, then multi-class logistic regression allows estimating $\log p/q$ on the domain of any of the $K+2$ distributions and resolves the distribution shift problems of the current state-of-the-art methods. We compare our method to state-of-the-art density ratio estimators on both synthetic and real datasets and demonstrate its superior performance on the tasks of density ratio estimation, mutual information estimation, and representation learning.

URL: https://openreview.net/forum?id=jM8nzUzBWr
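
The core recipe in the abstract is concrete enough to sketch: train a $(K{+}2)$-class classifier on samples from $p$, $q$, and the auxiliary densities, then read off $\log p/q$ as a difference of log-posteriors (which equals the log density ratio when each class contributes equally many samples, by Bayes' rule). A minimal sketch using scikit-learn's multinomial logistic regression; the helper name and setup are illustrative, not the authors' code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_log_ratio(samples_p, samples_q, aux_samples):
    """Fit a multi-class logistic regression on samples from p, q, and
    the auxiliary densities m_1..m_K, and return a log p/q estimator.

    With equal sample counts per class,
    log p(x)/q(x) = log P(class=p | x) - log P(class=q | x).
    """
    groups = [samples_p, samples_q] + list(aux_samples)
    X = np.vstack(groups)
    y = np.concatenate([np.full(len(g), i) for i, g in enumerate(groups)])
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    def log_ratio(x):
        log_post = clf.predict_log_proba(np.atleast_2d(x))
        return log_post[:, 0] - log_post[:, 1]  # class 0 = p, class 1 = q
    return log_ratio
```

An auxiliary density bridging $p$ and $q$ (e.g. a Gaussian between two well-separated Gaussians) gives the classifier training samples in the region where the ratio must be evaluated, which is the distribution-shift fix the abstract describes.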

---

Title: Black-Box Prompt Learning for Pre-trained Language Models

Abstract: The increasing scale of general-purpose Pre-trained Language Models (\textbf{PLMs}) necessitates the study of more efficient adaptation across downstream tasks. In this paper, we establish Black-box Discrete Prompt Learning (\textbf{BDPL}) to suit the practical interactions between cloud infrastructure and edge devices. In particular, instead of fine-tuning the model in the cloud, we adapt PLMs by prompt learning, which efficiently optimizes only a few parameters of the discrete prompts. Moreover, we consider the scenario in which we have no access to the parameters and gradients of the pre-trained models, only to their outputs for given inputs. This black-box setting protects the cloud infrastructure from potential attacks and misuse that could cause single-point failures, and is preferred over the white-box counterpart by current infrastructures. Under this black-box constraint, we apply a variance-reduced policy gradient algorithm to estimate the gradients of the parameters of the categorical distribution over each discrete prompt token. With our method, user devices can efficiently tune their tasks by querying the PLMs within a bounded number of API calls. Our experiments on RoBERTa and GPT-3 demonstrate that the proposed algorithm achieves significant improvements on eight benchmarks in a cloud-device collaboration manner. Finally, we conduct in-depth case studies to comprehensively analyze our method in terms of various data sizes, prompt lengths, training budgets, optimization objectives, prompt transferability, and explanations of the learned prompts.

URL: https://openreview.net/forum?id=IvsGP7xRvm
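
The black-box optimization loop described here can be sketched generically: sample discrete prompts from per-position categorical distributions, query the model only for losses, and update the distribution parameters with a REINFORCE-style gradient, using a leave-one-out baseline for variance reduction. A minimal sketch under those assumptions; the function name, update rule, and hyperparameters are illustrative, not the paper's exact algorithm:

```python
import numpy as np

def bdpl_step(logits, loss_fn, rng, num_samples=8, lr=0.1):
    """One variance-reduced policy-gradient update for discrete prompts.

    logits: (prompt_len, vocab_size) parameters of the categorical
    distribution over each prompt token. loss_fn maps a sampled prompt
    (array of token ids) to a scalar loss from the black-box model API.
    """
    # Softmax over each position's logits.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)

    # Sample prompts and query the black box for their losses.
    prompts = [np.array([rng.choice(len(p), p=p) for p in probs])
               for _ in range(num_samples)]
    losses = np.array([loss_fn(pr) for pr in prompts], dtype=float)

    grad = np.zeros_like(logits)
    for prompt, loss in zip(prompts, losses):
        # Leave-one-out baseline: mean loss of the *other* samples.
        baseline = (losses.sum() - loss) / (num_samples - 1)
        for t, tok in enumerate(prompt):
            # Gradient of log-prob of a categorical w.r.t. its logits
            # is (one-hot of sampled token) - probs.
            score = -probs[t]
            score[tok] += 1.0
            grad[t] += (loss - baseline) * score / num_samples
    return logits - lr * grad
```

Each step costs `num_samples` API calls and touches no model weights, which is what bounds the tuning cost by a budget of queries.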

---

Title: Personalized Federated Learning: A Unified Framework and Universal Optimization Techniques

Abstract: We study the optimization aspects of personalized Federated Learning (FL). We propose general optimizers that can be used to solve many existing personalized FL objectives, namely a tailored variant of Local SGD and variants of accelerated coordinate descent/accelerated SVRCD. By studying a general personalized objective capable of recovering many existing personalized FL objectives as special cases, we develop a universal optimization theory applicable to many strongly convex personalized FL models in the literature. We demonstrate the practicality and/or optimality of our methods in terms of both communication and local computation. Surprisingly, our general optimization solvers and theory recover the best-known communication and computation guarantees for solving specific personalized FL objectives. Thus, our proposed methods can serve as universal optimizers, making the design of task-specific optimizers unnecessary in many cases.

URL: https://openreview.net/forum?id=ilHM31lXC4
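
One widely used special case of such a general personalized objective is the L2-penalized mixture $\min_{x_1..x_n} \sum_i f_i(x_i) + \frac{\lambda}{2}\sum_i \|x_i - \bar{x}\|^2$, where each client keeps its own model $x_i$ pulled toward the average $\bar{x}$. A minimal sketch of a Local-SGD-style solver for that special case (assumed here as illustration; names, hyperparameters, and the update schedule are not the paper's tailored variant):

```python
import numpy as np

def personalized_local_sgd(grads, x0, rounds=200, local_steps=5,
                           lr=0.05, lam=1.0):
    """Local-SGD-style solver for the mixture objective
    sum_i f_i(x_i) + (lam/2) * sum_i ||x_i - mean(x)||^2.

    grads: list of per-client gradient functions grads[i](x_i) for f_i.
    Each round runs `local_steps` local gradient steps per client, then
    one communication round that pulls every client toward the average.
    """
    xs = [np.array(x0, dtype=float) for _ in grads]
    for _ in range(rounds):
        for _ in range(local_steps):           # local computation
            for i, g in enumerate(grads):
                xs[i] = xs[i] - lr * g(xs[i])
        x_bar = np.mean(xs, axis=0)            # one communication round
        for i in range(len(xs)):
            xs[i] = xs[i] - lr * lam * (xs[i] - x_bar)
    return xs
```

Larger `lam` drives all clients toward one shared model (standard FL); `lam = 0` recovers purely local training, which is the personalization trade-off the unified framework spans.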

---

Title: Generalization bounds for Kernel Canonical Correlation Analysis

Abstract: We study the problem of multiview representation learning using kernel canonical correlation analysis (KCCA) and establish non-asymptotic bounds on generalization error for regularized empirical risk minimization. In particular, we give fine-grained high-probability bounds on generalization error ranging from $O(n^{-1/6})$ to $O(n^{-1/5})$ depending on underlying distributional properties, where $n$ is the number of data samples. For the special case of finite-dimensional Hilbert spaces (such as linear CCA), our rates improve, ranging from $O(n^{-1/2})$ to $O(n^{-1})$. Finally, our results generalize to the problem of functional canonical correlation analysis over abstract Hilbert spaces.

URL: https://openreview.net/forum?id=KwWKB9Bqam

---