Daily TMLR digest for Jun 18, 2024

TMLR

Jun 18, 2024, 12:00:09 AM
to tmlr-anno...@googlegroups.com


New certifications
==================



Reproducibility Certification: Reproducibility study of "Robust Fair Clustering: A Novel Fairness Attack and Defense Framework"

Lucas Ponticelli, Vincent Loos, Eren Kocadag, Kacper Bartosik

https://openreview.net/forum?id=Xu1sEPhjqH

---


Accepted papers
===============


Title: Reproducibility study of "Robust Fair Clustering: A Novel Fairness Attack and Defense Framework"

Authors: Lucas Ponticelli, Vincent Loos, Eren Kocadag, Kacper Bartosik

Abstract: This reproducibility study examines "Robust Fair Clustering: A Novel Fairness Attack and Defense Framework" by Chhabra et al. (2023), an innovative work in fair clustering algorithms. Our study focuses on validating the original paper's claims concerning the susceptibility of state-of-the-art fair clustering models to adversarial attacks and the efficacy of the proposed Consensus Fair Clustering (CFC) defence mechanism. We employ a similar experimental framework but extend our investigations by using additional datasets. Our findings confirm the original paper's claims, reinforcing the vulnerability of fair clustering models to adversarial attacks and the robustness of the CFC mechanism.
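
As a concrete illustration of what a fairness attack degrades, here is a minimal sketch (not the authors' code) of one common cluster-fairness measure, the balance of protected groups within clusters; the function name and metric variant are illustrative assumptions.

import numpy as np

def cluster_balance(labels, groups):
    # One common variant: min over clusters of the ratio between the
    # least- and most-represented protected group inside the cluster.
    balance = 1.0
    for c in np.unique(labels):
        members = groups[labels == c]
        counts = np.array([(members == g).sum() for g in np.unique(groups)])
        if counts.min() == 0:
            return 0.0  # a cluster missing a group entirely is maximally unfair
        balance = min(balance, counts.min() / counts.max())
    return balance

labels = np.array([0, 0, 0, 1, 1, 1])   # cluster assignments
groups = np.array([0, 1, 0, 1, 1, 0])   # protected-group membership
print(cluster_balance(labels, groups))  # 0.5; an attack drives this toward 0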

URL: https://openreview.net/forum?id=Xu1sEPhjqH

---

Title: TAP: The Attention Patch for Cross-Modal Knowledge Transfer from Unlabeled Modality

Authors: Yinsong Wang, Shahin Shahrampour

Abstract: This paper studies a cross-modal learning framework, where the objective is to enhance the performance of supervised learning in the primary modality using an unlabeled, unpaired secondary modality. Taking a probabilistic approach to missing-information estimation, we show that the extra information contained in the secondary modality can be estimated via Nadaraya-Watson (NW) kernel regression, which can further be expressed as a kernelized cross-attention module (under linear transformation). This expression lays the foundation for introducing The Attention Patch (TAP), a simple neural network add-on that can be trained to allow data-level knowledge transfer from the unlabeled modality. We provide extensive numerical simulations using real-world datasets to show that TAP yields statistically significant improvements in generalization across different domains and neural network architectures, making use of seemingly unusable unlabeled cross-modal data.
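
The NW-regression-as-attention equivalence the abstract describes is easy to make concrete. Below is a hedged sketch (not the authors' implementation): with a Gaussian kernel, the NW weights are exactly a softmax over negative squared distances, i.e. a kernelized cross-attention over the secondary modality.

import numpy as np

def nw_cross_attention(queries, keys, values, bandwidth=1.0):
    # Gaussian-kernel Nadaraya-Watson weights are a softmax over negative
    # squared distances, i.e. a kernelized attention matrix.
    d2 = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(-1)
    logits = -d2 / (2.0 * bandwidth ** 2)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)       # NW normalization
    return attn @ values                          # weighted average of values

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))    # primary-modality features
keys = rng.normal(size=(32, 8))      # secondary-modality features (unlabeled)
values = rng.normal(size=(32, 8))    # information to transfer
print(nw_cross_attention(queries, keys, values).shape)  # (4, 8)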

URL: https://openreview.net/forum?id=73uyerai53

---

Title: Mildly Constrained Evaluation Policy for Offline Reinforcement Learning

Authors: Linjie Xu, Zhengyao Jiang, Jinyu Wang, Lei Song, Jiang Bian

Abstract: Offline reinforcement learning (RL) methods constrain the policy to stay close to the behavior policy, thereby stabilizing value learning and mitigating the selection of out-of-distribution (OOD) actions at test time. Conventional approaches apply identical constraints for both value learning and test-time inference. However, our findings indicate that constraints suitable for value estimation may be excessively restrictive for action selection at test time. To address this issue, we propose a Mildly Constrained Evaluation Policy (MCEP) for test-time inference, paired with a more strongly constrained target policy for value estimation. Since the target policy has been adopted in various prior approaches, MCEP can be seamlessly integrated with them as a plug-in. We instantiate MCEP on top of the TD3BC (Fujimoto & Gu, 2021), AWAC (Nair et al., 2020) and DQL (Wang et al., 2023) algorithms. Empirical results on D4RL MuJoCo locomotion tasks, a high-dimensional humanoid task, and a set of 16 robotic manipulation tasks show that MCEP brings significant performance improvements to classic offline RL methods and can further improve SOTA methods. The code is open-sourced at https://github.com/egg-west/MCEP.
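
A hedged sketch of the core idea (coefficients and names are illustrative, not taken from the paper's code): a TD3+BC-style behavior-cloning constraint applied with different strengths, strong for the target policy that stabilizes value estimation and mild for the evaluation policy used at test time.

import torch

def constrained_actor_loss(q_value, pi_action, data_action, bc_weight):
    # Larger bc_weight => policy stays closer to the behavior policy.
    bc_term = ((pi_action - data_action) ** 2).mean()
    return -q_value.mean() + bc_weight * bc_term

q = torch.randn(256, 1)                        # critic values for policy actions
pi_a = torch.randn(256, 6, requires_grad=True) # policy actions
data_a = torch.randn(256, 6)                   # dataset (behavior) actions

target_loss = constrained_actor_loss(q, pi_a, data_a, bc_weight=2.5)  # strong constraint for value estimation
eval_loss = constrained_actor_loss(q, pi_a, data_a, bc_weight=0.5)    # mild constraint at test time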

URL: https://openreview.net/forum?id=imAROs79Pb

---

Title: Deep End-to-end Causal Inference

Authors: Tomas Geffner, Javier Antoran, Adam Foster, Wenbo Gong, Chao Ma, Emre Kiciman, Amit Sharma, Angus Lamb, Martin Kukla, Nick Pawlowski, Agrin Hilmkil, Joel Jennings, Meyer Scetbon, Miltiadis Allamanis, Cheng Zhang

Abstract: Causal inference is essential for data-driven decision-making across domains such as business engagement, medical treatment, and policy making. However, in practice, causal inference suffers from many limitations, including unknown causal graphs, missing data, and mixed data types. To tackle these challenges, we develop the Deep End-to-end Causal Inference (DECI) framework, a flow-based non-linear additive noise model combined with variational inference, which can perform both Bayesian causal discovery and inference. Theoretically, we show that DECI unifies many existing structural equation model (SEM) based causal inference techniques and can recover the ground-truth mechanism under standard assumptions. Motivated by real-world challenges, we further extend DECI to heterogeneous, mixed-type data with missing values, allowing for both continuous and discrete treatment decisions. Empirically, we conduct extensive experiments (over a thousand) showing that DECI is competitive with relevant baselines for both causal discovery and inference on synthetic and causal machine learning benchmarks across data types and levels of missingness.
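
For intuition, here is a minimal sketch (an assumption-level toy, not the DECI codebase) of the model class named in the abstract: a non-linear additive noise SEM x_i = f_i(parents(x_i)) + z_i, sampled ancestrally over a known DAG; DECI additionally infers the graph itself variationally.

import numpy as np

rng = np.random.default_rng(0)
parents = {0: [], 1: [0], 2: [0, 1]}        # a small DAG in topological order
f = {0: lambda p: 0.0,                       # illustrative non-linear mechanisms
     1: lambda p: np.tanh(p[0]),
     2: lambda p: p[0] ** 2 - 0.5 * p[1]}

def sample_anm(n):
    x = np.zeros((n, len(parents)))
    for i, pa in parents.items():           # ancestral (topological) sampling
        mean = np.array([f[i](row[pa]) for row in x])
        x[:, i] = mean + rng.normal(scale=0.1, size=n)  # additive noise z_i
    return x

print(sample_anm(5).shape)  # (5, 3)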

URL: https://openreview.net/forum?id=e6sqttxEGX

---


New submissions
===============


Title: Reducing Variance in Meta-Learning via Laplace Approximation for Regression Tasks

Abstract: Given a finite set of sample points, meta-learning algorithms aim to learn an optimal adaptation strategy for new, unseen tasks. Often, this data is ambiguous, as it might belong to several tasks concurrently; this is particularly common in meta-regression. In such cases, the estimated adaptation strategy is subject to high variance due to the limited amount of support data for each task, which often leads to sub-optimal generalization. In this work, we address the problem of variance reduction in gradient-based meta-learning and formalize the class of problems prone to it, a condition we refer to as "task overlap". Specifically, we propose a novel approach that reduces the variance of the gradient estimate by weighting each support point individually by the variance of its posterior over the parameters. To estimate the posterior, we utilize the Laplace approximation, which allows us to express the variance in terms of the curvature of the loss landscape of our meta-learner. Experimental results demonstrate the effectiveness of the proposed method and highlight the importance of variance reduction in meta-learning.
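
A hedged sketch of the weighting idea (illustrative, not the paper's method verbatim): estimate a per-support-point curvature of the loss, treat the Laplace posterior variance as roughly its inverse, and reweight each point's gradient contribution accordingly.

import numpy as np

def laplace_weights(features, prior_prec=1.0):
    # Diagonal Gauss-Newton curvature per support point for a linear model
    # with squared loss; Laplace posterior variance ~ 1 / curvature, so
    # higher curvature (lower variance) earns a larger weight.
    curvature = (features ** 2).sum(axis=1) + prior_prec
    return curvature / curvature.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))           # support inputs for one task
resid = rng.normal(size=10)            # prediction residuals on the support set
w = laplace_weights(X)
grad = (w[:, None] * resid[:, None] * X).sum(axis=0)  # variance-reduced gradient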

URL: https://openreview.net/forum?id=Uc2mqNPkEq

---

Title: Improved Variational Bayesian Phylogenetic Inference using Mixtures

Abstract: We introduce VBPI-Mixtures, an algorithm aimed at improving the precision of phylogenetic posterior distributions, with a focus on accurately approximating tree topologies and branch lengths. Although Variational Bayesian Phylogenetic Inference (VBPI), a state-of-the-art black-box variational inference (BBVI) framework, has achieved significant success in approximating these distributions, it faces challenges in dealing with the multimodal nature of tree-topology posteriors. While advanced deep learning techniques such as normalizing flows and graph neural networks have enhanced VBPI's approximations of branch-length posteriors, improving its tree-topology posterior approximations has remained an open gap. Our VBPI-Mixtures algorithm addresses this gap by leveraging recent advances in mixture learning within the BBVI domain. Consequently, VBPI-Mixtures can capture distributions over tree topologies that other VBPI algorithms cannot model. We demonstrate superior performance on challenging density estimation tasks across various real phylogenetic datasets.
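
To make the mixture-BBVI ingredient concrete, here is a minimal sketch on a toy bimodal target (an assumption for illustration; real VBPI targets are posteriors over tree topologies and branch lengths): a mixture variational family q(z) = sum_k w_k q_k(z) and a Monte Carlo ELBO under it.

import numpy as np
from scipy.stats import norm

def log_target(z):   # unnormalized bimodal "posterior", standing in for a multimodal tree posterior
    return np.logaddexp(norm.logpdf(z, -2, 0.5), norm.logpdf(z, 2, 0.5)) - np.log(2)

means, scales, weights = np.array([-2.0, 2.0]), np.array([0.6, 0.6]), np.array([0.5, 0.5])

def mixture_elbo(n=2000, rng=np.random.default_rng(0)):
    comps = rng.choice(len(weights), size=n, p=weights)     # pick mixture components
    z = rng.normal(means[comps], scales[comps])             # sample from chosen q_k
    log_q = np.logaddexp.reduce(
        np.log(weights)[None, :] + norm.logpdf(z[:, None], means[None, :], scales[None, :]),
        axis=1)                                             # full mixture density
    return np.mean(log_target(z) - log_q)                   # E_q[log p - log q]

print(mixture_elbo())   # near 0, since the mixture covers both modes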

URL: https://openreview.net/forum?id=yuhG3VHK5f

---

Title: IM-Context: In-Context Learning for Imbalanced Regression Tasks

Abstract: Regression models often fail to generalize in regions with highly imbalanced label distributions. Previous methods for deep imbalanced regression rely on gradient-based weight updates, which tend to overfit in underrepresented regions. This paper proposes a paradigm shift towards in-context learning as an effective alternative to conventional in-weight learning, particularly for imbalanced regression. In-context learning refers to the ability of a model to generate predictions by conditioning on a prompt sequence of in-context samples (input-label pairs) together with a new query input, without any parameter updates. In this paper, we study the impact of the prompt sequence on model performance from both theoretical and empirical perspectives, and we emphasize the importance of localized context in reducing bias within regions of high imbalance. Empirical evaluations across a variety of real-world datasets demonstrate that in-context learning substantially outperforms existing in-weight learning methods in scenarios with high levels of imbalance.
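
A hedged sketch of the localized-context idea (the downstream model is a placeholder, not the paper's architecture): build the prompt from the query's k nearest labeled neighbors, so that sparse, imbalanced regions are represented by their own local context rather than by the global majority.

import numpy as np

def localized_prompt(query, X, y, k=8):
    # Nearest neighbors in input space form the in-context (x, y) pairs;
    # no parameter updates are involved.
    idx = np.argsort(((X - query) ** 2).sum(axis=1))[:k]
    return list(zip(X[idx], y[idx]))

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 3)), rng.normal(size=500)
prompt = localized_prompt(X[0], X[1:], y[1:], k=8)
# The prompt plus the query would then be fed to an in-context regression model.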

URL: https://openreview.net/forum?id=p4Y844vJWG

---

Title: Efficient Deep Learning with Decorrelated Backpropagation

Abstract: The backpropagation algorithm remains the dominant and most successful method for training deep neural networks (DNNs). At the same time, training DNNs at scale comes at a significant computational cost and therefore a high carbon footprint. Converging evidence suggests that input decorrelation may speed up deep learning. However, to date, this has not translated into substantial improvements in training efficiency for large-scale DNNs, mainly because of the challenge of enforcing fast and stable network-wide decorrelation. Here, we show for the first time that much more efficient training of very deep neural networks using decorrelated backpropagation is feasible. To achieve this, we use a novel algorithm that induces network-wide input decorrelation with minimal computational overhead. By combining this algorithm with careful optimizations, we achieve a more than two-fold speed-up and higher test accuracy compared to backpropagation when training an 18-layer deep residual network. This demonstrates that decorrelation provides exciting prospects for efficient deep learning at scale.
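
A minimal sketch under assumptions (not the authors' optimized implementation): a decorrelation layer that transforms its input by a matrix R and locally updates R to suppress off-diagonal output covariance, the kind of network-wide decorrelation mechanism the abstract refers to.

import numpy as np

class DecorrelationLayer:
    def __init__(self, dim, lr=1e-3):
        self.R = np.eye(dim)
        self.lr = lr

    def forward(self, x):                       # x: (batch, dim)
        z = x @ self.R.T
        cov = (z.T @ z) / len(z)                # batch covariance of outputs
        off_diag = cov - np.diag(np.diag(cov))  # correlations to remove
        self.R -= self.lr * off_diag @ self.R   # local decorrelating update
        return z

rng = np.random.default_rng(0)
layer = DecorrelationLayer(dim=16)
x = rng.normal(size=(64, 16)) @ rng.normal(size=(16, 16))  # correlated inputs
for _ in range(500):
    z = layer.forward(x)
print(np.abs(np.corrcoef(z.T) - np.eye(16)).mean())  # small => decorrelated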

URL: https://openreview.net/forum?id=HUJ4rZLcju

---

Title: Single-Shot Plug-and-Play Methods for Inverse Problems

Abstract: The utilisation of Plug-and-Play (PnP) priors in inverse problems has become increasingly prominent in recent years. This preference is based on the mathematical equivalence between the general proximal operator and the regularised denoiser, facilitating the adaptation of various off-the-shelf denoiser priors to a wide range of inverse problems. However, existing PnP models predominantly rely on pre-trained denoisers using large datasets. In this work, we introduce Single-Shot PnP methods (SS-PnP), shifting the focus to solving inverse problems with minimal data. First, we integrate Single-Shot proximal denoisers into iterative methods, enabling training with single instances. Second, we propose implicit neural priors based on a novel function that preserves relevant frequencies to capture fine details while avoiding the issue of vanishing gradients. We demonstrate, through extensive numerical and visual experiments, that our method leads to better approximations.
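
For context, a hedged sketch of a generic plug-and-play proximal-gradient loop (the denoiser below is a crude placeholder; SS-PnP instead trains a single-shot proximal denoiser / implicit neural prior on one instance): alternate a data-fidelity gradient step with a denoising step that plays the role of the proximal operator.

import numpy as np

def denoise(x, strength=0.25):
    # Placeholder denoiser: shrink toward a local average, standing in for
    # the learned prior that acts as a proximal operator.
    smoothed = np.convolve(x, np.ones(5) / 5, mode="same")
    return (1 - strength) * x + strength * smoothed

def pnp_ista(A, y, steps=200, tau=None):
    tau = tau or 1.0 / np.linalg.norm(A, 2) ** 2   # step size from ||A||^2
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x = denoise(x - tau * A.T @ (A @ x - y))   # gradient step, then prior
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 100))                     # underdetermined forward model
x_true = np.convolve(rng.normal(size=100), np.ones(7) / 7, mode="same")  # smooth signal
x_hat = pnp_ista(A, A @ x_true)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))  # relative error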

URL: https://openreview.net/forum?id=vXevE43NxF

---

Title: Adjacency Search Embeddings

Abstract: In this study, we propose two novel Adjacency Search Embeddings inspired by the theory of identifying s-t minimum cuts: Maximum Adjacency Search (MAS) and Threshold-based Adjacency Search (TAS). Both leverage a node together with a subset of its neighborhood to discern a set of nodes that is well integrated into higher-order network structures, which then serves as context for generating higher-order representations. When used in conjunction with the skip-gram model, our approaches are more effective than other shallow embedding techniques on tasks such as link prediction and node classification. By incorporating our mechanisms as a preprocessing step, we show substantial improvements in node classification performance across GNNs such as GCN, GraphSAGE, and GATv2 on both attributed and non-attributed networks. Furthermore, we substantiate the applicability of our approaches, shedding light on the graph scenarios they are best suited to. Our source code can be accessed through "https://anonymous.4open.science/r/adjacency-embeddings-DC6B".
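
As an illustration of the first of the two searches (an assumption-level sketch, not the authors' code): Maximum Adjacency Search repeatedly absorbs the node most strongly connected to the growing set, yielding orderings whose prefixes are well-integrated node sets that can serve as skip-gram context.

import numpy as np

def maximum_adjacency_search(adj, start):
    n = len(adj)
    order, in_set = [start], {start}
    attachment = adj[start].astype(float)          # connectivity to current set
    for _ in range(n - 1):
        attachment[list(in_set)] = -np.inf         # never re-select members
        nxt = int(np.argmax(attachment))           # most-attached outside node
        order.append(nxt)
        in_set.add(nxt)
        attachment += adj[nxt]                     # grow the set's adjacency
    return order

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])
print(maximum_adjacency_search(adj, start=0))      # [0, 1, 2, 3]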

URL: https://openreview.net/forum?id=GDN5cFTNaL

---

Title: No Need for Ad-hoc Substitutes: The Expected Cost is a Principled All-purpose Classification Metric

Abstract: The expected cost (EC) is one of the main classification metrics introduced in statistical and machine learning books. It is based on the assumption that, for a given application of interest, each decision made by the system has a corresponding cost which depends on the true class of the sample. An evaluation metric can then be defined by taking the expectation of the cost over the data. Two special cases of the EC are widely used in the machine learning literature: the error rate (one minus the accuracy) and the balanced error rate (one minus the balanced accuracy or unweighted average recall). Other instances of the EC can be useful for applications in which some types of errors are more severe than others, or when the prior probabilities of the classes differ between the evaluation data and the use-case scenario. Surprisingly, the general form for the EC is rarely used in the machine learning literature. Instead, alternative ad-hoc metrics like the F-beta score and the Matthews correlation coefficient (MCC) are used for many applications. In this work, we argue that the EC is superior to these alternative metrics, being more general, interpretable, and adaptable to any application scenario. We provide both theoretically-motivated discussions as well as examples to illustrate the behavior of the different metrics.
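
The EC is simple enough to state in a few lines. A minimal sketch (variable names are mine): average the cost c[true class, decision] over the data; a 0-1 cost matrix recovers the error rate, and prior-normalized costs recover the balanced error rate.

import numpy as np

def expected_cost(y_true, y_pred, cost):
    # Empirical expectation of cost[true class, decision] over the data.
    return cost[y_true, y_pred].mean()

y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 1, 0, 1, 0])

K = 2
cost_01 = 1 - np.eye(K)                             # 0-1 costs => error rate
priors = np.bincount(y_true) / len(y_true)
cost_bal = (1 - np.eye(K)) / (K * priors[:, None])  # => balanced error rate

print(expected_cost(y_true, y_pred, cost_01))   # 0.333..., the error rate
print(expected_cost(y_true, y_pred, cost_bal))  # 0.375, the balanced error rate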

URL: https://openreview.net/forum?id=5PPbvCExZs

---

Title: Attacking the Spike: On the Security of Spiking Neural Networks to Adversarial Examples

Abstract: Spiking neural networks (SNNs) have attracted much attention for their high energy efficiency and for recent advances in their classification performance. However, unlike traditional deep learning approaches, the analysis and study of the robustness of SNNs to adversarial examples remain relatively underdeveloped. In this work, we focus on advancing the adversarial attack side of SNNs and make three major contributions. First, we show that successful white-box adversarial attacks on SNNs are highly dependent on the underlying surrogate gradient estimation technique, even in the case of adversarially trained SNNs. Second, using the best single surrogate gradient estimation technique, we analyze the transferability of adversarial attacks between SNNs and other state-of-the-art architectures such as Vision Transformers (ViTs) and CNNs. Our analyses reveal two key areas where SNN adversarial attacks can be enhanced: no white-box attack effectively exploits the use of multiple surrogate gradient estimators for SNNs, and no single-model attack is effective at generating adversarial examples misclassified by both SNN and non-SNN models simultaneously.

For our third contribution, we develop a new attack, the Mixed Dynamic Spiking Estimation (MDSE) attack, to address these issues. MDSE utilizes a dynamic gradient estimation scheme to fully exploit multiple surrogate gradient estimator functions. In addition, our attack generates adversarial examples capable of fooling both SNN and non-SNN models simultaneously. The MDSE attack is as much as 91.4% more effective on SNN/ViT model ensembles and provides a 3x boost in attack effectiveness on adversarially trained SNN ensembles, compared to conventional white-box attacks like Auto-PGD. Our experiments are broad and rigorous, covering three datasets (CIFAR-10, CIFAR-100 and ImageNet) and nineteen classifier models (seven for each CIFAR dataset and five for ImageNet). We will release a full publicly available code repository for the models and attacks upon publication.
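
To ground the role of surrogate gradients, here is a hedged sketch (the surrogate shape is one common choice, not necessarily the one MDSE uses): a spike nonlinearity that is a hard threshold in the forward pass but backpropagates a smooth surrogate derivative; white-box SNN attacks differentiate through exactly this kind of estimator, which is why the choice of surrogate matters so much.

import torch

class SurrogateSpike(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()                   # hard spike in the forward pass

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        surrogate = 1.0 / (1.0 + 10.0 * v.abs()) ** 2   # fast-sigmoid derivative
        return grad_out * surrogate

v = torch.randn(8, requires_grad=True)           # membrane potentials
SurrogateSpike.apply(v).sum().backward()
print(v.grad)   # nonzero gradients despite the non-differentiable spike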

URL: https://openreview.net/forum?id=2TBTuRZxhm

---