Daily TMLR digest for Jun 28, 2024


TMLR

Jun 28, 2024, 12:00:40 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Solving Robust MDPs through No-Regret Dynamics

Authors: Etash Kumar Guha

Abstract: Reinforcement learning is a powerful framework for training agents to navigate different situations, but it is susceptible to changes in environmental dynamics. Designing an algorithm that efficiently finds environmentally robust policies and handles different model parameterizations, without imposing stringent assumptions on the uncertainty set of transitions, is difficult due to the intricate interactions between policy and environment. In this paper, we address both of these issues with a No-Regret Dynamics framework that utilizes policy gradient methods and iteratively approximates the worst-case environment during training, avoiding assumptions on the uncertainty set. Combined with a toolbox of nonconvex online learning algorithms, our framework achieves fast convergence rates across many different problem settings while relaxing assumptions on the uncertainty set of transitions.
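
To illustrate the shape of such a scheme (a heavily simplified sketch, not the paper's algorithm), the snippet below runs alternating gradient updates on a generic smooth payoff: the policy player ascends while the environment player descends toward the worst case, and the averaged policy iterate is the standard no-regret-to-equilibrium conversion. The gradient oracles grad_theta and grad_p are assumed inputs, and projections onto any constraint set are omitted.

    import numpy as np

    def no_regret_dynamics(grad_theta, grad_p, theta0, p0, steps=500, eta=0.05):
        # theta: policy parameters (maximizing player); p: transition-model
        # parameters (minimizing, i.e. worst-case, player).
        theta, p = np.asarray(theta0, dtype=float), np.asarray(p0, dtype=float)
        avg_theta = np.zeros_like(theta)
        for _ in range(steps):
            theta = theta + eta * grad_theta(theta, p)  # policy gradient ascent
            p = p - eta * grad_p(theta, p)              # approximate worst-case environment
            avg_theta = avg_theta + theta / steps       # average iterate
        return avg_theta

    # Toy usage on the bilinear game f(theta, p) = theta * p:
    # no_regret_dynamics(lambda t, p: p, lambda t, p: t, 1.0, 1.0)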

URL: https://openreview.net/forum?id=SdCuffxg5A

---

Title: Fair Feature Importance Scores for Interpreting Decision Trees

Authors: Camille Olivia Little, Debolina Halder Lina, Genevera I. Allen

Abstract: Across various sectors such as healthcare, criminal justice, national security, finance, and technology, large-scale machine learning (ML) systems are being deployed to make critical data-driven decisions. Many have asked whether we can and should trust these ML systems to make these decisions. Two critical components are prerequisites for trust in ML systems: interpretability, or the ability to understand why the ML system makes the decisions it does, and fairness, which ensures that ML systems do not exhibit bias against certain individuals or groups. While both interpretability and fairness have garnered substantial attention in the ML literature, methods that directly interpret models in terms of fairness remain limited. This paper considers a popular interpretation for a widely used class of ML models: feature importance scores for decision trees and tree-based models. We introduce a novel Fair Tree Feature Importance Score to assess each feature's impact on fairness or bias in decision trees. Analogous to the mean decrease in impurity for trees, our score quantifies the mean increase (or decrease) in group bias, and extends to interpret tree-based ensembles or surrogates of complex ML systems. Through simulations and real examples on benchmark fairness datasets, we show the validity of our Fair Tree Feature Importance Score, offering meaningful interpretations for both tree-based ensembles and tree-based surrogates of other ML systems.
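
As a rough illustration of the idea, here is a simplified stand-in for the paper's score (not its exact definition): walk a fitted scikit-learn tree and credit each split feature with the weighted change in a demographic-parity-style gap between the parent node and its children. The binary group indicator s and 0/1 class labels are assumptions.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def fair_tree_importance(clf, X, s):
        # Fairness analogue of mean decrease in impurity: positive values mean
        # splits on that feature reduced the group gap among samples reaching
        # them; negative values mean they increased it. `s` is a 0/1 array.
        t = clf.tree_
        yhat = clf.predict(X)                                # assumes 0/1 labels
        reach = clf.decision_path(X).toarray().astype(bool)  # samples x nodes

        def gap(mask):
            # |P(yhat=1 | s=0) - P(yhat=1 | s=1)| within mask; 0 if a group is absent.
            rates = [yhat[mask & (s == g)].mean() if (mask & (s == g)).any() else 0.0
                     for g in (0, 1)]
            return abs(rates[0] - rates[1])

        def weighted(node):
            m = reach[:, node]
            return m.mean() * gap(m)   # weight by fraction of samples reaching node

        imp = np.zeros(t.n_features)
        for n in range(t.node_count):
            if t.feature[n] >= 0:      # internal node
                imp[t.feature[n]] += (weighted(n)
                                      - weighted(t.children_left[n])
                                      - weighted(t.children_right[n]))
        return imp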

URL: https://openreview.net/forum?id=72mDxlzRZ1

---

Title: Todyformer: Towards Holistic Dynamic Graph Transformers with Structure-Aware Tokenization

Authors: Mahdi Biparva, Raika Karimi, Faezeh Faez, Yingxue Zhang

Abstract: Temporal Graph Neural Networks have garnered substantial attention for their capacity to model evolving structural and temporal patterns while exhibiting impressive performance. However, it is known that these architectures are encumbered by issues that constrain their performance, such as over-squashing and over-smoothing. Meanwhile, Transformers have demonstrated exceptional computational capacity to effectively address challenges related to long-range dependencies. Consequently, we introduce Todyformer, a novel Transformer-based neural network tailored for dynamic graphs. It unifies the local encoding capacity of Message-Passing Neural Networks (MPNNs) with the global encoding of Transformers through i) a novel patchifying paradigm for dynamic graphs to improve over-squashing, ii) a structure-aware parametric tokenization strategy leveraging MPNNs, iii) a Transformer with temporal positional encoding to capture long-range dependencies, and iv) an encoding architecture that alternates between local and global contextualization, mitigating over-smoothing in MPNNs. Experimental evaluations on public benchmark datasets demonstrate that Todyformer consistently outperforms state-of-the-art methods on downstream tasks. Furthermore, we analyze how the proposed model effectively captures extensive temporal dependencies in dynamic graphs.
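
A minimal PyTorch sketch of one local/global round, with a toy dense-adjacency mean-aggregation layer standing in for the MPNN and full self-attention standing in for the Transformer; patchifying, structure-aware tokenization, and temporal positional encodings are omitted here.

    import torch
    import torch.nn as nn

    class LocalGlobalBlock(nn.Module):
        # One local -> global round: message passing contextualizes each node
        # from its neighbours; self-attention then mixes all node tokens. The
        # alternation is what trades off over-squashing and over-smoothing.
        def __init__(self, dim, heads=4):
            super().__init__()
            self.msg = nn.Linear(dim, dim)
            self.attn = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                   batch_first=True)

        def forward(self, h, adj):
            # h: (num_nodes, dim) node features; adj: (num_nodes, num_nodes) 0/1.
            deg = adj.sum(-1, keepdim=True).clamp(min=1)
            h = h + torch.relu(self.msg(adj @ h / deg))   # local MPNN step
            return self.attn(h.unsqueeze(0)).squeeze(0)   # global attention

    # Usage: LocalGlobalBlock(64)(torch.randn(10, 64), torch.eye(10))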

URL: https://openreview.net/forum?id=nAQSUqEspb

---

Title: The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective

Authors: Satyapriya Krishna, Tessa Han, Alex Gu, Steven Wu, Shahin Jabbari, Himabindu Lakkaraju

Abstract: As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of whether and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questions. In this work, we introduce and study the disagreement problem in explainable machine learning. More specifically, we formalize the notion of disagreement between explanations, analyze how often such disagreements occur in practice, and examine how practitioners resolve these disagreements. We first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction, and introduce a novel quantitative framework to formalize this understanding. We then leverage this framework to carry out a rigorous empirical analysis with four real-world datasets, six state-of-the-art post hoc explanation methods, and six different predictive models to measure the extent of disagreement between the explanations generated by various popular explanation methods. In addition, we carry out an online user study with data scientists to understand how they resolve the aforementioned disagreements. Our results indicate that (1) state-of-the-art explanation methods often disagree in terms of the explanations they output, and (2) machine learning practitioners often employ ad hoc heuristics when resolving such disagreements. These findings suggest that practitioners may be relying on misleading explanations when making consequential decisions. They also underscore the importance of developing principled frameworks for effectively evaluating and comparing explanations output by various explanation techniques.
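
For a flavor of what such a quantitative framework can contain, the sketch below implements two simple agreement measures over feature-attribution vectors, top-k feature agreement and top-k rank agreement; the paper studies a broader family of metrics, so treat these as representative examples.

    import numpy as np

    def topk(expl, k):
        # Indices of the k features with largest absolute attribution.
        return list(np.argsort(-np.abs(np.asarray(expl)))[:k])

    def feature_agreement(e1, e2, k):
        # Fraction of top-k features that the two explanations share.
        return len(set(topk(e1, k)) & set(topk(e2, k))) / k

    def rank_agreement(e1, e2, k):
        # Fraction of top-k positions where both explanations place the same
        # feature at the same rank (a stricter notion of agreement).
        return float(np.mean([a == b for a, b in zip(topk(e1, k), topk(e2, k))]))

    # feature_agreement([0.9, -0.7, 0.1, 0.0], [0.8, 0.2, -0.6, 0.0], k=2) -> 0.5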

URL: https://openreview.net/forum?id=jESY2WTZCe

---


New submissions
===============


Title: A Theoretical Framework for Zeroth-Order Budget Convex Optimization

Abstract: This paper studies a natural generalization of the problem of minimizing a convex function $f$ by querying its values sequentially. At each time-step $t$, the optimizer can invest a budget $b_t$ in a query point $X_t$ of their choice to obtain a fuzzy evaluation of $f$ at $X_t$ whose accuracy depends on the amount of budget invested in $X_t$ across times. This setting is motivated by the minimization of objectives whose values can only be determined approximately through lengthy or expensive computations, where it is paramount to recycle past information. In the univariate case, we design ReSearch, an anytime parameter-free algorithm for which we prove near-optimal optimization-error guarantees. Then, we present two applications of our univariate analysis. First, we show how to use ReSearch for stochastic convex optimization, obtaining theoretical and empirical improvements on state-of-the-art benchmarks. Second, we handle the $d$-dimensional budget problem by combining ReSearch with a coordinate descent method, presenting theoretical guarantees and experiments.
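
The query protocol is easy to mimic in code. Below is a toy oracle under one assumed noise model, in which the evaluation error at a point shrinks like one over the square root of the cumulative budget invested there; the paper's actual accuracy model may differ.

    import math, random
    from collections import defaultdict

    class BudgetOracle:
        # Fuzzy evaluations of f: each query invests budget b at point x, and
        # the noise scale decays with the *cumulative* budget spent at x,
        # which is what makes recycling past query points worthwhile.
        def __init__(self, f, sigma=1.0):
            self.f, self.sigma = f, sigma
            self.spent = defaultdict(float)

        def query(self, x, b):
            self.spent[x] += b
            noise = random.gauss(0.0, self.sigma / math.sqrt(self.spent[x]))
            return self.f(x) + noise

    # oracle = BudgetOracle(lambda x: (x - 2) ** 2)
    # oracle.query(1.0, b=4.0)   # noisy f(1.0) with std sigma / 2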

URL: https://openreview.net/forum?id=bo8vM9j3UO

---

Title: From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

Abstract: One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling a single token at a time or constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms work on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve the speed of generation. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems.
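
As a concrete instance of a token-level generation algorithm, here is a minimal temperature-sampling decoder; next_token_logits(seq) is an assumed stand-in for a model's next-token distribution interface.

    import numpy as np

    def decode(next_token_logits, prompt, max_new_tokens, temperature=1.0, eos=None):
        # Token-level decoding: repeatedly turn the model's logits into a
        # distribution and draw one token; temperature=0 is greedy decoding.
        seq = list(prompt)
        for _ in range(max_new_tokens):
            logits = np.asarray(next_token_logits(seq), dtype=float)
            if temperature == 0:
                token = int(np.argmax(logits))
            else:
                z = logits / temperature
                p = np.exp(z - z.max())        # stable softmax
                token = int(np.random.choice(len(p), p=p / p.sum()))
            seq.append(token)
            if token == eos:
                break
        return seq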

URL: https://openreview.net/forum?id=eskQMcIbMS

---

Title: Linear Weight Interpolation Leads to Transient Performance Gains

Abstract: We train copies of a neural network under different draws of SGD noise and find that linearly interpolating their weights can, remarkably, produce networks that perform significantly better than the original networks. However, such interpolated networks consistently end up in unfavorable regions of the optimization landscape: with further training, their performance fails to improve or degrades, effectively undoing the performance gained from the interpolation. We identify two quantities that impact an interpolated network's performance and relate our observations to linear mode connectivity. Finally, we investigate this phenomenon through the lens of example importance and find that performance improves and degrades almost exclusively on the harder subsets of the training data, while performance is stable on the easier subsets. Our work represents a step towards a better understanding of neural network loss landscapes and weight interpolation in deep learning.
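
The core operation is just element-wise interpolation of two checkpoints. A minimal PyTorch sketch, assuming two models with identical architectures:

    import torch

    def interpolate(sd1, sd2, alpha=0.5):
        # theta_alpha = (1 - alpha) * theta_1 + alpha * theta_2, key by key.
        # Integer buffers (e.g. BatchNorm's num_batches_tracked) may need
        # special-casing in a real script.
        return {k: (1 - alpha) * sd1[k] + alpha * sd2[k] for k in sd1}

    # model.load_state_dict(interpolate(m1.state_dict(), m2.state_dict(), 0.5))
    # Sweeping alpha over [0, 1] traces the linear path between the two networks.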

URL: https://openreview.net/forum?id=XGAdBXlFcj

---

Title: PriViT: Vision Transformers for Private Inference

Abstract: The Vision Transformer (ViT) architecture has emerged as the backbone of choice for state-of-the-art deep models for computer vision applications. However, ViTs are ill-suited for private inference using secure multi-party computation (MPC) protocols, due to the large number of non-polynomial operations (self-attention, feed-forward rectifiers, layer normalization). We develop PriViT, a gradient-based algorithm to selectively Taylorize nonlinearities in ViTs while maintaining their prediction accuracy. Our algorithm is conceptually very simple, easy to implement, and achieves improved performance over existing MPC-friendly transformer architectures in terms of the latency-accuracy Pareto frontier.
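
The sketch below shows one way "selective Taylorization" can look in code: a learnable per-channel gate blends the exact GELU with its second-order Taylor polynomial, so units whose gates reach zero become MPC-friendly. The surrogate and the gating scheme are illustrative assumptions, not the paper's exact parameterization.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GatedTaylorGELU(nn.Module):
        # c = 1 keeps the exact GELU; c = 0 leaves only the polynomial
        # surrogate (cheap under MPC). Training would add a penalty pushing
        # the gates toward 0 wherever prediction accuracy permits.
        def __init__(self, dim):
            super().__init__()
            self.c = nn.Parameter(torch.ones(dim))

        def forward(self, x):
            poly = 0.5 * x + 0.3989 * x ** 2   # 2nd-order Taylor of GELU at 0
            return self.c * F.gelu(x) + (1 - self.c) * poly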

URL: https://openreview.net/forum?id=3CmPvcYJnm

---

Title: Continual Adaptation of Foundation Models for Federated Learning

Abstract: In this paper, we focus on the important yet understudied problem of Continual Federated Learning (CFL), where a server communicates with a set of clients to incrementally learn new concepts over time without sharing or storing any data. The complexity of this problem is compounded by challenges from both the Continual and Federated Learning perspectives. Specifically, models trained in a CFL setup suffer from catastrophic forgetting, which is exacerbated by data heterogeneity across clients. Existing attempts at this problem tend to impose large overheads on clients and communication channels, or require access to stored data, which renders them unsuitable for real-world use due to privacy concerns. In this paper, we tackle forgetting and heterogeneity while minimizing overhead costs and without requiring access to any stored data. We study this problem in the context of Foundation Models and explore parameter-efficient approaches to adapt to dynamic distributions while minimizing forgetting. We achieve this by leveraging a prompting-based approach (such that only prompts and classifier heads have to be communicated) and by proposing a novel and lightweight generation-and-distillation scheme to consolidate client models at the server. We formulate this problem for image classification, establish strong baselines for comparison, and conduct experiments on CIFAR-100 as well as challenging, large-scale datasets such as ImageNet-R and DomainNet. Our approach outperforms both existing methods and our own baselines by as much as 7% while significantly reducing communication and client-level computation costs.
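
Communication in such a setup is cheap because only the adapted parameters travel. Here is a minimal sketch of one server round, with plain FedAvg-style averaging standing in for the paper's richer generation-and-distillation consolidation; the payload keys are hypothetical.

    import torch

    def consolidate(client_payloads):
        # Average only what clients actually send (prompt vectors and
        # classifier heads); the frozen foundation-model backbone never
        # leaves the clients.
        keys = client_payloads[0].keys()
        return {k: torch.stack([p[k] for p in client_payloads]).mean(dim=0)
                for k in keys}

    # payload = {"prompt": torch.randn(8, 768),        # 8 prompt tokens
    #            "head.weight": torch.randn(100, 768),
    #            "head.bias": torch.randn(100)}
    # global_params = consolidate([payload_from_client_1, payload_from_client_2])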

URL: https://openreview.net/forum?id=vsZ5A3Zxyr

---

Title: Score-based Explainability for Graph Representations

Abstract: Despite the widespread use of unsupervised Graph Neural Networks (GNNs), their post-hoc explainability remains underexplored. Current graph explanation methods typically focus on explaining a single dimension of the final output. However, unsupervised and self-supervised GNNs produce d-dimensional representation vectors whose individual elements lack clear, disentangled semantic meaning. To tackle this issue, we draw inspiration from the success of score-based graph explainers in supervised GNNs and propose a novel framework, grXAI, for graph representation explainability. grXAI generalizes existing score-based graph explainers to identify the subgraph most responsible for constructing the latent representation of the input graph. This framework can be easily and efficiently implemented as a wrapper around existing methods, enabling the explanation of graph representations through connected subgraphs, which are more human-intelligible. Extensive qualitative and quantitative experiments demonstrate grXAI's strong ability to identify subgraphs that effectively explain learned graph representations across various unsupervised tasks and learning algorithms.
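
One plausible reading of the wrapper idea (not necessarily grXAI's exact scoring) is leave-one-edge-out: score each edge by how much deleting it moves the d-dimensional graph embedding. Here, encode(x, edge_index) is an assumed unsupervised-GNN interface returning a graph-level vector.

    import torch
    import torch.nn.functional as F

    def edge_scores(encode, x, edge_index):
        # An edge is important if removing it changes the learned
        # representation a lot (1 - cosine similarity to the original).
        z_full = encode(x, edge_index)
        num_edges = edge_index.shape[1]
        scores = torch.empty(num_edges)
        for e in range(num_edges):
            keep = torch.arange(num_edges) != e
            z = encode(x, edge_index[:, keep])
            scores[e] = 1.0 - F.cosine_similarity(z_full, z, dim=0)
        return scores  # rank edges, then grow a connected explanatory subgraph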

URL: https://openreview.net/forum?id=K6DKrrpYpJ

---
