Daily TMLR digest for Jun 25, 2024

TMLR
Jun 25, 2024, 12:00:08 AM
to tmlr-anno...@googlegroups.com


New certifications
==================

Reproducibility Certification: On the Reproducibility of: "Learning Perturbations to Explain Time Series Predictions"

Wouter Bant, Ádám Divák, Jasper Eppink, Floris Six Dijkstra

https://openreview.net/forum?id=nPZgtpfgIx

---


Accepted papers
===============


Title: On the Reproducibility of: "Learning Perturbations to Explain Time Series Predictions"

Authors: Wouter Bant, Ádám Divák, Jasper Eppink, Floris Six Dijkstra

Abstract: Deep learning models have taken center stage in the AI community, yet explainability challenges hinder their widespread adoption. Time series models, in particular, have received little attention in this regard. This study attempts to reproduce and extend the work of Enguehard (2023b), focusing on time series explainability by incorporating learnable masks and perturbations. Enguehard (2023b) employed two methods to learn these masks and perturbations: the preservation game (yielding SOTA results) and the deletion game (with poor performance). We extend the work by revising the deletion game’s loss function, testing the robustness of the proposed method on a novel weather dataset, and visualizing the learned masks and perturbations. Despite notable discrepancies in results across many experiments, our findings demonstrate that the proposed method consistently outperforms all baselines and exhibits robust performance across datasets. However, visualizations for the preservation game reveal that the learned perturbations primarily resemble a constant zero signal, questioning the importance of learning perturbations. Nevertheless, our revised deletion game shows promise, recovering meaningful perturbations and, in certain instances, surpassing the performance of the preservation game.

URL: https://openreview.net/forum?id=nPZgtpfgIx
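
As a rough illustration of the preservation game discussed above, the sketch below learns a mask (and a replacement signal) so that the prediction on the partially preserved input stays close to the original prediction while the mask stays sparse. Function and argument names are ours, and the MSE fidelity and L1 sparsity terms are assumptions rather than Enguehard's (2023b) exact implementation.

```python
import torch

def preservation_game_loss(model, x, mask, perturbation, lambda_sparsity=1.0):
    """Generic preservation-game objective for time-series attribution.

    `mask` has the same shape as `x` with entries in [0, 1]; `perturbation`
    is the learned replacement signal. Both the MSE fidelity term and the
    L1 sparsity term are illustrative assumptions, not the paper's exact loss.
    """
    mixed = mask * x + (1.0 - mask) * perturbation   # keep only the masked-in part of x
    pred_full = model(x).detach()                    # reference prediction on the full input
    pred_masked = model(mixed)
    fidelity = torch.mean((pred_full - pred_masked) ** 2)
    sparsity = mask.abs().mean()                     # encourage small masks
    return fidelity + lambda_sparsity * sparsity
```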

---

Title: Revealing an Overlooked Challenge in Class-Incremental Graph Learning

Authors: Daiqing Qi, Handong Zhao, Xiaowei Jia, Sheng Li

Abstract: Graph Neural Networks (GNNs), which effectively learn from static graph-structured data, become ineffective when directly applied to streaming data in a continual learning (CL) scenario. In CL, historical data are not available during the current stage for a number of reasons, such as limited storage or GDPR data retention policies, to name a few. A few recent works study this problem; however, they overlook the uniqueness of continual graph learning (CGL) compared to well-studied continual image classification: the unavailability of previous training data further poses challenges to inference in CGL, in addition to the well-known catastrophic forgetting problem. While existing works make the strong assumption that full access to historical data is unavailable during training but provided during inference, which potentially contradicts the continual learning paradigm (Van de Ven & Tolias, 2019), we study continual graph learning without this strong and contradictory assumption. In this case, without being re-inserted into previous training graphs for inference, streaming test nodes are often more sparsely connected, which makes inference more difficult due to insufficient neighborhood information. In this work, we propose ReplayGNN (ReGNN) to jointly solve the above two challenges without memory buffers: catastrophic forgetting and poor neighbor information during inference. Extensive experiments demonstrate the effectiveness of our model over baseline models across cases with different levels of neighbor information available.

URL: https://openreview.net/forum?id=ScAc73Y1oJ

---

Title: Selective Pre-training for Private Fine-tuning

Authors: Da Yu, Sivakanth Gopi, Janardhan Kulkarni, Zinan Lin, Saurabh Naik, Tomasz Lukasz Religa, Jian Yin, Huishuai Zhang

Abstract: Text prediction models, when used in applications like email clients or word processors, must protect user data privacy and adhere to model size constraints. These constraints are crucial to meet memory and inference time requirements, as well as to reduce inference costs. Building small, fast, and private domain-specific language models is a thriving area of research. In this work, we show that a careful pre-training on a subset of the public dataset that is guided by the private dataset is crucial to train small language models with differential privacy. On standard benchmarks, small models trained with our new framework achieve state-of-the-art performance. In addition to performance improvements, our results demonstrate that smaller models, through careful pre-training and private fine-tuning, can match the performance of much larger models that do not have access to private data. This underscores the potential of private learning for model compression and enhanced efficiency.

URL: https://openreview.net/forum?id=y3u8OpPHxz
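
One plausible reading of "pre-training on a subset of the public dataset that is guided by the private dataset" is to rank public examples by their similarity to a privacy-preserving summary of the private corpus and keep the top fraction. The sketch below is only that reading; the embedding model, the private summary, and the cosine-similarity selection rule are all assumptions, not the paper's method.

```python
import numpy as np

def select_pretraining_subset(public_embeddings, private_centroid, fraction=0.2):
    """Rank public examples by similarity to a private-domain summary.

    `public_embeddings`: (N, d) embeddings of public examples;
    `private_centroid`: (d,) stand-in for any differentially private summary
    of the private corpus (e.g. a noised mean embedding). Returns indices of
    the top `fraction` of public examples. Illustrative assumption only.
    """
    norms = np.linalg.norm(public_embeddings, axis=1) * np.linalg.norm(private_centroid)
    sims = public_embeddings @ private_centroid / (norms + 1e-12)  # cosine similarity
    k = max(1, int(fraction * len(public_embeddings)))
    return np.argsort(-sims)[:k]
```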

---


New submissions
===============


Title: Stochastic Re-weighted Gradient Descent via Distributionally Robust Optimization

Abstract: We present Re-weighted Gradient Descent (RGD), a novel optimization technique that improves the performance of deep neural networks through dynamic sample re-weighting. Leveraging insights from distributionally robust optimization (DRO) with Kullback-Leibler divergence, our method dynamically assigns importance weights to training data during each optimization step. RGD is simple to implement, computationally efficient, and compatible with widely used optimizers such as SGD and Adam. We demonstrate the effectiveness of RGD on various learning tasks, including supervised learning, meta-learning, and out-of-domain generalization. Notably, RGD achieves state-of-the-art results on diverse benchmarks, with improvements of +0.7% on DomainBed, +1.44% on tabular classification, +1.94% on GLUE with BERT, and +1.01% on ImageNet-1K with ViT.

URL: https://openreview.net/forum?id=KCf5CLAXZq
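
For intuition, KL-regularized DRO re-weighting of a mini-batch typically reduces to softmax weights over the per-sample losses. The sketch below shows that generic pattern in PyTorch; the `temperature` hyperparameter and the detaching of the weights are our assumptions, not necessarily the exact RGD update.

```python
import torch
import torch.nn.functional as F

def reweighted_loss(per_sample_losses: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Re-weight per-sample losses with softmax weights, as in KL-based DRO.

    Harder examples (larger loss) receive larger weights; `temperature` is a
    hypothetical hyperparameter controlling how aggressive the re-weighting is.
    """
    # Detach so the weights act as constants in the backward pass.
    weights = F.softmax(per_sample_losses.detach() / temperature, dim=0)
    return (weights * per_sample_losses).sum()

# Usage inside a standard training step (model, batch, optimizer assumed):
# logits = model(x)
# losses = F.cross_entropy(logits, y, reduction="none")
# loss = reweighted_loss(losses, temperature=1.0)
# loss.backward(); optimizer.step()
```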

---

Title: DIOMIX: A Dynamic Multi-Agent Reinforcement Learning Mixing Structure for Independent Intra-Option Learning

Abstract: In cooperative multi-agent reinforcement learning (MARL), agents are equipped with a formalism to plan, learn, and reason in diverse ways, enabling continual knowledge accumulation over time. Each agent must consistently learn within its environment and possess the ability to reason at various levels of both temporal and spatial abstraction to navigate the intricacies specific to its surroundings. Current state-of-the-art approaches rely on learning an objective function that harmonizes both planning and learning without explicitly relying on reasoning. We propose a distinctive framework, Dynamic Intra-Options Mixtures (DIOMIX), aiming to address the deficiency in reasoning capabilities present in current state-of-the-art algorithms. We introduce an agent-independent option-based framework, incorporating a notion of temporal abstraction into the MARL paradigm using an advantage-based learning scheme directly on the option policy. This scheme enables higher long-term utility retention compared to directly optimizing action-value functions themselves. However, using temporal difference learning could hinder the optimization of extended temporal actions; therefore, to mitigate the issue of options being optimized solely to execute as primitive actions, we incorporate a regularization mechanism into the learning process to enable option execution over extended periods. Through quantitative and qualitative empirical results, we show that DIOMIX acquires individually separable and explainable reasoning capabilities that lead to agent specialization, task simplification, and improved training efficiency. We achieve this by embedding each agent's learning within an option-based framework without compromising performance.

URL: https://openreview.net/forum?id=IghGTYfMRt

---

Title: Towards Provable Log Density Policy Gradient

Abstract: Policy gradient methods are a vital ingredient behind the success of modern reinforcement learning. Modern policy gradient methods, although successful, introduce a residual error in gradient estimation. In this work, we argue that this residual term is significant and that correcting for it could potentially improve the sample complexity of reinforcement learning methods. To that end, we propose the log density gradient to estimate the policy gradient, which corrects for this residual error term. The log density gradient method computes the policy gradient by utilizing the state-action discounted distributional formulation. We first present the equations needed to exactly compute the log density gradient for tabular Markov Decision Processes (MDPs). For more complex environments, we propose a temporal difference (TD) method that approximates the log density gradient by utilizing backward on-policy samples. Since backward sampling from a Markov chain is highly restrictive, we also propose a min-max optimization that can approximate the log density gradient using just on-policy samples. We also prove uniqueness and convergence under linear function approximation for this min-max optimization. Finally, we show that the sample complexity of our min-max optimization is of the order of $m^{-1/2}$, where $m$ is the number of on-policy samples. We also demonstrate a proof of concept for our log density gradient method on a gridworld environment, and observe that our method improves upon the classical policy gradient method by a clear margin, indicating a promising novel direction for developing reinforcement learning algorithms that require fewer samples.

URL: https://openreview.net/forum?id=qIWazsRaTR
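
The "state-action discounted distributional formulation" mentioned above can be made concrete for a tabular MDP, where the discounted occupancy measure solves a linear system. The sketch below computes that measure; it only illustrates the quantity whose log-density gradient the paper estimates, not the proposed TD or min-max estimators.

```python
import numpy as np

def discounted_state_action_occupancy(P, pi, mu0, gamma=0.99):
    """Discounted state-action occupancy d_pi(s, a) for a tabular MDP.

    P: (S, A, S) transition tensor, pi: (S, A) policy, mu0: (S,) start-state
    distribution. Solves d = (1 - gamma) * mu0 + gamma * P_pi^T d for the
    state occupancy, then multiplies by the policy. Illustrative sketch only.
    """
    S, A, _ = P.shape
    P_pi = np.einsum("sa,sat->st", pi, P)          # state-to-state kernel under pi
    d_s = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, mu0)
    return d_s[:, None] * pi                        # d(s, a) = d(s) * pi(a | s)
```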

---

Title: Using Multimodal Foundation Models and Clustering for Improved Style Ambiguity Loss

Abstract: Teaching text-to-image models to be creative involves using style ambiguity loss, which requires a pretrained classifier. In this work, we explore a new form of the style ambiguity training objective, used to approximate creativity, that does not require training a classifier or even a labeled dataset. We then train a diffusion model to maximize style ambiguity to imbue the diffusion model with creativity and find our new methods improve upon the traditional method, based on automated metrics for human judgment, while still maintaining creativity and novelty.

URL: https://openreview.net/forum?id=GqG4IvRyNl
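
A classifier-free style ambiguity term of the kind described above can be approximated by clustering embeddings from a multimodal encoder (e.g. a CLIP-like model) and pushing generated images toward a uniform distribution over the clusters. The sketch below is one such approximation; the encoder, the k-means centroids, and the temperature are assumptions rather than the authors' exact objective.

```python
import torch
import torch.nn.functional as F

def style_ambiguity_loss(image_emb: torch.Tensor, centroids: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    """Encourage a generated image to be ambiguous across style clusters.

    `image_emb`: (d,) embedding of a generated image from a multimodal encoder;
    `centroids`: (k, d) cluster centers from k-means over a reference corpus.
    Both the encoder and the clustering step are assumptions about the pipeline.
    """
    sims = F.cosine_similarity(image_emb.unsqueeze(0), centroids, dim=-1)  # (k,)
    log_probs = F.log_softmax(sims / temperature, dim=-1)
    uniform = torch.full_like(log_probs, 1.0 / log_probs.numel())
    # Cross-entropy between the uniform target and the predicted style distribution.
    return -(uniform * log_probs).sum()
```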

---

Title: The Real Tropical Geometry of Neural Networks

Abstract: We consider a binary classifier defined as the sign of a tropical rational function, that is, as the difference of two convex piecewise linear functions. In particular, we consider binary classifications through ReLU neural networks, whose parameter space is contained as a semialgebraic set inside the parameter space of tropical rational functions. We initiate the study of two different subdivisions of this parameter space: a subdivision into semialgebraic sets, on which the combinatorial type of the decision boundary is fixed, and a subdivision into a polyhedral fan, capturing the combinatorics of the partitions of the dataset. The sublevel sets of the $0/1$-loss function arise as subfans of this classification fan, and we show that the level sets are not necessarily connected. We describe the classification fan i) geometrically, as the normal fan of the activation polytope, and ii) combinatorially, through a list of properties of associated bipartite graphs, in analogy to the covector axioms of oriented matroids and tropical oriented matroids. Our findings extend and refine the connection between neural networks and tropical geometry by observing structures established in real tropical geometry, such as positive tropicalizations of hypersurfaces and tropical semialgebraic sets.

URL: https://openreview.net/forum?id=I7JWf8XA2w
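
For readers unfamiliar with the tropical viewpoint, a one-hidden-layer ReLU network already gives the "difference of two convex piecewise linear functions" form used above; the standard decomposition (not specific to this submission) is:

```latex
% One-hidden-layer ReLU network f(x) = \sum_j w_j \max(a_j^\top x + b_j, 0) + c,
% written as a difference of two convex piecewise linear (tropical) maps:
\[
  f(x) = p(x) - q(x), \qquad
  p(x) = \sum_{j:\, w_j > 0} w_j \max(a_j^\top x + b_j,\, 0) + \max(c, 0), \qquad
  q(x) = \sum_{j:\, w_j < 0} (-w_j) \max(a_j^\top x + b_j,\, 0) + \max(-c, 0),
\]
% so the classifier is \operatorname{sign}(f) = \operatorname{sign}(p - q),
% with decision boundary \{x : p(x) = q(x)\}.
```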

---

Title: Concept-Driven Continual Learning

Abstract: This paper introduces novel solutions to the challenge of catastrophic forgetting in continual learning: Interpretability Guided Continual Learning (IG-CL) and Intrinsically Interpretable Neural Network (IN2). These frameworks bring interpretability into continual learning, systematically managing human-understandable concepts within neural network models to enhance knowledge retention from previous tasks. Our methods are designed to enhance interpretability, providing transparency and control over the continual training process. While our primary focus is to provide a new framework to design continual learning algorithms based on interpretability instead of improving performance, we observe that our methods often surpass existing ones: IG-CL employs interpretability tools to guide neural networks, showing an improvement of up to 1.4% in average incremental accuracy over existing methods; IN2, inspired by the Concept Bottleneck Model, adeptly adjusts concept units for both new and existing tasks, reducing average incremental forgetting by up to 9.1%. Both frameworks demonstrate superior performance compared to exemplar-free methods and are competitive with exemplar-based methods. When combined with exemplar-based strategies, they further improve the performance by up to 18%. These advancements represent a significant step in addressing the limitations of current continual learning methods, offering efficient and interpretable approaches that do not require additional memory for past data.

URL: https://openreview.net/forum?id=HSW49uvCNW
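
IN2 is described as inspired by the Concept Bottleneck Model; a minimal generic concept bottleneck (our stand-in, not the authors' architecture) routes predictions through explicit concept units like this:

```python
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    """Generic concept bottleneck: inputs -> concept scores -> task label.

    A hypothetical stand-in for the kind of model IN2 builds on; in a continual
    setting, concept units would be added or adjusted as new tasks arrive.
    """
    def __init__(self, in_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, n_concepts))
        self.head = nn.Linear(n_concepts, n_classes)

    def forward(self, x: torch.Tensor):
        concepts = torch.sigmoid(self.encoder(x))  # human-interpretable units
        return self.head(concepts), concepts
```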

---

Title: Geometric Analysis of Transformer Time Series Forecasting Latent Manifolds

Abstract: Transformer models have consistently achieved remarkable results in various domains such as natural language processing and computer vision. However, despite ongoing research efforts, a comprehensive understanding of these models is still lacking. This is particularly true for deep time series forecasting methods, where analysis and understanding work is relatively limited. Time series data, unlike image and text information, can be more challenging to interpret and analyze. To address this, we approach the problem from a \emph{manifold learning} perspective, assuming that the latent representations of time series forecasting models lie near a low-dimensional manifold. In our study, we focus on analyzing the geometric features of these latent data manifolds, including intrinsic dimension and principal curvatures. Our findings reveal that deep transformer models exhibit similar geometric behavior across layers, and these geometric features are correlated with model performance. Additionally, we observe that untrained models initially have different structures, but they rapidly converge during training. By leveraging our geometric analysis and differentiable tools, we can potentially design new and improved deep forecasting neural networks. This approach complements existing analysis studies and contributes to a better understanding of transformer models in the context of time series forecasting.

URL: https://openreview.net/forum?id=zRZe93OZho
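
A common way to measure the intrinsic dimension of a layer's latent representations, one of the geometric features analyzed above, is the TwoNN estimator (Facco et al., 2017). The sketch below assumes NumPy and scikit-learn and is a generic probe, not the authors' tooling.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_intrinsic_dimension(latents: np.ndarray) -> float:
    """TwoNN estimate of intrinsic dimension from nearest-neighbor ratios.

    `latents`: (n_samples, n_features) array of hidden representations taken
    from one layer. Uses the maximum-likelihood form of the estimator.
    """
    # Distances to the two nearest neighbors (index 0 is the point itself).
    nbrs = NearestNeighbors(n_neighbors=3).fit(latents)
    dists, _ = nbrs.kneighbors(latents)
    mu = dists[:, 2] / dists[:, 1]          # ratio of 2nd to 1st NN distance
    mu = mu[np.isfinite(mu) & (mu > 1.0)]   # guard against duplicate points
    return mu.size / np.sum(np.log(mu))     # maximum-likelihood estimate
```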

---

Title: Are EEG Sequences Time Series? EEG Classification with Time Series Models and Joint Subject Training

Abstract: As with most other data domains, EEG data analysis relies on rich domain-specific preprocessing. Beyond such preprocessing, machine learners would hope to deal with such data as with any other time series data. For EEG classification, many models have been developed with layer types and architectures that we typically do not see in time series classification. Furthermore, separate models are typically learned for each individual subject, rather than one model for all of them. In this paper, we systematically study the differences between EEG classification models and generic time series classification models. We describe three different model setups to deal with EEG data from different subjects, namely subject-specific models (most EEG literature), subject-agnostic models, and subject-conditional models. In experiments on three datasets, we demonstrate that off-the-shelf time series classification models trained per subject perform close to EEG classification models, but do not quite reach the performance of domain-specific modeling. Additionally, we combine time series models with subject embeddings to train one joint subject-conditional classifier on all subjects. The resulting models are competitive with dedicated EEG models on 2 out of 3 datasets, even outperforming all EEG methods on one of them.

URL: https://openreview.net/forum?id=yEqzW4NdYh
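
The subject-conditional setup described above amounts to giving a generic time series backbone access to a learned subject embedding. A minimal sketch follows; the backbone, names, and dimensions are our assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SubjectConditionalClassifier(nn.Module):
    """Generic time-series encoder conditioned on a learned subject embedding.

    The convolutional encoder stands in for any off-the-shelf time-series
    backbone; one joint model is trained on all subjects, with the subject id
    selecting an embedding that is concatenated to the pooled features.
    """
    def __init__(self, n_channels: int, n_subjects: int, n_classes: int,
                 feat_dim: int = 64, subj_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_channels, feat_dim, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.subject_emb = nn.Embedding(n_subjects, subj_dim)
        self.head = nn.Linear(feat_dim + subj_dim, n_classes)

    def forward(self, x: torch.Tensor, subject_id: torch.Tensor):
        # x: (batch, channels, time), subject_id: (batch,) integer ids
        feats = self.encoder(x).squeeze(-1)
        return self.head(torch.cat([feats, self.subject_emb(subject_id)], dim=-1))
```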

---

Title: AdaWaveNet: Adaptive Wavelet Network for Time Series Analysis

Abstract: Time series data analysis is a critical component in various domains such as finance, healthcare, and meteorology. Despite the progress in deep learning for time series analysis, there remains a challenge in addressing the non-stationary nature of time series data. Most existing models, which are built on the assumption of constant statistical properties over time, often struggle to capture the temporal dynamics of realistic time series, resulting in bias and error in time series analysis. This paper introduces the Adaptive Wavelet Network (AdaWaveNet), a novel approach that employs Adaptive Wavelet Transformation for multi-scale analysis of non-stationary time series data. AdaWaveNet incorporates a lifting scheme-based wavelet decomposition and reconstruction mechanism for adaptive and learnable wavelet transforms, which offers enhanced flexibility and robustness in analysis. We conduct extensive experiments on 10 datasets across 3 different tasks, including forecasting, imputation, and a newly established super-resolution task. The evaluations demonstrate the effectiveness of AdaWaveNet over existing methods in all three tasks, which illustrates its potential in various real-world applications.

URL: https://openreview.net/forum?id=m4bE9Y9FlX
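
The lifting scheme mentioned above factors a wavelet transform into split, predict, and update steps, each of which can be made learnable. The sketch below shows one generic learnable lifting step; the small convolutions are stand-ins and not AdaWaveNet's actual layers.

```python
import torch
import torch.nn as nn

class LearnableLiftingStep(nn.Module):
    """One lifting step: split -> predict -> update, with learnable filters.

    Splits a series into even/odd samples, predicts the odd part from the even
    part (detail coefficients), then updates the even part (approximation).
    The small convolutions are assumptions standing in for learnable filters.
    """
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.predict = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.update = nn.Conv1d(channels, channels, kernel_size, padding=pad)

    def forward(self, x: torch.Tensor):
        # x: (batch, channels, time); time assumed even for simplicity.
        even, odd = x[..., ::2], x[..., 1::2]
        detail = odd - self.predict(even)        # high-frequency component
        approx = even + self.update(detail)      # low-frequency component
        return approx, detail
```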

---

Title: Hybrid Regularization Methods Achieve Near-Optimal Regularization in Random Feature Models

Abstract: We demonstrate the potential of hybrid regularization methods to automatically and efficiently regularize the training of random feature models to generalize well on unseen data. Hybrid methods automatically combine the strengths of early stopping and weight decay while avoiding their respective weaknesses. By iteratively projecting the original learning problem onto a lower-dimensional subspace, they provide an efficient way to choose the weight decay hyperparameter. In our work, the weight decay hyperparameter is automatically selected by generalized cross-validation (GCV), which performs leave-one-out cross-validation simultaneously in a single training run and without the need for a dedicated validation dataset. As a demonstration, we use the random feature model to generate well- and ill-posed training problems arising from image classification. Our results show that hybrid regularization leads to near-optimal regularization in all problems. In particular, it is competitive with optimally tuned classical regularization methods. While hybrid regularization methods are popular in many large-scale inverse problems, their potential in machine learning is under-appreciated, and our findings motivate their wider use.

URL: https://openreview.net/forum?id=ayNuSG60LW
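
For a random feature model with ridge-style weight decay, GCV can be evaluated cheaply from an SVD of the feature matrix, which is the hyperparameter-selection idea referenced above. The sketch below illustrates only that selection step, not the authors' hybrid (iteratively projected) solver.

```python
import numpy as np

def gcv_select_weight_decay(Z, y, lambdas=np.logspace(-8, 2, 50)):
    """Pick the weight-decay parameter of a random feature model by GCV.

    Z: (n, m) random feature matrix, y: (n,) or (n, k) targets. Minimizes
    GCV(lam) = n * ||(I - A)y||^2 / (n - tr A)^2, where A is the ridge hat
    matrix. A plain SVD-based sketch of the selection idea only.
    """
    n = Z.shape[0]
    y = y.reshape(n, -1)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    Uty = U.T @ y
    best_lam, best_gcv = None, np.inf
    for lam in lambdas:
        d = s**2 / (s**2 + lam)                 # eigenvalues of the hat matrix
        resid = y - U @ (d[:, None] * Uty)      # (I - A) y via the thin SVD
        gcv = n * np.sum(resid**2) / (n - np.sum(d))**2
        if gcv < best_gcv:
            best_lam, best_gcv = lam, gcv
    return best_lam
```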

---

Title: Linear Convergence of Decentralized FedAvg for PL Objectives: The Interpolation Regime

Abstract: Federated Learning (FL) is a distributed learning paradigm in which multiple clients, each having access to a local dataset, collaborate to solve a joint problem. Federated Averaging (FedAvg), the algorithm of choice, has been widely explored in the {\em Centralized} setting, where the server coordinates the information sharing among clients. However, this approach incurs a high communication cost, and if the central server fails, the complete system fails. Hence, there is a need to study the performance of FedAvg in the {\em decentralized} setting, which is not very well understood, especially in the interpolation regime, a common phenomenon observed in modern overparameterized neural networks. In this work, we address this challenge and perform a thorough theoretical performance analysis of FedAvg in the interpolation regime under the {\em Decentralized} setting, where only neighboring clients communicate, depending on the network topology. We consider a class of non-convex functions satisfying the Polyak-{\L}ojasiewicz (PL) inequality, a condition satisfied by overparameterized neural networks. For the first time, we establish that {\em Decentralized} FedAvg achieves linear convergence rates of $\mathcal{O}({T^2} \log ({1}/{\epsilon}))$, where $\epsilon$ is the solution accuracy and $T$ is the number of local updates at each client. In contrast to standard {\em Decentralized} FedAvg analyses, our work does not require bounded heterogeneity and gradient assumptions. Instead, we show that sample-wise (and local) smoothness of the local objectives suffices to capture the effect of heterogeneity. Experiments on multiple real datasets corroborate our theoretical findings.

URL: https://openreview.net/forum?id=Og3VxBFhwj
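
A toy simulation helps picture the decentralized setting analyzed above: each client runs $T$ local gradient steps and then averages its parameters with its neighbors through a doubly stochastic mixing matrix. Everything below (objectives, step sizes, topology) is illustrative and unrelated to the paper's experiments.

```python
import numpy as np

def decentralized_fedavg(grads, x0, mixing_W, local_steps=5, lr=0.1, rounds=50):
    """Toy decentralized FedAvg: local SGD steps followed by neighbor averaging.

    `grads[i](x)` returns client i's (stochastic) gradient at parameters x;
    `mixing_W` is a doubly stochastic matrix encoding the network topology
    (W[i, j] > 0 only for neighbors). Illustrative only; not the paper's code.
    """
    n = len(grads)
    X = np.tile(np.asarray(x0, dtype=float), (n, 1))   # one parameter copy per client
    for _ in range(rounds):
        for i in range(n):                             # T local updates per client
            for _ in range(local_steps):
                X[i] -= lr * grads[i](X[i])
        X = mixing_W @ X                               # gossip/averaging with neighbors
    return X.mean(axis=0)

# Example: two clients with quadratic objectives f_i(x) = 0.5 * ||x - c_i||^2
# grads = [lambda x: x - np.array([1.0, 0.0]), lambda x: x - np.array([0.0, 1.0])]
# W = np.array([[0.5, 0.5], [0.5, 0.5]])   # fully connected 2-client topology
# x_star = decentralized_fedavg(grads, np.zeros(2), W)
```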

---
