Daily TMLR digest for Oct 28, 2022

TMLR

Oct 27, 2022, 8:00:07 PM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Time Series Alignment with Global Invariances

Authors: Titouan Vayer, Romain Tavenard, Laetitia Chapel, Rémi Flamary, Nicolas Courty, Yann Soullard

Abstract: Multivariate time series are ubiquitous objects in signal processing. Measuring a distance or similarity between two such objects is of prime interest in a variety of applications, including machine learning, but can be very difficult as soon as the temporal dynamics and the representation of the time series, i.e. the nature of the observed quantities, differ from one another. In this work, we propose a novel distance accounting for both feature-space and temporal variabilities by learning a latent global transformation of the feature space together with a temporal alignment, cast as a joint optimization problem. The versatility of our framework allows for several variants depending on the invariance class at stake. Among other contributions, we define a differentiable loss for time series and present two algorithms for the computation of time series barycenters under this new geometry. We illustrate the interest of our approach on both simulated and real-world data and show its robustness compared to state-of-the-art methods.
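A minimal sketch of the joint-optimization idea described above (not the authors' implementation): alternate between a DTW-style temporal alignment and a global linear map of the feature space, the map being re-fit here with the (semi-)orthogonal Procrustes solution. Function names and the choice of invariance class are illustrative.

import numpy as np

def dtw_path(x, y):
    """Classical DTW between series x (n, d) and y (m, d); returns the optimal warping path."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.sum((x[i - 1] - y[j - 1]) ** 2)
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    path, i, j = [], n, m                      # backtrack from the end of both series
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        i, j = (i - 1, j - 1) if step == 0 else ((i - 1, j) if step == 1 else (i, j - 1))
    return path[::-1]

def align_with_global_invariance(x, y, n_iter=10):
    """Jointly estimate a (semi-)orthogonal map P and a temporal alignment between x and y @ P.T."""
    P = np.eye(x.shape[1], y.shape[1])
    for _ in range(n_iter):
        path = dtw_path(x, y @ P.T)                      # 1) align given the current feature map
        X = np.stack([x[i] for i, _ in path])            # 2) re-fit the map given the alignment
        Y = np.stack([y[j] for _, j in path])
        U, _, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
        P = U @ Vt                                       #    Procrustes solution
    return P, path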


URL: https://openreview.net/forum?id=JXCH5N4Ujy

---

Title: Explicit Group Sparse Projection with Applications to Deep Learning and NMF

Authors: Riyasat Ohib, Nicolas Gillis, Niccolo Dalmasso, Sameena Shah, Vamsi K. Potluru, Sergey Plis

Abstract: We design a new sparse projection method for a set of vectors that guarantees a desired average sparsity level measured with the popular Hoyer measure (an affine function of the ratio of the $\ell_1$ and $\ell_2$ norms).
Existing approaches either project each vector individually or require the use of a regularization parameter which implicitly maps to the average $\ell_0$-measure of sparsity. Instead, in our approach we set the sparsity level for the whole set explicitly and simultaneously project a group of vectors with the sparsity level of each vector tuned automatically.
We show that the computational complexity of our projection operator is linear in the size of the problem.
Additionally, we propose a generalization of this projection by replacing the $\ell_1$ norm by its weighted version.
We showcase the efficacy of our approach in both supervised and unsupervised learning tasks on image datasets including CIFAR10 and ImageNet. In deep neural network pruning, the sparse models produced by our method on ResNet50 have significantly higher accuracies at corresponding sparsity values compared to existing competitors. In nonnegative matrix factorization, our approach yields competitive reconstruction errors against state-of-the-art algorithms.
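For reference, the Hoyer measure used above is the standard sparsity score of Hoyer (2004): an affine function of the $\ell_1/\ell_2$ ratio that equals 0 for a constant-magnitude vector and 1 for a 1-sparse one. Averaging it over a set of vectors gives the group-level sparsity the projection controls. A minimal sketch:

import numpy as np

def hoyer_sparsity(x, eps=1e-12):
    """Hoyer measure of a single vector x, in [0, 1]."""
    n = x.size
    ratio = np.linalg.norm(x, 1) / (np.linalg.norm(x, 2) + eps)
    return (np.sqrt(n) - ratio) / (np.sqrt(n) - 1)

def average_hoyer(vectors):
    """Average Hoyer sparsity over a collection of vectors (e.g. the rows of a weight matrix)."""
    return float(np.mean([hoyer_sparsity(v) for v in vectors]))

print(hoyer_sparsity(np.ones(8)))                   # ~0.0 for a maximally dense vector
print(hoyer_sparsity(np.array([0., 0., 3., 0.])))   # 1.0 for a 1-sparse vector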

URL: https://openreview.net/forum?id=jIrOeWjdpc

---


New submissions
===============


Title: Proportional Fairness in Federated Learning

Abstract: With the increasingly broad deployment of federated learning (FL) systems in the real world, it is critical but challenging to ensure fairness in FL, i.e. reasonably satisfactory performance for each of the numerous diverse clients. In this work, we introduce and study a new fairness notion in FL, called proportional fairness (PF), which is based on the relative change of each client's performance. From its connection with bargaining games, we propose PropFair, a novel and easy-to-implement algorithm for finding proportionally fair solutions in FL, and study its convergence properties. Through extensive experiments on vision and language datasets, we demonstrate that PropFair can approximately find PF solutions, and that it achieves a good balance between the average performance of all clients and that of the worst 10% of clients.
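For context, the classical proportional-fairness criterion this FL notion builds on (equivalent to maximizing the Nash bargaining objective, i.e. the sum of log-utilities) says a utility vector is PF when no feasible alternative increases the total relative change in client utilities. A generic sketch of that check, given here for illustration only and not the PropFair algorithm itself:

import numpy as np

def is_proportionally_fair(u_star, feasible_alternatives, tol=1e-9):
    """True if sum_i (u_i - u*_i) / u*_i <= 0 for every alternative utility vector u."""
    u_star = np.asarray(u_star, dtype=float)
    for u in feasible_alternatives:
        relative_change = (np.asarray(u, dtype=float) - u_star) / u_star
        if relative_change.sum() > tol:
            return False
    return True

def nash_bargaining_objective(u):
    """PF points maximize the sum of log-utilities over the feasible set."""
    return float(np.sum(np.log(u)))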

URL: https://openreview.net/forum?id=ryUHgEdWCQ

---

Title: GSR: A Generalized Symbolic Regression Approach

Abstract: Identifying the mathematical relationships that best describe a dataset remains a very challenging problem in machine learning, known as Symbolic Regression (SR). In contrast to neural networks, which are often treated as black boxes, SR attempts to gain insight into the underlying relationships between the independent variables and the target variable of a given dataset by assembling analytical functions. In this paper, we present GSR, a Generalized Symbolic Regression approach, by modifying the conventional SR optimization problem formulation while keeping the main SR objective intact. In GSR, we infer mathematical relationships between the independent variables and some transformation of the target variable. We constrain our search space to a weighted sum of basis functions, and propose a genetic programming approach with a matrix-based encoding scheme. We show that our GSR method outperforms several state-of-the-art methods on the well-known SR benchmark problem sets. Finally, we highlight the strengths of GSR by introducing SymSet, a new SR benchmark set that is more challenging than the existing benchmarks.
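As a small illustration of the model class mentioned above: once candidate basis functions have been proposed (in GSR, by genetic programming with a matrix-based encoding), the mixing weights admit a closed-form least-squares fit. The bases below are hand-picked for illustration, and the sketch omits GSR's transformation of the target variable:

import numpy as np

def fit_basis_weights(X, y, basis_functions):
    """Least-squares weights for y ~ sum_k w_k * f_k(X)."""
    Phi = np.column_stack([f(X) for f in basis_functions])   # design matrix, one column per basis
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = 1.5 * np.sin(x) + 0.5 * x ** 2 + 0.05 * rng.normal(size=x.size)
bases = [np.sin, np.cos, lambda v: v, lambda v: v ** 2]
print(fit_basis_weights(x, y, bases))                         # approximately [1.5, 0, 0, 0.5]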

URL: https://openreview.net/forum?id=lheUXtDNvP

---

Title: Attention as Inference via Fenchel Duality

Abstract: Attention has been widely adopted in many state-of-the-art deep learning models. While the significant performance improvements it brings have attracted great interest, attention is still poorly understood theoretically. This paper presents a new perspective for understanding attention by showing that it can be seen as a solver of a family of estimation problems. In particular, we describe a convex optimization problem that arises in a family of estimation tasks commonly appearing in the design of deep learning models. Rather than directly solving the convex optimization problem, we solve its Fenchel dual and derive a closed-form approximation of the optimal solution. Remarkably, the solution gives a generalized attention structure, and a special case of it is equivalent to the popular dot-product attention adopted in transformer networks. We show that the T5 transformer implicitly adopts the general form of the solution by demonstrating that this expression unifies the word mask and the positional encoding functions. Finally, we discuss how the proposed attention structures can be integrated into practical models and empirically show that the convex optimization problem indeed provides a principled justification for the design of the attention module.
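For reference, the dot-product attention that the derived solution specializes to is softmax(Q K^T / sqrt(d_k)) V. A plain NumPy sketch:

import numpy as np

def dot_product_attention(Q, K, V):
    """Q: (n, d_k), K: (m, d_k), V: (m, d_v) -> (n, d_v)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                  # similarity logits, shape (n, m)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # row-wise softmax
    return weights @ V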

URL: https://openreview.net/forum?id=XtL7cM4fQy

---

Title: Detecting Anomalies within Time Series using Local Neural Transformations

Abstract: We develop a new method to detect anomalies within time series, which is essential in many application domains, ranging from self-driving cars, finance, and marketing to medical diagnosis and epidemiology. The method is based on self-supervised deep learning, which has played a key role in facilitating deep anomaly detection on images, where powerful image transformations are available. However, such transformations are widely unavailable for time series. Addressing this, we develop Local Neural Transformations (LNT), a method that learns local transformations of time series from data. The method produces an anomaly score for each time step and thus can be used to detect anomalies within time series. We prove in a theoretical analysis that our novel training objective is more suitable for transformation learning than those of previous deep anomaly detection (AD) methods. Our experiments demonstrate that LNT can find anomalies in speech segments from the LibriSpeech dataset and better detect interruptions to cyber-physical systems than previous work. Visualization of the learned transformations gives insight into the type of transformations that LNT learns.
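As a rough illustration only (not the paper's training objective), per-time-step anomaly scoring with a bank of learned transformations could look like the sketch below: each transformed view of a local encoding should stay close to the original while contrasting against the other views, and time steps where this contrast breaks down receive a high score. Module sizes, the temperature, and the contrastive form are all assumptions.

import torch
import torch.nn.functional as F

class LocalTransformScorer(torch.nn.Module):
    def __init__(self, dim=32, n_transforms=4):
        super().__init__()
        self.transforms = torch.nn.ModuleList([
            torch.nn.Sequential(torch.nn.Linear(dim, dim), torch.nn.ReLU(), torch.nn.Linear(dim, dim))
            for _ in range(n_transforms)
        ])

    def score(self, z, tau=0.1):
        """z: (T, dim) local encodings; returns a (T,) score, higher = more anomalous."""
        anchor = F.normalize(z, dim=-1)
        views = torch.stack([F.normalize(t(z), dim=-1) for t in self.transforms])     # (K, T, dim)
        pos = torch.exp((views * anchor).sum(-1) / tau)                               # (K, T)
        cross = torch.exp(torch.einsum('ktd,ltd->klt', views, views) / tau)           # (K, K, T)
        k = views.shape[0]
        neg = cross.sum(1) - cross[torch.arange(k), torch.arange(k)]                  # exclude l == k
        return (-torch.log(pos / (pos + neg))).sum(0)      # minimized on normal data during training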

URL: https://openreview.net/forum?id=p6xslUyvka

---

Title: Direct Neural Network Training on Securely Encoded Datasets

Abstract: In fields where data privacy and secrecy are critical, such as healthcare and business intelligence, security concerns have limited data availability for neural network training. A recently developed technique securely encodes training, test, and inference examples with an aggregate non-orthogonal and nonlinear transformation that consists of steps of padding, perturbation, and orthogonal transformation, enabling artificial neural network (ANN) training and inference directly on encoded datasets. Here, the performance characteristics of the various aspects of the method are presented. The individual transformations of the method, when applied alone, do not significantly reduce validation accuracy. Training on datasets transformed by sequential padding, perturbation, and orthogonal transformation results in only slightly lower validation accuracies than those seen with unmodified control datasets (relative decreases in accuracy of 0.15% to 0.35%), with no difference in training time seen between transformed and control datasets. The presented methods have broad implications for machine learning in fields requiring data security.
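A minimal sketch of the kind of aggregate encoding described above: pad each example with random features, perturb it with small noise, then apply a fixed random orthogonal transformation, and train directly on the result. The padding width, noise scale, and QR-based orthogonal matrix are illustrative assumptions, not the authors' exact construction.

import numpy as np

def build_encoder(input_dim, pad_dim, noise_scale=0.01, seed=0):
    rng = np.random.default_rng(seed)
    # Fixed secret orthogonal matrix acting on the padded feature space.
    Q, _ = np.linalg.qr(rng.normal(size=(input_dim + pad_dim, input_dim + pad_dim)))

    def encode(X):
        """X: (n, input_dim) -> encoded array of shape (n, input_dim + pad_dim)."""
        padded = np.hstack([X, rng.normal(size=(X.shape[0], pad_dim))])   # padding step
        perturbed = padded + noise_scale * rng.normal(size=padded.shape)  # perturbation step
        return perturbed @ Q.T                                            # orthogonal transformation

    return encode

# The same encoder is applied to training, test, and inference examples,
# so the network never sees the raw features.
encode = build_encoder(input_dim=784, pad_dim=16)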

URL: https://openreview.net/forum?id=pP0ABGeLe9

---

Title: Reinforcement Teaching

Abstract: Machine learning algorithms learn to solve a task, but are unable to improve their ability to learn. Meta-learning methods learn about machine learning algorithms and improve them so that they learn more quickly. However, existing meta-learning methods are either hand-crafted to improve one specific component of an algorithm or only work with differentiable algorithms. We develop a unifying meta-learning framework, called Reinforcement Teaching, to improve the learning process of any algorithm. Under Reinforcement Teaching, a teaching policy is learned, through reinforcement, to improve a student's learning algorithm. To learn an effective teaching policy, we introduce the parametric-behavior embedder that learns a representation of the student's learnable parameters from its input/output behavior. We further use learning progress to shape the teacher's reward, allowing it to more quickly maximize the student's performance. To demonstrate the generality of Reinforcement Teaching, we conduct experiments in which a teacher learns to significantly improve both reinforcement and supervised learning algorithms. Reinforcement Teaching outperforms previous work using heuristic reward functions and state representations, as well as other parameter representations.

URL: https://openreview.net/forum?id=G2GKiicaJI

---

Title: Computationally-efficient initialisation of GPs: The generalised variogram method

Abstract: We present a computationally-efficient strategy to find the hyperparameters of a Gaussian process (GP) that avoids computing the likelihood function. The hyperparameters found this way can then be used directly for regression or passed as initial conditions to maximum-likelihood (ML) training. Motivated by the fact that training a GP via ML is equivalent (on average) to minimising the KL-divergence between the true and learnt model, we set out to explore different metrics/divergences among GPs that are computationally inexpensive and provide estimates close to those of ML. In particular, we identify the GP hyperparameters by matching the empirical covariance to a parametric candidate, proposing and studying various measures of discrepancy. Our proposal extends the Variogram method developed in the geostatistics literature and is thus referred to as the Generalised Variogram method (GVM). In addition to the theoretical presentation of GVM, we provide experimental validation in terms of accuracy, consistency with ML and computational complexity for different kernels using synthetic and real-world data.
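As a rough illustration of the covariance-matching idea (the squared-exponential kernel and least-squares discrepancy below are assumptions; the paper studies several discrepancy measures): estimate the empirical autocovariance of a regularly sampled series at a few lags, fit the candidate kernel's hyperparameters to it, and hand the result to ML training as a warm start.

import numpy as np
from scipy.optimize import curve_fit

def empirical_autocovariance(y, max_lag):
    y = y - y.mean()
    return np.array([np.mean(y[:len(y) - h] * y[h:]) for h in range(max_lag + 1)])

def rbf_cov(lag, variance, lengthscale):
    return variance * np.exp(-0.5 * (lag / lengthscale) ** 2)

def init_rbf_hyperparameters(y, dt=1.0, max_lag=50):
    """Return (variance, lengthscale) matching the empirical covariance of y at lags 0..max_lag."""
    lags = np.arange(max_lag + 1) * dt
    emp = empirical_autocovariance(y, max_lag)
    (variance, lengthscale), _ = curve_fit(rbf_cov, lags, emp, p0=[emp[0], 5.0 * dt])
    return variance, lengthscale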

URL: https://openreview.net/forum?id=slsAQHpS7n

---
