Weekly TMLR digest for May 21, 2023

TMLR

May 20, 2023, 8:00:09 PM
to tmlr-annou...@googlegroups.com


New certifications
==================

Survey Certification: Know Your Self-supervised Learning: A Survey on Image-based Generative and Discriminative Training

Utku Ozbulak, Hyun Jung Lee, Beril Boga, Esla Timothy Anzaku, Ho-min Park, Arnout Van Messem, Wesley De Neve, Joris Vankerschaver

https://openreview.net/forum?id=Ma25S4ludQ

---


Featured Certification: Attacking Perceptual Similarity Metrics

Abhijay Ghildyal, Feng Liu

https://openreview.net/forum?id=r9vGSpbbRO

---


Accepted papers
===============


Title: Deep Plug-and-Play Clustering with Unknown Number of Clusters

Authors: An Xiao, Hanting Chen, Tianyu Guo, QINGHUA ZHANG, Yunhe Wang

Abstract: Clustering is an essential task whose aim is to group data points in an unsupervised manner. Most deep clustering algorithms are highly effective when given the number of clusters K. However, when K is unknown, finding an appropriate K via model-selection criteria is computationally expensive, and running these algorithms with an inaccurate K rarely achieves state-of-the-art performance. This paper proposes a plug-and-play clustering module that automatically adjusts the number of clusters and can be easily embedded into existing deep parametric clustering methods. By analyzing the goal of clustering, a split-and-merge framework is introduced to reduce intra-class diversity and increase inter-class difference, leveraging the entropy between different clusters. Specifically, starting from an initial number of clusters, clusters can be split into sub-clusters or merged into super-clusters, converging to a stable number of clusters K by the end of training. Experiments on benchmark datasets demonstrate that the proposed method achieves performance comparable to state-of-the-art methods without requiring the number of clusters to be specified.
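
For intuition, here is a minimal NumPy sketch of a split-and-merge step in the spirit described above; the variance- and distance-based criteria and thresholds are illustrative stand-ins for the paper's entropy-based rule.

    # Illustrative split-and-merge step for adjusting the number of clusters.
    # Thresholds and criteria are hypothetical, not the paper's entropy rule.
    import numpy as np

    def split_and_merge(X, labels, split_thresh=2.0, merge_thresh=1.0):
        centroids = {k: X[labels == k].mean(axis=0) for k in np.unique(labels)}
        new_labels = labels.copy()
        next_id = labels.max() + 1
        # Split: a cluster with large intra-cluster spread becomes two.
        for k, c in centroids.items():
            members = np.where(labels == k)[0]
            dists = np.linalg.norm(X[members] - c, axis=1)
            if dists.mean() > split_thresh and len(members) > 1:
                # Move the half farthest from the centroid into a new cluster.
                far = members[np.argsort(-dists)]
                new_labels[far[: len(far) // 2]] = next_id
                next_id += 1
        # Merge: fuse the closest centroid pair if below the merge threshold.
        ids = np.unique(new_labels)
        cents = np.stack([X[new_labels == k].mean(axis=0) for k in ids])
        d = np.linalg.norm(cents[:, None] - cents[None, :], axis=-1)
        d[np.diag_indices_from(d)] = np.inf
        i, j = np.unravel_index(np.argmin(d), d.shape)
        if d[i, j] < merge_thresh:
            new_labels[new_labels == ids[j]] = ids[i]
        return new_labels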


URL: https://openreview.net/forum?id=6rbcq0qacA

---

Title: When to Trust Aggregated Gradients: Addressing Negative Client Sampling in Federated Learning

Authors: Wenkai Yang, Yankai Lin, Guangxiang Zhao, Peng Li, Jie Zhou, Xu Sun

Abstract: Federated learning has become a widely used framework that allows a global model to be learned on decentralized local datasets while preserving local data privacy. However, federated learning faces severe optimization difficulties when training samples are not independently and identically distributed (non-i.i.d.). In this paper, we point out that client sampling plays a decisive role in these optimization difficulties. We find that negative client sampling causes the merged data distribution of the currently sampled clients to be heavily inconsistent with that of all available clients, which in turn makes the aggregated gradient unreliable. To address this issue, we propose a novel learning rate adaptation mechanism that adaptively adjusts the server learning rate for the aggregated gradient in each round, according to the consistency between the merged data distribution of the currently sampled clients and that of all available clients. Specifically, we derive theoretically a meaningful and robust indicator that is positively related to the optimal server learning rate, i.e., the rate that minimizes the Euclidean distance between the aggregated gradient given the currently sampled clients and the gradient that would be obtained if all clients participated in the current round. We show that the proposed indicator effectively reflects the merged data distribution of the sampled clients, and we therefore use it for server learning rate adaptation. Extensive experiments on multiple image and text classification tasks validate the effectiveness of our method in various settings. Our code is available at https://github.com/lancopku/FedGLAD.
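
A hedged sketch of the aggregation step with an adaptive server learning rate; the norm-ratio consistency indicator below is only an illustrative proxy, not the paper's theoretically derived indicator.

    # Hypothetical sketch: scale the server learning rate by how consistent
    # the sampled clients' updates are. The norm-ratio proxy is illustrative.
    import numpy as np

    def aggregate_with_adaptive_lr(client_updates, base_lr=1.0):
        updates = np.stack(client_updates)          # (num_clients, dim)
        mean_update = updates.mean(axis=0)
        # In [0, 1]: near 1 when clients agree, small when they cancel out.
        consistency = np.linalg.norm(mean_update) / (
            np.linalg.norm(updates, axis=1).mean() + 1e-12)
        server_lr = base_lr * consistency
        return server_lr * mean_update

    rng = np.random.default_rng(0)
    updates = [rng.normal(size=10) for _ in range(5)]
    print(aggregate_with_adaptive_lr(updates))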

URL: https://openreview.net/forum?id=v73h3bYE2Z

---

Title: A Measure of the Complexity of Neural Representations based on Partial Information Decomposition

Authors: David Alexander Ehrlich, Andreas Christian Schneider, Viola Priesemann, Michael Wibral, Abdullah Makkeh

Abstract: In neural networks, task-relevant information is represented jointly by groups of neurons. However, the specific way in which this mutual information about the classification label is distributed among the individual neurons is not well understood: While parts of it may only be obtainable from specific single neurons, other parts are carried redundantly or synergistically by multiple neurons. We show how Partial Information Decomposition (PID), a recent extension of information theory, can disentangle these different contributions. From this, we introduce the measure of "Representational Complexity", which quantifies the difficulty of accessing information spread across multiple neurons. We show how this complexity is directly computable for smaller layers. For larger layers, we propose subsampling and coarse-graining procedures and prove corresponding bounds on the latter. Empirically, for quantized deep neural networks solving the MNIST and CIFAR10 tasks, we observe that representational complexity decreases both through successive hidden layers and over training, and compare the results to related measures. Overall, we propose representational complexity as a principled and interpretable summary statistic for analyzing the structure and evolution of neural representations and complex systems in general.

URL: https://openreview.net/forum?id=R8TU3pfzFr

---

Title: Know Your Self-supervised Learning: A Survey on Image-based Generative and Discriminative Training

Authors: Utku Ozbulak, Hyun Jung Lee, Beril Boga, Esla Timothy Anzaku, Ho-min Park, Arnout Van Messem, Wesley De Neve, Joris Vankerschaver

Abstract: Although supervised learning has been highly successful in improving the state-of-the-art in the domain of image-based computer vision in the past, the margin of improvement has diminished significantly in recent years, indicating that a plateau is in sight. Meanwhile, the use of self-supervised learning (SSL) for the purpose of natural language processing (NLP) has seen tremendous successes during the past couple of years, with this new learning paradigm yielding powerful language models. Inspired by the excellent results obtained in the field of NLP, self-supervised methods that rely on clustering, contrastive learning, distillation, and information-maximization, which all fall under the banner of discriminative SSL, have experienced a swift uptake in the area of computer vision. Shortly afterwards, generative SSL frameworks, mostly based on masked image modeling, complemented and surpassed the results obtained with discriminative SSL. Consequently, within a span of three years, over 100 unique general-purpose frameworks for generative and discriminative SSL, with a focus on imaging, were proposed. In this survey, we review a plethora of research efforts conducted on image-oriented SSL, providing a historic view and paying attention to best practices as well as useful software packages. While doing so, we discuss pretext tasks for image-based SSL, as well as techniques that are commonly used in image-based SSL. Lastly, to aid researchers who aim to contribute to image-focused SSL, we outline a number of promising research directions.

URL: https://openreview.net/forum?id=Ma25S4ludQ

---

Title: Trip-ROMA: Self-Supervised Learning with Triplets and Random Mappings

Authors: Wenbin Li, Xuesong Yang, Meihao Kong, Lei Wang, Jing Huo, Yang Gao, Jiebo Luo

Abstract: Contrastive self-supervised learning (SSL) methods, such as MoCo and SimCLR, have achieved great success in unsupervised visual representation learning. They rely on a large number of negative pairs and thus require either large memory banks or large batches. Some recent non-contrastive SSL methods, such as BYOL and SimSiam, attempt to discard negative pairs and have also shown remarkable performance. To avoid collapsed solutions caused by not using negative pairs, these methods require non-trivial asymmetry designs. However, in small data regimes, we cannot obtain a sufficient number of negative pairs, nor can we effectively avoid the over-fitting problem when negatives are not used at all. To address this situation, we argue that negative pairs are still important but one is generally sufficient for each positive pair. We show that a simple Triplet-based loss (Trip) can achieve surprisingly good performance without requiring large batches or asymmetry designs. Moreover, to alleviate the over-fitting problem in small data regimes and further enhance the effect of Trip, we propose a simple plug-and-play RandOm MApping (ROMA) strategy by randomly mapping samples into other spaces and requiring these randomly projected samples to satisfy the same relationship indicated by the triplets. Integrating the triplet-based loss with random mapping, we obtain the proposed method Trip-ROMA. Extensive experiments, including unsupervised representation learning and unsupervised few-shot learning, have been conducted on ImageNet-1K and seven small datasets. They successfully demonstrate the effectiveness of Trip-ROMA and consistently show that ROMA can further effectively boost other SSL methods. Code is available at https://github.com/WenbinLee/Trip-ROMA.
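
A minimal PyTorch sketch of a triplet loss combined with a random mapping, following the idea described above; the margin, projection dimension, and resampling scheme are illustrative choices, not the paper's settings.

    # Sketch: enforce the same triplet relation in the original embedding
    # space and after a random linear projection (ROMA-style). Margin and
    # dimensions are illustrative.
    import torch
    import torch.nn.functional as F

    def trip_roma_loss(anchor, positive, negative, margin=0.5, proj_dim=64):
        loss = F.triplet_margin_loss(anchor, positive, negative, margin=margin)
        # Random mapping: a fresh projection, shared across the triplet.
        R = torch.randn(anchor.size(1), proj_dim, device=anchor.device)
        R = R / R.norm(dim=0, keepdim=True)
        loss = loss + F.triplet_margin_loss(anchor @ R, positive @ R,
                                            negative @ R, margin=margin)
        return loss

    a, p, n = (torch.randn(8, 128, requires_grad=True) for _ in range(3))
    print(trip_roma_loss(a, p, n).item())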

URL: https://openreview.net/forum?id=MR4glug5GU

---

Title: Attacking Perceptual Similarity Metrics

Authors: Abhijay Ghildyal, Feng Liu

Abstract: Perceptual similarity metrics have progressively become more correlated with human judgments on perceptual similarity; however, despite recent advances, the addition of an imperceptible distortion can still compromise these metrics. In our study, we systematically examine the robustness of these metrics to imperceptible adversarial perturbations. Following the two-alternative forced-choice experimental design with two distorted images and one reference image, we perturb the distorted image closer to the reference via an adversarial attack until the metric flips its judgment. We first show that all metrics in our study are susceptible to perturbations generated via common adversarial attacks such as FGSM, PGD, and the One-pixel attack. Next, we attack the widely adopted LPIPS metric using spatial-transformation-based adversarial perturbations (stAdv) in a white-box setting to craft adversarial examples that can effectively transfer to other similarity metrics in a black-box setting. We also combine the spatial attack stAdv with PGD ($\ell_\infty$-bounded) attack to increase transferability and use these adversarial examples to benchmark the robustness of both traditional and recently developed metrics. Our benchmark provides a good starting point for discussion and further research on the robustness of metrics to imperceptible adversarial perturbations.
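
A hedged sketch of the judgment-flipping attack in the 2AFC setup, using an l_inf-bounded PGD on LPIPS; it assumes the open-source lpips package, and the budget and step size are illustrative, not the paper's settings.

    # Sketch: perturb the distorted image x1 (currently judged closer to the
    # reference) until LPIPS judges x0 closer instead. Assumes `pip install
    # lpips`; epsilon and step size are illustrative.
    import torch
    import lpips

    metric = lpips.LPIPS(net='alex')

    def flip_judgment(x0, x1, ref, eps=4/255, alpha=1/255, steps=50):
        adv = x1.clone().detach()
        for _ in range(steps):
            adv.requires_grad_(True)
            margin = metric(adv, ref) - metric(x0, ref)
            if margin.item() > 0:                      # judgment flipped
                break
            grad, = torch.autograd.grad(margin, adv)
            with torch.no_grad():
                adv = adv + alpha * grad.sign()        # ascend the margin
                adv = x1 + (adv - x1).clamp(-eps, eps) # l_inf projection
                adv = adv.clamp(-1, 1)                 # lpips expects [-1, 1]
        return adv.detach()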

URL: https://openreview.net/forum?id=r9vGSpbbRO

---

Title: Conditional Permutation Invariant Flows

Authors: Berend Zwartsenberg, Adam Scibior, Matthew Niedoba, Vasileios Lioutas, Justice Sefas, Yunpeng Liu, Setareh Dabiri, Jonathan Wilder Lavington, Trevor Campbell, Frank Wood

Abstract: We present a conditional generative probabilistic model of set-valued data with a tractable log density. This model is a continuous normalizing flow governed by permutation equivariant dynamics. These dynamics are driven by a learnable per-set-element term and pairwise interactions, both parametrized by deep neural networks. We illustrate the utility of this model via applications including (1) complex traffic scene generation conditioned on visually specified map information, and (2) object bounding box generation conditioned directly on images. We train our model by maximizing the expected likelihood of labeled conditional data under our flow, with the aid of a penalty that ensures the dynamics are smooth and hence efficiently solvable. Our method significantly outperforms non-permutation invariant baselines in terms of log likelihood and domain-specific metrics (offroad, collision, and combined infractions), yielding realistic samples that are difficult to distinguish from data.
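
A sketch of permutation-equivariant dynamics of the kind described (a learnable per-element term plus averaged pairwise interactions); network sizes are placeholders, and the flow integration and Jacobian machinery are omitted.

    # Sketch of permutation-equivariant dynamics for a set-valued continuous
    # normalizing flow: per-element term plus mean pairwise interaction.
    import torch
    import torch.nn as nn

    class EquivariantDynamics(nn.Module):
        def __init__(self, dim, hidden=64):
            super().__init__()
            self.element = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                         nn.Linear(hidden, dim))
            self.pairwise = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(),
                                          nn.Linear(hidden, dim))

        def forward(self, x):                       # x: (batch, set_size, dim)
            b, n, d = x.shape
            xi = x.unsqueeze(2).expand(b, n, n, d)  # element i
            xj = x.unsqueeze(1).expand(b, n, n, d)  # element j
            pair = self.pairwise(torch.cat([xi, xj], dim=-1)).mean(dim=2)
            # Permuting the input set permutes the output identically.
            return self.element(x) + pair

    f = EquivariantDynamics(dim=2)
    print(f(torch.randn(4, 5, 2)).shape)            # torch.Size([4, 5, 2])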

URL: https://openreview.net/forum?id=DUsgPi3oCC

---

Title: Event Tables for Efficient Experience Replay

Authors: Varun Raj Kompella, Thomas Walsh, Samuel Barrett, Peter R. Wurman, Peter Stone

Abstract: Experience replay (ER) is a crucial component of many deep reinforcement learning (RL) systems. However, uniform sampling from an ER buffer can lead to slow convergence and unstable asymptotic behaviors. This paper introduces Stratified Sampling from Event Tables (SSET), which partitions an ER buffer into Event Tables, each capturing important subsequences of optimal behavior. We prove a theoretical advantage over the traditional monolithic buffer approach and combine SSET with an existing prioritized sampling strategy to further improve learning speed and stability. Empirical results in challenging MiniGrid domains, benchmark RL environments, and a high-fidelity car racing simulator demonstrate the advantages and versatility of SSET over existing ER buffer sampling approaches.
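
A minimal sketch of the stratified-sampling idea: transitions are routed into a default table plus per-event tables, and minibatches draw a fixed fraction from each; the event predicates and proportions are illustrative placeholders.

    # Sketch of stratified sampling from event tables. Event definitions and
    # minibatch proportions are hypothetical.
    import random
    from collections import defaultdict

    class EventTableBuffer:
        def __init__(self, event_fns, proportions):
            self.tables = defaultdict(list)    # table name -> transitions
            self.event_fns = event_fns         # name -> predicate(transition)
            self.proportions = proportions     # name -> fraction of batch

        def add(self, transition):
            self.tables['default'].append(transition)
            for name, fn in self.event_fns.items():
                if fn(transition):
                    self.tables[name].append(transition)

        def sample(self, batch_size):
            batch = []
            for name, frac in self.proportions.items():
                table = self.tables[name]
                if table:
                    batch += random.choices(table, k=int(frac * batch_size))
            # Fill the remainder uniformly from the default table.
            batch += random.choices(self.tables['default'],
                                    k=batch_size - len(batch))
            return batch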

URL: https://openreview.net/forum?id=XejzjAjKjv

---

Title: Agent-State Construction with Auxiliary Inputs

Authors: Ruo Yu Tao, Adam White, Marlos C. Machado

Abstract: In many, if not all, realistic sequential decision-making tasks, the decision-making agent is not able to model the full complexity of the world. The environment is often much larger and more complex than the agent, a setting also known as partial observability. In such settings, the agent must leverage more than just the current sensory inputs; it must construct an agent state that summarizes previous interactions with the world. Currently, a popular approach for tackling this problem is to learn the agent-state function via a recurrent network that takes the agent's sensory stream as input. Many impressive reinforcement learning applications have instead relied on environment-specific functions to augment the agent's inputs for history summarization. These augmentations are done in multiple ways, from simple approaches like concatenating observations to more complex ones such as uncertainty estimates. Although ubiquitous in the field, these additional inputs, which we term auxiliary inputs, are rarely emphasized, and it is not clear what their role or impact is. In this work we explore this idea further, and relate these auxiliary inputs to prior classic approaches to state construction. We present a series of examples illustrating the different ways of using auxiliary inputs for reinforcement learning. We show that these auxiliary inputs can be used to discriminate between observations that would otherwise be aliased, leading to more expressive features that smoothly interpolate between different states. Finally, we show that this approach is complementary to state-of-the-art methods such as recurrent neural networks and truncated back-propagation through time, and acts as a heuristic that facilitates longer temporal credit assignment, leading to better performance.
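
As one concrete example of an auxiliary input, here is a sketch of an exponentially decaying trace of past observations concatenated onto the current observation; the decay rate is an illustrative choice, and the paper studies several such augmentations.

    # Sketch of a simple auxiliary input: an exponentially decaying trace of
    # past observations appended to the current one. Decay is illustrative.
    import numpy as np

    class TraceAugmentedAgentState:
        def __init__(self, obs_dim, decay=0.9):
            self.trace = np.zeros(obs_dim)
            self.decay = decay

        def __call__(self, obs):
            # The trace carries information about otherwise-aliased histories.
            self.trace = self.decay * self.trace + (1 - self.decay) * obs
            return np.concatenate([obs, self.trace])   # the agent state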

URL: https://openreview.net/forum?id=RLYkyucU6k

---

Title: Modelling sequential branching dynamics with a multivariate branching Gaussian process

Authors: Elvijs Sarkans, Sumon Ahmed, Magnus Rattray, Alexis Boukouvalas

Abstract: The Branching Gaussian Process (BGP) model is a modification of the Overlapping Mixture of Gaussian Processes (OMGP) where latent functions branch in time. The BGP model was introduced as a method to model bifurcations in single-cell gene expression data and order genes by inferring their branching time parameter. A limitation of the current BGP model is that the assignment of observations to latent functions is inferred independently for each output dimension (gene). This leads to inconsistent assignments across outputs and reduces the accuracy of branching time inference. Here, we propose a multivariate branching Gaussian process (MBGP) model to perform joint branch assignment inference across multiple output dimensions. This ensures that branch assignments are consistent and leverages more data for branching time inference. Model inference is more challenging than for the original BGP or OMGP models because assignment labels can switch from trunk to branch lineages as branching times change during inference. To scale up inference to large datasets we use sparse variational Bayesian inference. We examine the effectiveness of our approach on synthetic data and a single-cell RNA-Seq dataset from mouse haematopoietic stem cells (HSCs). Our approach ensures assignment consistency by design and achieves improved branching time inference and assignment accuracy.

URL: https://openreview.net/forum?id=9KoBOlstTq

---


New submissions
===============


Title: AP: Selective Activation for De-sparsifying Pruned Networks

Abstract: The rectified linear unit (ReLU) is a highly successful activation function in neural networks as it allows networks to easily obtain sparse representations, which reduces overfitting in overparameterized networks. However, in the context of network pruning, we find that the sparsity introduced by ReLU, which we quantify by a term called dynamic dead neuron rate (DNR), is not beneficial for the pruned network. Interestingly, the more the network is pruned, the smaller the dynamic DNR becomes during and after optimization. This motivates us to propose a method to explicitly reduce the dynamic DNR for the pruned network, i.e., de-sparsify the network. We refer to our method as Activate-while-Pruning (AP). We note that AP does not function as a stand-alone method, as it does not evaluate the importance of weights. Instead, it works in tandem with existing pruning methods and aims to improve their performance by selective activation of nodes to reduce the dynamic DNR. We conduct extensive experiments using various popular networks (e.g., ResNet, VGG, DenseNet, MobileNet) via two classical and three state-of-the-art pruning methods. The experimental results on public datasets (e.g., CIFAR-10, CIFAR-100) suggest that AP works well with existing pruning methods and improves the performance by 3% - 4%. For larger scale datasets (e.g., ImageNet) and state-of-the-art networks (e.g., vision transformer), we observe an improvement of 2% - 3% with AP as opposed to without. Lastly, we conduct an ablation study to examine the effectiveness of the components comprising AP.
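
A sketch of one plausible reading of the dynamic dead neuron rate: the fraction of ReLU activations that are zero on a given batch, measured with forward hooks; the paper's exact definition may differ.

    # Sketch: measure the fraction of ReLU activations that are zero on a
    # batch, an illustrative reading of the "dynamic dead neuron rate" (DNR).
    import torch
    import torch.nn as nn

    def dynamic_dnr(model, batch):
        zeros, total = 0, 0
        hooks = []

        def hook(_module, _inputs, out):
            nonlocal zeros, total
            zeros += (out <= 0).sum().item()
            total += out.numel()

        for m in model.modules():
            if isinstance(m, nn.ReLU):
                hooks.append(m.register_forward_hook(hook))
        with torch.no_grad():
            model(batch)
        for h in hooks:
            h.remove()
        return zeros / total

    net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(),
                        nn.Linear(32, 4), nn.ReLU())
    print(dynamic_dnr(net, torch.randn(64, 10)))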

URL: https://openreview.net/forum?id=EGQSpkUDdD

---

Title: Learned Thresholds Token Merging and Pruning for Vision Transformers

Abstract: Vision transformers have demonstrated remarkable success in a wide range of computer vision tasks in recent years; however, their high computational cost remains a significant barrier to their practical deployment. In particular, the complexity of transformer models is quadratic with respect to the number of input tokens. Therefore, techniques that reduce the number of input tokens that need to be processed have been proposed. This paper introduces Learned Thresholds token Merging and Pruning (LTMP), a novel approach that leverages the strengths of both token merging and token pruning. LTMP uses learned threshold masking modules that dynamically determine which tokens to merge and which to prune. We demonstrate our approach with extensive experiments on vision transformers on the ImageNet classification task. Our results demonstrate that LTMP achieves state-of-the-art accuracy across reduction rates while requiring only a single fine-tuning epoch, which is an order of magnitude faster than previous methods.

URL: https://openreview.net/forum?id=WYKTCKpImz

---

Title: Communication Efficient Federated Learning over Wireless Channels using Robust Count Sketches

Abstract: Large-scale federated learning (FL) over wireless multiple access channels (MACs) has emerged as a crucial learning paradigm with a wide range of applications. However, its widespread adoption is hindered by several major challenges, including limited bandwidth shared by many edge devices, noisy and erroneous wireless communications, and heterogeneous datasets with different distributions across edge devices. To overcome these fundamental challenges, we propose Federated Proximal Sketching (FPS), a novel federated learning algorithm specifically designed for noisy and bandlimited wireless environments. FPS uses a count sketch data structure to address the bandwidth bottleneck and enable efficient compression while maintaining accurate estimation of significant coordinates. Moreover, FPS is designed to explicitly address the bias induced by communications over noisy wireless channels. We establish the convergence of the FPS algorithm under mild technical conditions. It is worth noting that FPS is able to handle high levels of data heterogeneity across edge devices. We complement the proposed theoretical framework with extensive experiments that demonstrate the stability, accuracy, and efficiency of FPS in comparison to state-of-the-art methods on both synthetic and real-world datasets. Overall, our results show that FPS is a promising solution to tackling the above challenges of FL over wireless MACs.
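
A minimal sketch of the count sketch primitive at the core of this approach: gradient coordinates are hashed into buckets with random signs, and each coordinate is recovered as a median of unbiased estimates; sizes are illustrative, and FPS's handling of channel noise is not modeled here.

    # Sketch of count sketch compression/decompression for a gradient vector.
    import numpy as np

    rng = np.random.default_rng(0)

    def make_sketch(dim, rows=5, cols=256):
        buckets = rng.integers(0, cols, size=(rows, dim))
        signs = rng.choice([-1.0, 1.0], size=(rows, dim))
        return buckets, signs

    def compress(g, buckets, signs, cols=256):
        table = np.zeros((buckets.shape[0], cols))
        for r in range(buckets.shape[0]):
            np.add.at(table[r], buckets[r], signs[r] * g)
        return table

    def decompress(table, buckets, signs):
        rows, _ = buckets.shape
        # Median over rows of per-row unbiased estimates.
        est = np.stack([signs[r] * table[r, buckets[r]] for r in range(rows)])
        return np.median(est, axis=0)

    g = rng.normal(size=1000); g[:10] += 20       # a few heavy coordinates
    b, s = make_sketch(g.size)
    g_hat = decompress(compress(g, b, s), b, s)
    print(np.abs(g_hat[:10] - g[:10]).max())      # heavy hitters recovered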

URL: https://openreview.net/forum?id=RYwWr4gbQ1

---

Title: Cross-validation for Geospatial Data: Estimating Generalization Performance in Geostatistical Problems

Abstract: Geostatistical learning problems are frequently characterized by spatial autocorrelation in the input features and/or the potential for covariate shift at test time. These realities violate the classical assumption of independent, identically distributed data, upon which most cross-validation algorithms rely in order to estimate the generalization performance of a model. In this paper, we present a theoretical criterion for unbiased cross-validation estimators in the geospatial setting. We also introduce a new cross-validation algorithm to evaluate models, inspired by the challenges of geospatial problems. We apply a framework for categorizing problems into different types of geospatial scenarios to help practitioners select an appropriate cross-validation strategy. Our empirical analyses compare cross-validation algorithms on both simulated and several real datasets to develop recommendations for a variety of geospatial settings. This paper aims to draw attention to some challenges that arise in model evaluation for geospatial problems and to provide guidance for users.
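
As one concrete baseline strategy in this space, here is a sketch of spatially blocked cross-validation: points are assigned to grid cells and cells are kept intact across folds via scikit-learn's GroupKFold. The cell size is arbitrary, and the paper's proposed algorithm is more refined than this.

    # Sketch: block data into spatial grid cells so no cell appears in both
    # train and test, reducing leakage from spatial autocorrelation.
    import numpy as np
    from sklearn.model_selection import GroupKFold

    rng = np.random.default_rng(0)
    coords = rng.uniform(0, 100, size=(500, 2))     # easting/northing
    X = rng.normal(size=(500, 4))
    y = rng.normal(size=500)

    cell = 25.0                                     # block size in map units
    cells = (coords // cell).astype(int)
    group_ids = cells[:, 0] * 1000 + cells[:, 1]    # unique id per grid cell

    for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, group_ids):
        assert not set(group_ids[train_idx]) & set(group_ids[test_idx])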

URL: https://openreview.net/forum?id=VgJhYu7FmQ

---

Title: Accelerating Fair Federated Learning: Adaptive Federated Adam

Abstract: Federated learning is a distributed and privacy-preserving approach to train a statistical model collaboratively from decentralized data of different parties. However, when datasets of participants are not independent and identically distributed, models trained by naive federated algorithms may be biased towards certain participants, and model performance across participants is non-uniform. This is known as the fairness problem in federated learning. In this paper, we formulate fairness-controlled federated learning as a dynamical multi-objective optimization problem to ensure fair performance across all participants. To solve the problem efficiently, we study the convergence and bias of Adam as the server optimizer in federated learning, and propose Adaptive Federated Adam (AdaFedAdam) to accelerate fair federated learning with alleviated bias. We validate the effectiveness, Pareto optimality, and robustness of AdaFedAdam with numerical experiments and show that AdaFedAdam outperforms existing algorithms, providing better convergence and fairness properties for the federated scheme.

URL: https://openreview.net/forum?id=xSPrjsdhvF

---

Title: Task Weighting in Meta-learning with Trajectory Optimisation

Abstract: Developing meta-learning algorithms that are unbiased toward a subset of training tasks often requires hand-designed criteria to weight tasks, potentially resulting in sub-optimal solutions. In this paper, we introduce a new principled and fully automated task-weighting algorithm for meta-learning methods. By considering the weights of tasks within the same mini-batch as an action, and the meta-parameter of interest as the system state, we cast the task-weighting meta-learning problem as trajectory optimisation and employ the iterative linear quadratic regulator to determine the optimal action, i.e., the weights of tasks. We theoretically show that the proposed algorithm converges to an $\epsilon_{0}$-stationary point, and empirically demonstrate that the proposed approach outperforms common hand-engineered weighting methods in two few-shot learning benchmarks.

URL: https://openreview.net/forum?id=SSkTBUyJip

---

Title: Sample Average Approximation for Black-Box VI

Abstract: We present a novel approach for black-box VI that bypasses the difficulties of stochastic gradient ascent, including the task of selecting step-sizes. Our approach involves solving a sequence of sample average approximation (SAA) problems. SAA approximates the solution of stochastic optimization problems by transforming them into deterministic ones. We use quasi-Newton methods and line search to solve each deterministic optimization problem and present a heuristic policy to automate hyperparameter selection. Our experiments show that our method simplifies the VI problem and performs faster than existing methods.
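
A toy sketch of the SAA idea for VI: fixing a set of base draws makes the negative ELBO a deterministic function of the variational parameters, which can then be handed to a quasi-Newton solver; the Gaussian target and sample count are illustrative.

    # Sketch: fix base draws so the negative ELBO is deterministic in the
    # variational parameters, then solve with L-BFGS. Toy Gaussian target.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    eps = rng.normal(size=(256, 2))             # fixed base draws (the "SAA")

    def log_p(z):                               # unnormalized target density
        return -0.5 * np.sum((z - np.array([3.0, -2.0])) ** 2, axis=1)

    def neg_elbo(params):
        mu, log_sig = params[:2], params[2:]
        z = mu + np.exp(log_sig) * eps          # reparameterized samples
        entropy = np.sum(log_sig)               # Gaussian entropy + const
        return -(log_p(z).mean() + entropy)

    res = minimize(neg_elbo, np.zeros(4), method='L-BFGS-B')
    print(res.x[:2])                            # approx posterior mean ~ [3, -2]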

URL: https://openreview.net/forum?id=Lvg10LZ5nL

---

Title: Meta-Learning via Classifier(-free) Diffusion Guidance

Abstract: We introduce meta-learning algorithms that perform zero-shot weight-space adaptation of neural network models to unseen tasks. Our methods repurpose the popular generative image synthesis techniques of natural language guidance and diffusion models to generate neural network weights adapted for tasks. We first train an unconditional generative hypernetwork model to produce neural network weights; then we train a second "guidance" model that, given a natural language task description, traverses the hypernetwork latent space to find high-performance task-adapted weights in a zero-shot manner. We explore two alternative approaches for latent space guidance: "HyperCLIP"-based classifier guidance and a conditional Hypernetwork Latent Diffusion Model ("HyperLDM"), which we show to benefit from the classifier-free guidance technique common in image generation. Finally, we demonstrate that our approaches outperform existing multi-task and meta-learning methods in a series of zero-shot learning experiments on our Meta-VQA dataset.

URL: https://openreview.net/forum?id=1irVjE7A3w

---

Title: On Equivalences between Weight and Function-Space Langevin Dynamics

Abstract: Approximate inference for overparameterized Bayesian models appears challenging, due to the complex structure of the posterior. To address this issue, a recent line of work has investigated the possibility of directly conducting approximate inference in the "function space", the space of prediction functions. This paper provides an alternative perspective to this problem, by showing that for many models – including a simplified neural network model – Langevin dynamics in the overparameterized "weight space" induces equivalent function-space trajectories to certain Langevin dynamics procedures in function space. Thus, the former can already be viewed as a function-space inference algorithm, with its convergence unaffected by overparameterization. We provide simulations on Bayesian neural network models and discuss the implication of the results.
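
A minimal sketch of weight-space Langevin dynamics on a toy Bayesian linear model; the induced function-space trajectory is simply the prediction function tracked along the chain. The step size and model are illustrative.

    # Sketch: unadjusted Langevin dynamics in weight space. The corresponding
    # function-space point at each step is the prediction function X @ theta.
    import torch

    torch.manual_seed(0)
    X = torch.randn(32, 3)
    y = X @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(32)
    theta = torch.zeros(3, requires_grad=True)
    step = 1e-3

    for _ in range(2000):
        log_post = (-0.5 * ((X @ theta - y) ** 2).sum()
                    - 0.5 * (theta ** 2).sum())        # likelihood + prior
        grad, = torch.autograd.grad(log_post, theta)
        with torch.no_grad():
            theta += step * grad + (2 * step) ** 0.5 * torch.randn_like(theta)

    print(theta)   # wanders near the posterior mode of the linear model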

URL: https://openreview.net/forum?id=zMVp33Gz8C

---

Title: CAE v2: Context Autoencoder with CLIP Latent Alignment

Abstract: Masked image modeling (MIM) learns visual representations by predicting the masked patches on a pre-defined target. Inspired by MVP, which displays impressive gains with CLIP, in this work we also employ the semantically rich CLIP latent as the target and further tap its potential by introducing a new pipeline, CAE v2. CAE v2 is an improved variant of CAE, applying the CLIP latent on two pretraining tasks, i.e., visible latent alignment and masked latent alignment. Visible latent alignment directly mimics the visible latent representations from the encoder to the corresponding CLIP latent, which is beneficial for facilitating model convergence and improving the representative ability of the encoder. Masked latent alignment predicts the representations of masked patches within the feature space of the CLIP latent, as the standard MIM task does, effectively aligning the representations computed from the encoder and the regressor into the same domain. We evaluate CAE v2 on various downstream tasks and demonstrate that our method achieves competitive performance on image classification, semantic segmentation, object detection and instance segmentation. Code will be available.

URL: https://openreview.net/forum?id=f36LaK7M0F

---

Title: Federated Minimax Optimization with Client Heterogeneity

Abstract: Minimax optimization has seen a surge in interest with the advent of modern applications such as GANs, and it is inherently more challenging than simple minimization. The difficulty is exacerbated by the training data residing at multiple edge devices or clients, especially when these clients can have heterogeneous datasets and heterogeneous local computation capabilities. We propose a general federated minimax optimization framework that subsumes such settings and several existing methods like Local SGDA. We show that naive aggregation of model updates made by clients running unequal numbers of local steps can result in optimizing a mismatched objective function -- a phenomenon previously observed in standard federated minimization. To fix this problem, we propose normalizing the client updates by the number of local steps. We analyze the convergence of the proposed algorithm for classes of nonconvex-concave and nonconvex-nonconcave functions and characterize the impact of heterogeneous client data, partial client participation, and heterogeneous local computations. For all the function classes considered, we significantly improve the existing computation and communication complexity results. Experimental results support our theoretical claims.
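
A sketch of the proposed normalization: each client's accumulated update is divided by its number of local steps before averaging, so clients doing more local work do not implicitly reweight the objective; shapes and weights here are illustrative.

    # Sketch: normalize client updates by local step counts before averaging.
    import numpy as np

    def aggregate_normalized(updates, local_steps, weights=None):
        weights = weights or [1.0 / len(updates)] * len(updates)
        return sum(w * (u / k)                   # per-step average update
                   for u, k, w in zip(updates, local_steps, weights))

    rng = np.random.default_rng(0)
    # Clients with 1, 10, and 100 local steps accumulate unequal updates.
    updates = [rng.normal(size=5) * k for k in (1, 10, 100)]
    print(aggregate_normalized(updates, [1, 10, 100]))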

URL: https://openreview.net/forum?id=NnUmg1chLL

---

Title: Quantifying neural network uncertainty under volatility clustering

Abstract: Time-series with complex structures pose a unique challenge to uncertainty quantification methods. Time-varying variance, such as the volatility clustering seen in financial time-series, can lead to a large mismatch between predicted uncertainty and forecast error. In this work, we propose a novel framework for uncertainty quantification in the presence of volatility clustering, building on and extending recent methodological advances in uncertainty quantification for non-time-series data. To illustrate the performance of our proposed approach, we apply it to two types of datasets: a collection of non-time-series data, to show the general applicability of our framework and its ability to quantify uncertainty better than state-of-the-art methods; and two sets of financial time-series exhibiting volatility clustering: cryptocurrencies and U.S. equities.

URL: https://openreview.net/forum?id=UurPtVLuTC

---

Title: Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

Abstract: We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback. The delay and compositeness of rewards mean that rewards generated as a result of taking an action at a given state are fragmented into different components, and they are sequentially realized at delayed time instances. The partial anonymity attribute implies that a learner, for each state, only observes the aggregate of past reward components generated as a result of different actions taken at that state, but realized at the observation instance. We propose an algorithm named $\mathrm{DUCRL2}$ to obtain a near-optimal policy for this setting and show that it achieves a regret bound of $\tilde{\mathcal{O}}\left(DS\sqrt{AT} + d (SA)^3\right)$ where $S$ and $A$ are the sizes of the state and action spaces, respectively, $D$ is the diameter of the MDP, $d$ is a parameter upper bounded by the maximum reward delay, and $T$ denotes the time horizon. This demonstrates the optimality of the bound in the order of $T$, and an additive impact of the delay.

URL: https://openreview.net/forum?id=ubCoTAynPp

---

Title: Distributionally Robust Classification on a Data Budget

Abstract: Real-world uses of deep learning require predictable model behavior under distribution shifts. Models such as CLIP show emergent natural distributional robustness comparable to humans, but may require hundreds of millions of training samples. Can we train robust learners in a domain where data is limited? To rigorously address this question, we introduce JANuS (Joint Annotations and Names Set), a collection of four new training datasets with images, labels, and corresponding captions, and perform a series of carefully controlled investigations of factors contributing to robustness in image classification. Using JANuS as a testbed, we show that a standard ResNet-50 trained with the cross-entropy loss on 2.4 million image samples can attain robustness comparable to a CLIP ResNet-50 trained on 400 million samples. To our knowledge, this is the first result showing near state-of-the-art distributional robustness on a very limited data budget.

URL: https://openreview.net/forum?id=D5Z2E8CNsD

---

Title: Efficient Inference With Model Cascades

Abstract: State-of-the-art deep learning models are becoming ever larger. However, many practical applications are constrained by the cost of inference. Cascades of pretrained models with conditional execution address these requirements based on the intuition that some inputs are easy enough to be processed correctly by a smaller model, allowing for an early exit. If the smaller model is not sufficiently confident in its prediction, the input is passed on to a larger model. The choice of confidence threshold allows computational cost to be traded off against accuracy. In this work we explore the effective design of model cascades, thoroughly evaluate the impact on the accuracy-efficiency trade-off, and provide a reproducible state-of-the-art baseline for related research that is currently missing. We demonstrate that 2-model cascades already dominate the ImageNet Pareto front, achieving an average reduction in compute effort at equal accuracy of almost $3.1\times$ above 86% and more than $1.9\times$ between 80% and 86% top-1 accuracy, while 3-model cascades achieve $4.4\times$ above 87% accuracy. We confirm the wider applicability and effectiveness of the method on the GLUE benchmark. We release the code to reproduce our experiments in the supplementary material and use only publicly available pretrained models and datasets.
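
A minimal sketch of a two-model cascade with a confidence threshold; the models here are stand-in linear layers, and the threshold value is an arbitrary illustrative choice.

    # Sketch: run the small model first; fall back to the large model only
    # when the small model's softmax confidence is below the threshold.
    import torch

    def cascade_predict(small, large, x, threshold=0.9):
        probs = torch.softmax(small(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        easy = conf >= threshold                 # early exit for easy inputs
        if (~easy).any():
            pred[~easy] = large(x[~easy]).argmax(dim=-1)
        return pred

    small = torch.nn.Linear(16, 10)              # stand-ins for real models
    large = torch.nn.Linear(16, 10)
    print(cascade_predict(small, large, torch.randn(8, 16)))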

URL: https://openreview.net/forum?id=obB415rg8q

---

Title: HypUC: Hyperfine Uncertainty Calibration with Gradient-boosted Corrections for Reliable Regression on Imbalanced Medical Time Series

Abstract: The automated analysis of medical time series, such as the electrocardiogram (ECG), electroencephalogram (EEG), and pulse oximetry, has the potential to serve as a valuable tool for diagnostic decisions, allowing for remote monitoring of patients and more efficient use of expensive and time-consuming medical procedures. Deep neural networks (DNNs) have been demonstrated to be effective in processing such signals. However, previous research has largely focused on classifying medical time series rather than attempting to regress the continuous-valued physiological parameters that are central to diagnosis. One significant challenge in this regard is the imbalanced nature of the dataset, as a low prevalence of abnormal conditions can lead to heavily skewed data that results in inaccurate predictions and a lack of certainty in such predictions when deployed. Addressing this, we propose a new framework, HypUC, for imbalanced probabilistic regression in medical time series. Our approach incorporates ideas from probabilistic machine learning for uncertainty estimation. We also introduce a new calibration method that provides reliable uncertainty estimates that generalize well to a diverse range of test sets. Additionally, we present a method for using these calibrated uncertainties to improve decision-making through an ensemble of gradient-boosted learners. Furthermore, we demonstrate an entropy-based technique to flag unreliable predictions. We evaluate our approach on a large, real-world dataset of ECGs collected from millions of patients with various medical conditions. Our approach outperforms several baselines while also providing calibrated uncertainty estimates for many diagnostic problems, indicating its suitability for clinical use and real-world deployment.

URL: https://openreview.net/forum?id=0Xo9giEZWf

---

Title: Differentially Private Diffusion Models

Abstract: While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains. Generative models trained with differential privacy (DP) on sensitive data can sidestep this challenge, providing access to synthetic data instead. We build on the recent success of diffusion models (DMs) and introduce Differentially Private Diffusion Models (DPDMs), which enforce privacy using differentially private stochastic gradient descent (DP-SGD). We investigate the DM parameterization and the sampling algorithm, which turn out to be crucial ingredients in DPDMs, and propose noise multiplicity, a powerful modification of DP-SGD tailored to the training of DMs. We validate our novel DPDMs on image generation benchmarks and achieve state-of-the-art performance in all experiments. Moreover, on standard benchmarks, classifiers trained on DPDM-generated synthetic data perform on par with task-specific DP-SGD-trained classifiers, which has not been demonstrated before for DP generative models.

URL: https://openreview.net/forum?id=ZPpQk7FJXF

---

Title: EHRDiff: Exploring Realistic EHR Synthesis with Diffusion Models

Abstract: Electronic health records (EHR) contain a wealth of biomedical information, serving as valuable resources for the development of precision medicine systems. However, privacy concerns have resulted in limited access to high-quality and large-scale EHR data for researchers, impeding progress in methodological development. Recent research has delved into synthesizing realistic EHR data through generative modeling techniques, with the majority of proposed methods relying on generative adversarial networks (GAN) and their variants for EHR synthesis. Despite GAN-based methods attaining state-of-the-art performance in generating EHR data, these approaches are difficult to train and prone to mode collapse. Recently introduced in generative modeling, diffusion models have established cutting-edge performance in image generation, but their efficacy in EHR data synthesis remains largely unexplored. In this study, we investigate the potential of diffusion models for EHR data synthesis and introduce a novel method, EHRDiff. Through extensive experiments, EHRDiff establishes new state-of-the-art quality for synthetic EHR data while protecting private information.

URL: https://openreview.net/forum?id=psRwI65PT8

---

Title: Self-supervised Learning for Segmentation and Quantification of Dopamine Neurons in Parkinson’s Disease

Abstract: Parkinson’s Disease (PD) is the second most common neurodegenerative disease in humans. PD is characterized by the gradual loss of dopaminergic neurons in the Substantia Nigra (a part of the mid-brain). Counting the number of dopaminergic neurons in the Substantia Nigra is one of the most important indexes in evaluating drug efficacy in PD animal models. Currently, analyzing and quantifying dopaminergic neurons is conducted manually by experts through analysis of digital pathology images, which is laborious, time-consuming, and highly subjective. As such, a reliable and unbiased automated system is needed for the quantification of dopaminergic neurons in digital pathology images. We propose an end-to-end deep learning framework for the segmentation and quantification of dopaminergic neurons in PD animal models. To the best of our knowledge, this is the first machine learning model that detects the cell body of dopaminergic neurons, counts the number of dopaminergic neurons, and provides the phenotypic characteristics of individual dopaminergic neurons as a numerical output. Extensive experiments demonstrate the effectiveness of our model in quantifying neurons with high precision, which can provide quicker turnaround for drug efficacy studies, better understanding of dopaminergic neuronal health status, and unbiased results in PD pre-clinical research.

URL: https://openreview.net/forum?id=izFnURFG3f

---

Title: $k$-Mixup Regularization for Deep Learning via Optimal Transport

Abstract: Mixup is a popular regularization technique for training deep neural networks that improves generalization and increases robustness to certain distribution shifts. It perturbs input training data in the direction of other randomly-chosen instances in the training set. To better leverage the structure of the data, we extend mixup in a simple, broadly applicable way to $k$-mixup, which perturbs $k$-batches of training points in the direction of other $k$-batches. The perturbation is done with displacement interpolation, i.e., interpolation under the Wasserstein metric. We demonstrate theoretically and in simulations that $k$-mixup preserves cluster and manifold structures, and we extend theory studying the efficacy of standard mixup to the $k$-mixup case. Our empirical results show that training with $k$-mixup further improves generalization and robustness across several network architectures and benchmark datasets of differing modalities. For the wide variety of real datasets considered, the performance gains of $k$-mixup over standard mixup are similar to or larger than the gains of mixup itself over standard ERM after hyperparameter optimization. In several instances, in fact, $k$-mixup achieves gains in settings where standard mixup has negligible to zero improvement over ERM.
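
A minimal sketch of $k$-mixup: two $k$-batches are matched by an optimal-transport assignment (the Hungarian algorithm on squared distances) and then interpolated along the matching; the Beta parameter and batch size are illustrative.

    # Sketch: optimal-transport matching between two k-batches, then mixup
    # along the matching (displacement interpolation for k points).
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def k_mixup(x1, y1, x2, y2, alpha=1.0):
        cost = ((x1[:, None] - x2[None, :]) ** 2).sum(-1)  # pairwise sq dists
        rows, cols = linear_sum_assignment(cost)           # OT matching
        lam = np.random.beta(alpha, alpha)
        x = lam * x1[rows] + (1 - lam) * x2[cols]
        y = lam * y1[rows] + (1 - lam) * y2[cols]
        return x, y

    rng = np.random.default_rng(0)
    x1, x2 = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
    y1 = np.eye(10)[rng.integers(0, 10, 16)]               # one-hot labels
    y2 = np.eye(10)[rng.integers(0, 10, 16)]
    print(k_mixup(x1, y1, x2, y2)[0].shape)                # (16, 8)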

URL: https://openreview.net/forum?id=lOegPKSu04

---

Title: Projected Randomized Smoothing for Certified Adversarial Robustness

Abstract: Randomized smoothing is the current state-of-the-art method for producing provably robust classifiers. While randomized smoothing typically yields robust $\ell_2$-ball certificates, recent research has generalized provable robustness to different norm balls as well as anisotropic regions. This work considers a classifier architecture that first projects onto a low-dimensional approximation of the data manifold and then applies a standard classifier. By performing randomized smoothing in the low-dimensional projected space, we characterize the certified region of our smoothed composite classifier back in the high-dimensional input space and prove a tractable lower bound on its volume. We show experimentally on CIFAR-10 and SVHN that classifiers without the initial projection are vulnerable to perturbations that are normal to the data manifold and yet are captured by the certified regions of our method. We compare the volume of our certified regions against various baselines and show that our method improves on the state-of-the-art by many orders of magnitude.
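
A hedged sketch of the project-then-smooth idea: inputs are mapped to a low-dimensional basis approximating the data manifold, Gaussian noise is added in that space, and predictions are aggregated by majority vote; the basis and classifier below are placeholders, and the certification step is omitted.

    # Sketch: randomized smoothing performed in a projected low-dim space.
    import numpy as np

    def smoothed_predict(classify, P, x, sigma=0.25, n_samples=1000):
        # P: (d, r) orthonormal basis approximating the data manifold.
        z = P.T @ x                                  # project to r dimensions
        noise = sigma * np.random.randn(n_samples, P.shape[1])
        votes = [classify(P @ (z + e)) for e in noise]   # classify lifts back
        return np.bincount(votes).argmax()           # majority vote

    rng = np.random.default_rng(0)
    P, _ = np.linalg.qr(rng.normal(size=(64, 8)))    # stand-in PCA-like basis
    classify = lambda v: int(v.sum() > 0)            # placeholder classifier
    print(smoothed_predict(classify, P, rng.normal(size=64)))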

URL: https://openreview.net/forum?id=FObkvLwNSo

---
