Weekly TMLR digest for Oct 08, 2023

TMLR

Oct 7, 2023, 8:00:11 PM
to tmlr-annou...@googlegroups.com


New certifications
==================

Featured Certification: Improved baselines for vision-language pre-training

Enrico Fini, Pietro Astolfi, Adriana Romero-Soriano, Jakob Verbeek, Michal Drozdzal

https://openreview.net/forum?id=a7nvXxNmdV

---


Accepted papers
===============


Title: Projected Randomized Smoothing for Certified Adversarial Robustness

Authors: Samuel Pfrommer, Brendon G. Anderson, Somayeh Sojoudi

Abstract: Randomized smoothing is the current state-of-the-art method for producing provably robust classifiers. While randomized smoothing typically yields robust $\ell_2$-ball certificates, recent research has generalized provable robustness to different norm balls as well as anisotropic regions. This work considers a classifier architecture that first projects onto a low-dimensional approximation of the data manifold and then applies a standard classifier. By performing randomized smoothing in the low-dimensional projected space, we characterize the certified region of our smoothed composite classifier back in the high-dimensional input space and prove a tractable lower bound on its volume. We show experimentally on CIFAR-10 and SVHN that classifiers without the initial projection are vulnerable to perturbations that are normal to the data manifold and yet are captured by the certified regions of our method. We compare the volume of our certified regions against various baselines and show that our method improves on the state-of-the-art by many orders of magnitude.
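
[Editor's note] A minimal sketch, assuming a random projection and a toy base classifier, of the idea described above: noise is sampled in the low-dimensional projected space rather than the full input space, and the smoothed prediction is a majority vote. This is not the authors' code; the projection P and base_classifier are illustrative placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    d_high, d_low, n_classes = 3072, 64, 10                        # e.g., flattened CIFAR-10 inputs
    P = rng.standard_normal((d_low, d_high)) / np.sqrt(d_high)     # stand-in for a learned manifold projection

    def base_classifier(z_low):
        # Placeholder for a classifier that operates on the projected representation.
        return int(np.argmax(z_low[:n_classes]))

    def smoothed_predict(x, sigma=0.25, n_samples=1000):
        # Monte-Carlo estimate of the smoothed classifier: Gaussian noise is added
        # in the projected space, so the certified region in input space is shaped by P.
        z = P @ x
        votes = np.zeros(n_classes, dtype=int)
        for _ in range(n_samples):
            votes[base_classifier(z + sigma * rng.standard_normal(d_low))] += 1
        return int(np.argmax(votes))

    print(smoothed_predict(rng.standard_normal(d_high)))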

URL: https://openreview.net/forum?id=FObkvLwNSo

---

Title: Multi-Domain Long-Tailed Learning by Augmenting Disentangled Representations

Authors: Xinyu Yang, Huaxiu Yao, Allan Zhou, Chelsea Finn

Abstract: There is an inescapable long-tailed class-imbalance issue in many real-world classification problems. Current methods for addressing this problem only consider scenarios where all examples come from the same distribution. However, in many cases, there are multiple domains with distinct class imbalances. We study this multi-domain long-tailed learning problem and aim to produce a model that generalizes well across all classes and domains. Towards that goal, we introduce TALLY, a method that addresses this multi-domain long-tailed learning problem. Built upon a proposed selective balanced sampling strategy, TALLY achieves this by mixing the semantic representation of one example with the domain-associated nuisances of another, producing a new representation for use as data augmentation. To improve the disentanglement of semantic representations, TALLY further utilizes a domain-invariant class prototype that averages out domain-specific effects. We evaluate TALLY on several benchmarks and real-world datasets and find that it consistently outperforms other state-of-the-art methods under both subpopulation shift and domain shift.
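
[Editor's note] A minimal sketch, under the assumption that a representation can be split into a semantic part and a domain-nuisance part, of the augmentation idea described above: the semantic half of one example is combined with the nuisance half of another. The 50/50 split and helper names are hypothetical and not the authors' implementation.

    import torch

    def mix_semantic_and_nuisance(sem_a, nui_b):
        # Fuse example A's semantic features with example B's domain-associated nuisances.
        return torch.cat([sem_a, nui_b], dim=-1)

    feat_a = torch.randn(8, 128)            # representation of example A (e.g., a tail-class sample)
    feat_b = torch.randn(8, 128)            # representation of example B (from another domain)
    sem_a, _ = feat_a.chunk(2, dim=-1)      # assumed semantic half
    _, nui_b = feat_b.chunk(2, dim=-1)      # assumed nuisance half
    augmented = mix_semantic_and_nuisance(sem_a, nui_b)
    print(augmented.shape)                  # torch.Size([8, 128])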

URL: https://openreview.net/forum?id=4UXJhNSbwd

---

Title: Improved baselines for vision-language pre-training

Authors: Enrico Fini, Pietro Astolfi, Adriana Romero-Soriano, Jakob Verbeek, Michal Drozdzal

Abstract: Contrastive learning has emerged as an efficient framework to learn multimodal representations. CLIP, a seminal work in this area, achieved impressive results by training on paired image-text data using the contrastive loss. Recent work claims improvements over CLIP using additional non-contrastive losses inspired by self-supervised learning.
However, it is sometimes hard to disentangle the contribution of these additional losses from other implementation details, e.g., data augmentation or regularization techniques, used to train the model. To shed light on this matter, in this paper, we first propose, implement and evaluate several baselines obtained by combining contrastive learning with recent advances in self-supervised learning.
In particular, we use the loss functions that were proven successful for visual self-supervised learning to align image and text modalities. We find that these baselines outperform a basic implementation of CLIP. However, when a stronger training recipe is employed, the advantage disappears. Indeed, we find that a simple CLIP baseline can also be improved substantially, up to a 25% relative improvement on downstream zero-shot tasks, by using well-known training techniques that are popular in other subfields. Moreover, we discover that it is enough to apply image and text augmentations to make up for most of the improvement attained by prior works. With our improved training recipe for CLIP, we obtain state-of-the-art performance on four standard datasets, and consistently outperform prior work (up to +4% on the largest dataset), while being substantially simpler.
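
[Editor's note] For readers unfamiliar with the contrastive objective referenced above, here is a minimal sketch of the standard symmetric image-text (CLIP-style) InfoNCE loss, with random embeddings standing in for encoder outputs; it is not the paper's training code.

    import torch
    import torch.nn.functional as F

    def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
        img_emb = F.normalize(img_emb, dim=-1)
        txt_emb = F.normalize(txt_emb, dim=-1)
        logits = img_emb @ txt_emb.t() / temperature      # pairwise cosine similarities
        targets = torch.arange(img_emb.size(0))           # matched pairs lie on the diagonal
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

    loss = clip_contrastive_loss(torch.randn(32, 512), torch.randn(32, 512))
    print(loss.item())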

URL: https://openreview.net/forum?id=a7nvXxNmdV

---

Title: CAE v2: Context Autoencoder with CLIP Latent Alignment

Authors: Xinyu Zhang, Jiahui Chen, Junkun Yuan, Qiang Chen, Jian Wang, Xiaodi Wang, Shumin Han, Xiaokang Chen, Jimin Pi, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

Abstract: Masked image modeling (MIM) learns visual representations by predicting the masked patches on a pre-defined target. Inspired by MVP (Wei et al., 2022b), which displays impressive gains with CLIP, in this work we also employ the semantically rich CLIP latent as the target and further tap its potential by introducing a new MIM pipeline, CAE v2, to learn a high-quality encoder and facilitate model convergence on the pre-training task. CAE v2 is an improved variant of CAE (Chen et al., 2023), applying the CLIP latent to two pretraining tasks, i.e., visible latent alignment and masked latent alignment. Visible latent alignment directly mimics the visible latent representations from the encoder to the corresponding CLIP latent, which is beneficial for facilitating model convergence and improving the representative ability of the encoder. Masked latent alignment predicts the representations of masked patches within the feature space of the CLIP latent, as the standard MIM task does, effectively aligning the representations computed from the encoder and the regressor into the same domain. We pretrain CAE v2 on ImageNet-1K images and evaluate on various downstream vision tasks, including image classification, semantic segmentation, object detection and instance segmentation. Experiments show that our CAE v2 achieves competitive performance and even outperforms the CLIP vision encoder, demonstrating the effectiveness of our method. Code is available at https://github.com/Atten4Vis/CAE.
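
[Editor's note] A minimal sketch of the two alignment objectives named above, with random tensors standing in for the encoder output, the regressor prediction, and the CLIP latents; the tensor shapes and the use of an MSE loss are assumptions for illustration only.

    import torch
    import torch.nn.functional as F

    visible_latent = torch.randn(8, 49, 768)     # encoder output for visible patches (assumed shape)
    masked_pred    = torch.randn(8, 147, 768)    # regressor prediction for masked patches
    clip_visible   = torch.randn(8, 49, 768)     # CLIP latents of the visible patches (the target)
    clip_masked    = torch.randn(8, 147, 768)    # CLIP latents of the masked patches

    visible_align = F.mse_loss(visible_latent, clip_visible)   # visible latent alignment
    masked_align  = F.mse_loss(masked_pred, clip_masked)       # masked latent alignment
    loss = visible_align + masked_align
    print(loss.item())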

URL: https://openreview.net/forum?id=f36LaK7M0F

---

Title: Cross-validation for Geospatial Data: Estimating Generalization Performance in Geostatistical Problems

Authors: Jing Wang, Laurel Hopkins, Tyler Hallman, W. Douglas Robinson, Rebecca Hutchinson

Abstract: Geostatistical learning problems are frequently characterized by spatial autocorrelation in the input features and/or the potential for covariate shift at test time. These realities violate the classical assumption of independent, identically distributed data, upon which most cross-validation algorithms rely in order to estimate the generalization performance of a model. In this paper, we present a theoretical criterion for unbiased cross-validation estimators in the geospatial setting. We also introduce a new cross-validation algorithm to evaluate models, inspired by the challenges of geospatial problems. We apply a framework for categorizing problems into different types of geospatial scenarios to help practitioners select an appropriate cross-validation strategy. Our empirical analyses compare cross-validation algorithms on both simulated and several real datasets to develop recommendations for a variety of geospatial settings. This paper aims to draw attention to some challenges that arise in model evaluation for geospatial problems and to provide guidance for users.

URL: https://openreview.net/forum?id=VgJhYu7FmQ

---

Title: Adaptive Hyperparameter Selection for Differentially Private Gradient Descent

Authors: Dominik Fay, Sindri Magnússon, Jens Sjölund, Mikael Johansson

Abstract: We present an adaptive mechanism for hyperparameter selection in differentially private optimization that addresses the inherent trade-off between utility and privacy. The mechanism eliminates the often unstructured and time-consuming manual effort of selecting hyperparameters and avoids the additional privacy costs that hyperparameter selection otherwise incurs on top of that of the actual algorithm.

We instantiate our mechanism for noisy gradient descent on non-convex, convex and strongly convex loss functions, respectively, to derive schedules for the noise variance and step size. These schedules account for the properties of the loss function and adapt to convergence metrics such as the gradient norm. When using these schedules, we show that noisy gradient descent converges at essentially the same rate as its noise-free counterpart. Numerical experiments show that the schedules consistently perform well across a range of datasets without manual tuning.
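
[Editor's note] A minimal sketch of a single noisy gradient-descent step of the kind discussed above: the gradient is clipped and Gaussian noise is added. The particular noise-variance schedule shown (shrinking with the gradient norm) is purely illustrative and is not the schedule derived in the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def noisy_gd_step(theta, grad, step_size, clip_norm, sigma):
        # Clip the gradient to bound sensitivity, then perturb it with Gaussian noise.
        g = grad * min(1.0, clip_norm / (np.linalg.norm(grad) + 1e-12))
        noise = sigma * clip_norm * rng.standard_normal(theta.shape)
        return theta - step_size * (g + noise)

    theta = np.zeros(10)
    for _ in range(100):
        grad = 2.0 * (theta - 1.0)                        # toy quadratic loss ||theta - 1||^2
        sigma = 0.5 / (1.0 + np.linalg.norm(grad))        # assumed adaptive noise schedule
        theta = noisy_gd_step(theta, grad, step_size=0.05, clip_norm=1.0, sigma=sigma)
    print(np.round(theta[:3], 2))                         # approaches the minimizer at 1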

URL: https://openreview.net/forum?id=LLKI5Lq2YN

---

Title: Multiscale Causal Structure Learning

Authors: Gabriele D'Acunto, Paolo Di Lorenzo, Sergio Barbarossa

Abstract: Causal structure learning methods are vital for unveiling causal relationships embedded in observed data. However, the state of the art suffers from a major limitation: it assumes that causal interactions occur only at the frequency at which data is observed. To address this limitation, this paper proposes a method that allows structural learning of linear causal relationships occurring at different time scales. Specifically, we explicitly take into account instantaneous and lagged inter-relations between multiple time series, represented at different scales, hinging on the wavelet transform. We cast the problem as the learning of a multiscale causal graph with sparse structure and dagness constraints, enforcing causality through a directed and acyclic topology. To solve the resulting (non-convex) formulation, we propose an algorithm termed MS-CASTLE, which exhibits consistent performance across different noise distributions and wavelet choices. We also propose a single-scale version of our algorithm, SS-CASTLE, which outperforms existing methods in computational efficiency, performance, and robustness on synthetic data. Finally, we apply the proposed approach to learn the multiscale causal structure of the risk of 15 global equity markets during the COVID-19 pandemic, illustrating the importance of multiscale analysis for revealing useful interactions at different time resolutions. Financial investors can leverage our approach to manage risk within equity portfolios from a causal perspective, tailored to their investment horizon.

URL: https://openreview.net/forum?id=Ub6XILEF9x

---


New submissions
===============


Title: Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

Abstract: Off-policy evaluation and learning are concerned with assessing a given policy and learning an optimal policy from offline data without direct interaction with the environment. Often, the environment in which the data are collected differs from the environment in which the learned policy is applied. To account for the effect of different environments during learning and execution, distributionally robust optimization (DRO) methods have been developed that compute worst-case bounds on the policy values assuming that the distribution of the new environment lies within an uncertainty set. Typically, this uncertainty set is defined based on the KL divergence around the empirical distribution computed from the logging dataset. However, the KL uncertainty set fails to encompass distributions with varying support and lacks awareness of the geometry of the distribution support. As a result, KL approaches fall short in addressing practical environment mismatches and lead to over-fitting to worst-case scenarios. To overcome these limitations, we propose a novel DRO approach that employs the Wasserstein distance instead. While Wasserstein DRO is generally computationally more expensive compared to KL DRO, we present a regularized method and a practical (biased) stochastic gradient descent method to optimize the policy efficiently. We also provide a theoretical analysis of the finite sample complexity and iteration complexity for our proposed method. We further validate our approach using a public dataset that was recorded in a randomized stroke trial.

URL: https://openreview.net/forum?id=NmpjDHWIvg

---

Title: Blockwise Self-Supervised Learning at Scale

Abstract: Current state-of-the-art deep networks are all powered by backpropagation. In this paper, we explore alternatives to full backpropagation in the form of blockwise learning rules, leveraging the latest developments in self-supervised learning. We show that a blockwise pretraining procedure that independently trains the four main blocks of layers of a ResNet-50 with Barlow Twins' loss function at each block performs almost as well as end-to-end backpropagation on ImageNet: a linear probe trained on top of our blockwise pretrained model obtains a top-1 classification accuracy of 70.48%, only 1.1% below the accuracy of an end-to-end pretrained network (71.57% accuracy). We perform extensive experiments to understand the impact of different components within our method and explore a variety of adaptations of self-supervised learning to the blockwise paradigm, building an exhaustive understanding of the critical avenues for scaling local learning rules to large networks, with implications ranging from hardware design to neuroscience.
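
[Editor's note] A minimal sketch, using tiny linear blocks instead of ResNet-50 stages, of the blockwise idea above: each block is trained with its own Barlow Twins loss and gradients are stopped between blocks, so no end-to-end backpropagation takes place. The loss below is a standard Barlow Twins formulation, not the paper's exact implementation.

    import torch
    import torch.nn as nn

    def barlow_twins_loss(z1, z2, lam=5e-3):
        z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
        z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
        c = (z1.t() @ z2) / z1.size(0)                      # cross-correlation matrix
        on_diag = (torch.diagonal(c) - 1).pow(2).sum()
        off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
        return on_diag + lam * off_diag

    blocks = nn.ModuleList([nn.Sequential(nn.Linear(128, 128), nn.ReLU()) for _ in range(4)])
    opts = [torch.optim.SGD(b.parameters(), lr=0.1) for b in blocks]

    x1, x2 = torch.randn(64, 128), torch.randn(64, 128)     # two augmented views of a batch
    for block, opt in zip(blocks, opts):
        z1, z2 = block(x1), block(x2)
        loss = barlow_twins_loss(z1, z2)
        opt.zero_grad(); loss.backward(); opt.step()
        x1, x2 = z1.detach(), z2.detach()                   # stop-gradient between blocks
    print("done")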

URL: https://openreview.net/forum?id=M2m618iIPk

---

Title: A Globally Convergent Algorithm for Neural Network Parameter Optimization Based on Difference-of-Convex Functions

Abstract: We propose an algorithm for optimizing the parameters of single hidden layer neural networks. Specifically, we derive a blockwise difference-of-convex (DC) functions representation of the objective function. Based on the latter, we propose a block coordinate descent (BCD) approach that we combine with a tailored difference-of-convex functions algorithm (DCA). We prove global convergence of the proposed algorithm. Furthermore, we mathematically analyze the convergence rate of parameters and the convergence rate in value (i.e., the training loss). We give conditions under which our algorithm converges linearly or even faster depending on the local shape of the loss function. We confirm our theoretical derivations numerically and compare our algorithm against state-of-the-art gradient-based solvers in terms of both training loss and test loss.
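
[Editor's note] The abstract refers to a tailored difference-of-convex functions algorithm (DCA). As background, here is a minimal sketch of a generic DCA iteration on a toy one-dimensional problem (not the paper's blockwise variant): to minimize f(x) = g(x) - h(x) with g and h convex, repeatedly linearize h at the current point and minimize the resulting convex surrogate.

    import numpy as np

    # Toy example: f(x) = x**4 - 2*x**2, written as g(x) = x**4 and h(x) = 2*x**2.
    g = lambda x: x**4
    h_grad = lambda x: 4.0 * x                  # gradient of h(x) = 2 x^2

    def dca_step(x):
        # Minimize the convex surrogate g(y) - h'(x) * (y - x) over y,
        # done here by a fine grid search for transparency.
        ys = np.linspace(-2.0, 2.0, 20001)
        surrogate = g(ys) - h_grad(x) * (ys - x)
        return float(ys[np.argmin(surrogate)])

    x = 0.3
    for _ in range(20):
        x = dca_step(x)
    print(round(x, 3))   # converges toward a stationary point of f (here x = 1)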

URL: https://openreview.net/forum?id=EDqCY6ihbr

---

Title: Towards an Understanding of Decision-Time vs. Background Planning in Model-Based Reinforcement Learning

Abstract: In model-based reinforcement learning, an agent can leverage a learned model to improve its way of behaving in different ways. Two of the prevalent approaches are decision-time planning and background planning. In this study, we are interested in understanding under what conditions and in which settings one of these two planning styles will perform better than the other. After viewing them in a unified way through the lens of dynamic programming, we first consider the simplest instantiations of these planning styles and provide theoretical results and hypotheses on which one will perform better in the planning & learning and transfer learning settings. We then consider the modern instantiations of them and provide hypotheses on which one will perform better in the considered settings. Lastly, we perform several experiments to illustrate and validate both our theoretical results and hypotheses. Overall, our findings suggest that even though decision-time planning does not perform as well as background planning in its simplest instantiations, the modern instantiations of it can perform on par or better than the modern instantiations of background planning in both the planning & learning and transfer learning settings.

URL: https://openreview.net/forum?id=LsVePmcBdt

---

Title: Deep Goal-Oriented Clustering

Abstract: Clustering and prediction are two primary tasks in the fields of unsupervised and supervised machine learning. Although much of the recent advances in machine learning have been centered around those two tasks, the interdependent, mutually beneficial relationship between them is rarely explored. In this work, we hypothesize that a better prediction performance for the downstream task would inform a more appropriate clustering strategy. To this end, we introduce Deep Goal-Oriented Clustering (DGC), a probabilistic framework built upon a variational autoencoder with the latent prior being a Gaussian mixture distribution. DGC clusters the data by jointly predicting the side-information and modeling the inherent data structure in an end-to-end fashion. We show the effectiveness of our model on a range of datasets by achieving good prediction accuracies on the side-information, while, more importantly in our setting, simultaneously learning congruent clustering strategies that are on par with the state-of-the-art. We also apply DGC to a real-world breast cancer dataset and show that the discovered clusters carry clinical significance.

URL: https://openreview.net/forum?id=BEWW3ZvYxa

---

Title: Improving Robustness and Diversity with Adversarial Contrastive Network Ensembles

Abstract: Relying on ensemble diversity strategies to improve adversarial robustness has been investigated in several papers, but the gains provided by ensemble-based defenses remain limited so far. In this work, we propose Adversarial Contrastive Network (ACN) ensembles as a defense against white-box adversarial attacks, based on a new ensemble diversity strategy. It consists of projecting the output feature maps of the different ensemble models into a shared latent space with a projection network and using contrastive learning to diversify the feature representations learned by the different models. The performance of the proposed method is evaluated and compared to regular ensembles in terms of adversarial robustness and ensemble diversity. The results demonstrate superior adversarial robustness for ACN ensembles against the Fast Gradient Sign Method attack and against Projected Gradient Descent attacks using low distortion bounds. Lower transferability of adversarial examples among individual models within ACN ensembles is also demonstrated, suggesting that the proposed method helps achieve more diverse representations.

URL: https://openreview.net/forum?id=5L7V19RhXB

---

Title: Accelerated Deep Active Learning with Graph-based Sub-Sampling

Abstract: Past years have witnessed the fast and thorough development of active learning, a human-in-the-loop form of semi-supervised learning that helps reduce the burden of expensive data annotation. Diverse techniques have been proposed to improve the efficiency of label acquisition. However, the existing techniques are mostly intractable at scale on massive sets of unlabeled instances. In particular, the query time and model retraining time of large-scale models for image data are usually linear or even quadratic in the size of the unlabeled pool set and its dimension. The main reason for this intractability is the iterative need to scan the pool set at least once in order to select the best samples for label annotation.

To alleviate this computational burden, we propose efficient Diffusion Graph Active Learning (DGAL). DGAL operates on a pre-computed Variational Autoencoder (VAE) latent space to restrict the pool set to a much smaller candidate set. The sub-sample is then used with deep architectures and an additional standard active learning criterion, reducing the query time.
DGAL demonstrates a query time versus accuracy trade-off that represents an acceleration of two or more orders of magnitude over state-of-the-art methods. Moreover, we demonstrate the important exploration-exploitation trade-off in DGAL that allows the restricted set to capture the most impactful samples for active learning at each iteration.

URL: https://openreview.net/forum?id=ENHSYYas3e

---

Title: TensorVAE: a simple and efficient generative model for molecular conformation generation

Abstract: Efficient generation of 3D conformations of a molecule from its 2D graph is a key challenge in in-silico drug discovery. Deep learning (DL) based generative modelling has recently become a potent tool for tackling this challenge. However, many existing DL-based methods are either indirect, leveraging inter-atomic distances, or direct but requiring numerous sampling steps to generate conformations. In this work, we propose a simple model, abbreviated TensorVAE, capable of generating conformations directly from a 2D molecular graph in a single step. The main novelty of the proposed method lies in its feature engineering. We develop a novel encoding and feature extraction mechanism relying solely on standard convolution operations to generate a token-like feature vector for each atom. These feature vectors are then transformed through standard transformer encoders under a conditional Variational Autoencoder framework to generate conformations directly. We show through experiments on two benchmark datasets that, with intuitive feature engineering, a relatively simple and standard model can provide promising generative capability, outperforming more than a dozen state-of-the-art models employing more sophisticated and specialized generative architectures.

URL: https://openreview.net/forum?id=rQqzt4gYcc

---

Title: The Last Mile to Supervised Performance: Semi-Supervised Domain Adaptation for Semantic Segmentation

Abstract: Supervised deep learning requires massive labeled datasets, but obtaining annotations is not always easy or possible, especially for dense tasks like semantic segmentation. To overcome this issue, numerous works explore Unsupervised Domain Adaptation (UDA), which uses a labeled dataset from another domain (source), or Semi-Supervised Learning (SSL), which trains on a partially labeled set. Despite the success of UDA and SSL, reaching supervised performance at a low annotation cost remains a notoriously elusive goal. To address this, we study the promising setting of Semi-Supervised Domain Adaptation (SSDA). We propose a simple SSDA framework that combines consistency regularization, pixel contrastive learning, and self-training to effectively utilize a few target-domain labels. Our method outperforms prior art in the popular GTA$\rightarrow$Cityscapes benchmark and shows that as little as $50$ target labels can suffice to achieve near-supervised performance. Additional results on Synthia$\rightarrow$Cityscapes, GTA$\rightarrow$BDD and Synthia$\rightarrow$BDD further demonstrate the effectiveness and practical utility of the method. Lastly, we find that existing UDA and SSL methods are not well-suited for the SSDA setting and discuss design patterns to adapt them.

URL: https://openreview.net/forum?id=419MRJ2U0D

---

Title: Transfer Learning for Bayesian Optimization on Heterogeneous Search Spaces

Abstract: Bayesian optimization (BO) is a popular black-box function optimization method, which makes sequential decisions based on a Bayesian model, typically a Gaussian process (GP), of the function. To ensure the quality of the model, transfer learning approaches have been developed to automatically design GP priors by learning from observations on "training" functions. These training functions are typically required to have the same domain as the "test" function (black-box function to be optimized). In this paper, we introduce MPHD, a model pre-training method on heterogeneous domains, which uses a neural net mapping from domain-specific contexts to specifications of hierarchical GPs. MPHD can be seamlessly integrated with BO to transfer knowledge across heterogeneous search spaces. Our theoretical and empirical results demonstrate the validity of MPHD and its superior performance on challenging black-box function optimization tasks.

URL: https://openreview.net/forum?id=emXh4M7TyH

---

Title: Extending Path-Dependent NJ-ODEs to Noisy Observations and a Dependent Observation Framework

Abstract: The Path-Dependent Neural Jump ODE (PD-NJ-ODE) (Krach et al., 2022) is a model for predicting continuous-time stochastic processes with irregular and incomplete observations. In particular, the method learns optimal forecasts given irregularly sampled time series of incomplete past observations. So far the process itself and the coordinate-wise observation times were assumed to be independent and observations were assumed to be noiseless. In this work we discuss two extensions to lift these restrictions and provide theoretical guarantees as well as empirical examples for them.

URL: https://openreview.net/forum?id=0T2OTVCCC1

---

Title: I-ASIDE: Towards the Global Interpretability of Image Model Robustness through the Lens of Axiomatic Spectral Importance Decomposition

Abstract: Robust decisions leverage a high proportion of robust features. Natural images exhibit spectral anisotropy, with the majority of spectral energy concentrated on low-frequency components. A change with an infinitesimal amount of energy on the high-frequency components can rewrite the features dominated by high-frequency components. Image models are parameterized general non-linear signal filters. The fragility of the learned feature representations of image models correlates with spectral structure. The spectral importance decomposition of the statistical expectations of models' negative decision risks with respect to the spectrum can thus reflect model robustness from the perspective of feature robustness. To this end, we formulate the spectral importance decomposition problem and present Image Axiomatic Spectral Importance Decomposition Explanation (I-ASIDE), a model-agnostic global interpretability method, to quantify model global robustness from the perspective of the susceptibility of feature representations to perturbations. Our approach provides a unique insight into interpreting model global robustness and enables a considerable number of applications in research, from measuring model robustness, to studying learning dynamics, to assessing label noise, to investigating adversarial vulnerability. We also showcase multiple applications across research domains to support these claims.

URL: https://openreview.net/forum?id=D2qMFfYlYb

---

Title: DyG2Vec: Representation Learning for Dynamic Graphs with Self-supervision

Abstract: Temporal graph neural networks have shown promising results in learning inductive representations by automatically extracting temporal patterns. However, previous works often rely on complex memory modules or inefficient random walk methods to construct temporal representations. In addition, the existing dynamic graph encoders are non-trivial to adapt to self-supervised paradigms, which prevents them from utilizing unlabeled data. To address these limitations, we present an efficient yet effective attention-based encoder that leverages temporal edge encodings and window-based subgraph sampling to generate task-agnostic embeddings. Moreover, we propose a joint-embedding architecture using non-contrastive SSL to learn rich temporal embeddings without labels. Experimental results on 7 benchmark datasets indicate that, on average, our model outperforms SoTA baselines on the future link prediction task by 4.23% in the transductive setting and 3.30% in the inductive setting, while requiring 5-10x less training/inference time. Additionally, we empirically validate the significance of SSL pre-training under two probing protocols commonly used in language and vision modalities. Lastly, different aspects of the proposed framework are investigated through experimental analysis and ablation studies.

URL: https://openreview.net/forum?id=YRKS2J0x36

---

Title: Towards Optimization-Friendly Binary Neural Network

Abstract: Binary neural networks (BNNs) are a promising approach for compressing and accelerating deep learning models, especially in resource-constrained environments. However, the optimization gap between BNNs and their full-precision counterparts has long been an open problem limiting their performance. In this work, we propose a novel optimization pipeline to enhance the performance of BNNs. The approach comprises three key components: (1) BNext, a strong binary baseline based on an optimization-friendly basic block design, (2) knowledge complexity, a simple yet effective teacher-selection metric that takes the capacity gap between teachers and binary students into consideration, (3) consecutive knowledge distillation (CKD), a novel multi-round optimization technique to transfer high-confidence knowledge from strong teachers to low-capacity BNNs.
We empirically validate the superiority of the method on several vision classification tasks (CIFAR-10/100 and ImageNet). For instance, the BNext family outperforms previous BNNs under different capacity levels and contributes the first binary neural network to reach a state-of-the-art 80.57% Top-1 accuracy on ImageNet with 0.82 GOPS, which verifies the potential of BNNs and already provides a strong baseline for future research on high-accuracy BNNs. The code will be publicly available at (blind URL, see supplementary material).

URL: https://openreview.net/forum?id=4Hq816XDDG

---

Title: Neural Task Synthesis for Visual Programming

Abstract: Generative neural models hold great promise in enhancing programming education by synthesizing new content. We seek to design neural models that can automatically generate programming tasks for a given specification in the context of visual programming domains. Despite the recent successes of large generative models like GPT-4, our initial results show that these models are ineffective in synthesizing visual programming tasks and struggle with logical and spatial reasoning. We propose a novel neuro-symbolic technique, NeurTaskSyn, that can synthesize programming tasks for a specification given in the form of desired programming concepts exercised by its solution code and constraints on the visual task. NeurTaskSyn has two components: the first component is trained via an imitation learning procedure to generate possible solution codes, and the second component is trained via a reinforcement learning procedure to guide an underlying symbolic execution engine that generates visual tasks for these codes. We demonstrate the effectiveness of NeurTaskSyn through an extensive empirical evaluation and a qualitative study on reference tasks taken from the Hour of Code: Classic Maze challenge by Code.org and the Intro to Programming with Karel course by CodeHS.com.

URL: https://openreview.net/forum?id=aYkYajcJDN

---

Title: Effective Latent Differential Equation Models via Attention and Multiple Shooting

Abstract: Scientific Machine Learning (SciML) is a burgeoning field that synergistically combines domain-aware and interpretable models with agnostic machine learning techniques. In this work, we introduce GOKU-UI, an evolution of the SciML generative model GOKU-nets. GOKU-UI not only broadens the original model's spectrum to incorporate other classes of differential equations, such as Stochastic Differential Equations (SDEs), but also integrates attention mechanisms and a novel multiple shooting training strategy in the latent space. These modifications have led to a significant increase in its performance in both reconstruction and forecast tasks, as demonstrated by our evaluation of simulated and empirical data. Specifically, GOKU-UI outperformed all baseline models on synthetic datasets even with a training set 16-fold smaller, underscoring its remarkable data efficiency. Furthermore, when applied to empirical human brain data, with stochastic Stuart-Landau oscillators incorporated into its dynamical core, the proposed enhancements markedly increased the model's effectiveness in capturing complex brain dynamics. This augmented version not only surpassed all baseline methods in the reconstruction task, but also demonstrated a lower prediction error for future brain activity up to 15 seconds ahead. By training GOKU-UI on resting-state fMRI data, we encoded whole-brain dynamics into a latent representation, learning a low-dimensional dynamical system model that could offer insights into brain functionality and open avenues for practical applications such as the classification of mental states or psychiatric conditions. Ultimately, our research provides further impetus for the field of Scientific Machine Learning, showcasing the potential for advancements when established scientific insights are interwoven with modern machine learning.

URL: https://openreview.net/forum?id=uxNfN2PU1W

---

Title: Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition

Abstract: Deep neural networks (DNNs) have revolutionized video action recognition, but their increasing use in critical applications also makes them attractive targets for attacks. In particular, backdoor attacks have emerged as a potent threat, enabling attackers to manipulate a DNN's output by injecting a trigger, without affecting the model's performance on clean data. While the effectiveness of backdoor attacks on image recognition is well-known, their impact on video action recognition is not yet fully understood. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects to that model. Contrary to prior works that studied clean label backdoor attacks against video action recognition and found them ineffective, our paper investigates the efficacy of poisoned label backdoor attacks against video action recognition and demonstrates their effectiveness. We show that existing poisoned-label image backdoor attacks could be extended temporally in two ways, statically and dynamically. Furthermore, we explore real-world video backdoors to highlight the seriousness of this vulnerability. Finally, we study multi-modal (audiovisual) backdoor attacks against video action recognition models, where we show that attacking a single modality is enough for achieving a high attack success rate. Our results highlight the urgent need for developing robust defenses against backdoor attacks on DNNs for video action recognition.

URL: https://openreview.net/forum?id=YBSONcjwCa

---

Title: Bias Amplification Enhances Minority Group Performance

Abstract: Neural networks produced by standard training are known to suffer from poor accuracy on rare subgroups despite achieving high accuracy on average, due to the correlations between certain spurious features and labels. Previous approaches based on worst-group loss minimization (e.g., Group-DRO) are effective in improving worst-group accuracy but require expensive group annotations for all the training samples. In this paper, we focus on the more challenging and realistic setting where group annotations are only available on a small validation set or are not available at all. We propose BAM, a novel two-stage training algorithm: in the first stage, the model is trained using a bias amplification scheme by introducing a learnable auxiliary variable for each training sample; in the second stage, we upweight the samples that the bias-amplified model misclassifies, and then continue training the same model on the reweighted dataset. Empirically, BAM achieves competitive performance compared with existing methods evaluated on spurious correlation benchmarks in computer vision and natural language processing. Moreover, we find a simple stopping criterion based on the minimum class accuracy difference that can remove the need for group annotations, with little or no loss in worst-group accuracy. We perform extensive analyses and ablations to verify the effectiveness and robustness of our algorithm in varying class and group imbalance ratios.
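
[Editor's note] A minimal sketch, on synthetic data with a linear model, of the two-stage recipe described above: stage 1 amplifies bias via per-sample learnable auxiliary logits, and stage 2 upweights the examples the bias-amplified model misclassifies. The upweighting factor, model, and training lengths are illustrative assumptions, not the authors' configuration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    n, d, c = 256, 20, 2
    X, y = torch.randn(n, d), torch.randint(0, c, (n,))

    model = nn.Linear(d, c)
    aux = nn.Parameter(torch.zeros(n, c))                  # learnable per-sample auxiliary logits
    opt = torch.optim.SGD(list(model.parameters()) + [aux], lr=0.1)

    for _ in range(50):                                    # stage 1: bias amplification
        loss = F.cross_entropy(model(X) + aux, y)
        opt.zero_grad(); loss.backward(); opt.step()

    with torch.no_grad():
        wrong = (model(X).argmax(1) != y)                  # misclassified by the biased model
    weights = torch.where(wrong, torch.tensor(5.0), torch.tensor(1.0))   # assumed upweight factor

    opt2 = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(50):                                    # stage 2: reweighted training
        loss = (weights * F.cross_entropy(model(X), y, reduction="none")).mean()
        opt2.zero_grad(); loss.backward(); opt2.step()
    print("finished")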

URL: https://openreview.net/forum?id=75OwvzZZBT

---
