Daily TMLR digest for Jun 08, 2024

TMLR

Jun 8, 2024, 12:00:09 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: VisionAD, a software package of performant anomaly detection algorithms, and Proportion Localised, an interpretable metric

Authors: Alexander D. J. Taylor, Phillip Tregidgo, Jonathan James Morrison, Neill D. F. Campbell

Abstract: We release VisionAD, an anomaly detection library for images. The library forms the largest and most performant collection of such algorithms to date. Each algorithm is written against a standardised API for ease of use. The library has a focus on fair benchmarking, intended to mitigate the issue of cherry-picked results, and it enables rapid experimentation and straightforward integration of new algorithms. In addition, we propose a new metric, Proportion Localised (PL). It reports the proportion of anomalies that are sufficiently localised by classifying each discrete anomaly as localised or not. The metric is far more intuitive because it corresponds to a real physical quantity, making it attractive to industry practitioners. We also release the VisionADIndustrial (VADI) benchmark, a thorough benchmarking of the top anomaly detection algorithms, which calculates the mean across the pooled classes of the MVTec and VisA datasets. We are committed to hosting an updated version of this leaderboard online, and we encourage researchers to add, tweak and improve algorithms to climb it. VisionAD code is found at https://github.com/alext1995/VisionAD, and Proportion Localised code is found at https://github.com/alext1995/proportion_localised.
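
The abstract does not spell out the exact localisation criterion, so the following is a minimal sketch of a PL-style metric, assuming a discrete ground-truth anomaly counts as localised when a sufficient fraction of its pixels is covered by the binarised prediction; the cover_thresh value is an illustrative choice, and the released repository defines the exact rule.

import numpy as np
from scipy import ndimage

def proportion_localised(pred_mask, gt_mask, cover_thresh=0.3):
    """Fraction of discrete ground-truth anomalies counted as localised."""
    labels, n_components = ndimage.label(gt_mask)   # split GT into discrete anomalies
    if n_components == 0:
        return 1.0                                  # nothing to localise
    hits = 0
    for k in range(1, n_components + 1):
        comp = labels == k
        covered = np.logical_and(comp, pred_mask).sum() / comp.sum()
        hits += covered >= cover_thresh             # this anomaly is localised
    return hits / n_components

# Toy usage with 8x8 binary masks.
gt = np.zeros((8, 8), dtype=bool); gt[1:3, 1:3] = True; gt[5:7, 5:7] = True
pred = np.zeros_like(gt); pred[1:3, 1:3] = True     # finds one of two anomalies
print(proportion_localised(pred, gt))               # 0.5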

URL: https://openreview.net/forum?id=o5kYH7bNe3

---

Title: Holistic Molecular Representation Learning via Multi-view Fragmentation

Authors: Seojin Kim, Jaehyun Nam, Junsu Kim, Hankook Lee, Sungsoo Ahn, Jinwoo Shin

Abstract: Learning chemically meaningful representations from unlabeled molecules plays a vital role in AI-based drug design and discovery. In response to this, several self-supervised learning methods have been developed, focusing either on global (e.g., graph-level) or local (e.g., motif-level) information of molecular graphs. However, it is still unclear which approach is more effective for learning better molecular representations. In this paper, we propose a novel holistic self-supervised molecular representation learning framework that effectively learns both global and local molecular information. Our key idea is to utilize fragmentation, which decomposes a molecule into a set of chemically meaningful fragments (e.g., functional groups), to associate a global graph structure with a set of local substructures, thereby preserving chemical properties, and to learn both kinds of information via contrastive learning between them. Additionally, we consider the 3D geometry of molecules as another view for contrastive learning. We demonstrate that our framework outperforms prior molecular representation learning methods across various molecular property prediction tasks.
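
As a rough illustration of the global-versus-fragment contrastive idea, here is an InfoNCE-style loss that pulls each molecule's graph-level embedding towards the aggregate of its own fragment embeddings and away from the other molecules in the batch. The encoders, the fragmentation step, and the additional 3D view are outside this sketch, and mean-pooling the fragments is an assumption rather than the paper's exact aggregation.

import torch
import torch.nn.functional as F

def fragment_contrastive_loss(global_emb, fragment_embs, temperature=0.2):
    """global_emb: (B, D); fragment_embs: list of B tensors, each (n_i, D)."""
    frag_agg = torch.stack([f.mean(dim=0) for f in fragment_embs])  # (B, D)
    z_g = F.normalize(global_emb, dim=-1)
    z_f = F.normalize(frag_agg, dim=-1)
    logits = z_g @ z_f.t() / temperature      # all pairwise similarities
    targets = torch.arange(z_g.size(0))       # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: 4 molecules with 16-dim embeddings and 2-5 fragments each.
g = torch.randn(4, 16)
frags = [torch.randn(int(torch.randint(2, 6, (1,))), 16) for _ in range(4)]
print(fragment_contrastive_loss(g, frags))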

URL: https://openreview.net/forum?id=ufDh55J1ML

---


New submissions
===============


Title: ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text

Abstract: The application of deep learning to electrocardiogram (ECG) analysis has advanced the accuracy and efficiency of cardiac diagnostics. In this work, we address a critical challenge in the field of ECG analysis with deep learning: learning robust representations without large-scale labeled datasets. We propose ECG Semantic Integrator (ESI), a novel multimodal contrastive pretraining framework that jointly learns from ECG signals and associated textual descriptions. ESI employs a dual objective function that comprises a contrastive loss and a captioning loss to develop representations of ECG data. To create a sufficiently large and diverse training dataset, we develop a retrieval-augmented generation (RAG)-based Large Language Model (LLM) pipeline, called Cardio Query Assistant (CQA). This pipeline is designed to generate detailed textual descriptions for ECGs from diverse databases. The generated text includes information about demographics and waveform patterns. This approach enables us to compile a large-scale multimodal dataset with over 660,000 ECG-text pairs for pretraining ESI, which then learns robust and generalizable representations of 12-lead ECGs. We validate our approach through various downstream tasks, including arrhythmia detection and ECG-based subject identification. Our experimental results demonstrate substantial improvements over strong baselines in these tasks, where the baselines encompass supervised and self-supervised learning methods as well as prior multimodal pretraining approaches. Our work shows the potential of multimodal pretraining to improve the analysis of ECG signals.
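
A hedged sketch of the dual objective as described: a CLIP-style contrastive term between ECG and text embeddings plus a captioning (next-token prediction) term. All tensors stand in for the outputs of the ECG encoder and text modules, and the alpha weighting is an illustrative assumption, not the paper's value.

import torch
import torch.nn.functional as F

def esi_style_loss(ecg_emb, text_emb, caption_logits, caption_tokens,
                   temperature=0.07, alpha=0.5):
    z_e = F.normalize(ecg_emb, dim=-1)                 # (B, D) ECG embeddings
    z_t = F.normalize(text_emb, dim=-1)                # (B, D) text embeddings
    logits = z_e @ z_t.t() / temperature
    targets = torch.arange(z_e.size(0))
    contrastive = 0.5 * (F.cross_entropy(logits, targets) +
                         F.cross_entropy(logits.t(), targets))
    # captioning term: predict the description tokens from ECG features
    captioning = F.cross_entropy(caption_logits.reshape(-1, caption_logits.size(-1)),
                                 caption_tokens.reshape(-1))
    return alpha * contrastive + (1 - alpha) * captioning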

URL: https://openreview.net/forum?id=giEbq8Khcf

---

Title: FLR: Label-Mixture Regularization for Federated Learning with Noisy Labels

Abstract: Label noise in federated learning (FL) has garnered increasing attention due to the decentralized nature of FL, where data is collected from multiple clients with potentially different levels of label noise. This study makes two pivotal contributions to this domain. First, we anatomize the memorization phenomenon in FL into server-side and client-side components, marking the first investigation into how these distinct forms of memorization impact learning. Second, to mitigate memorization in FL, we present the Federated Label-mixture Regularization (FLR) strategy, a straightforward yet effective approach that regularizes training through pseudo labels generated by merging local and global model predictions. This method not only improves the accuracy of the global model in both i.i.d. and non-i.i.d. settings but also effectively counters the memorization of noisy labels. We empirically find that FLR complements and improves upon existing FL and noisy-label mitigation methods across multiple datasets under various levels of data heterogeneity and label noise.
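
The pseudo-label construction lends itself to a small sketch: mix the local model's prediction, the global model's prediction, and the (possibly noisy) observed label into a soft target. The three mixing weights below are illustrative assumptions, not the paper's values.

import torch
import torch.nn.functional as F

def flr_style_pseudo_labels(local_logits, global_logits, noisy_labels,
                            num_classes, w_local=0.4, w_global=0.4, w_label=0.2):
    p_local = F.softmax(local_logits, dim=-1)        # client-side prediction
    p_global = F.softmax(global_logits, dim=-1)      # server-side prediction
    y = F.one_hot(noisy_labels, num_classes).float() # possibly noisy label
    return w_local * p_local + w_global * p_global + w_label * y

Each client would then train against these soft targets (e.g., with a soft cross-entropy), which dampens the incentive to memorise a wrong hard label.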

URL: https://openreview.net/forum?id=Z8A3HDgS0E

---

Title: On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization

Abstract: We consider a regularized expected reward optimization problem in the non-oblivious setting that covers many existing problems in reinforcement learning (RL). To solve such an optimization problem, we apply and analyze the classical stochastic proximal gradient method. In particular, the method is shown to admit an $O(\epsilon^{-4})$ sample complexity to an $\epsilon$-stationary point under standard conditions. Since the variance of the classical stochastic gradient estimator is typically large, which slows down convergence, we also apply an efficient stochastic variance-reduced proximal gradient method with an importance-sampling-based ProbAbilistic Gradient Estimator (PAGE). Our analysis shows that the sample complexity can be improved from $O(\epsilon^{-4})$ to $O(\epsilon^{-3})$ under additional conditions. Our results on the stochastic (variance-reduced) proximal gradient method match the sample complexity of their most competitive counterparts for discounted Markov decision processes under similar settings. To the best of our knowledge, the proposed methods represent a novel approach in addressing the general regularized reward optimization problem.
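
For concreteness, here is one PAGE-style variance-reduced proximal step for such a regularized objective, written for an l1 regularizer whose proximal map is soft-thresholding. The gradient oracles are placeholders for stochastic policy-gradient estimators, and the switching probability p, step size, and regularization weight are illustrative.

import numpy as np

def prox_l1(x, lam):
    # proximal map of lam * ||x||_1: soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def page_prox_step(theta, theta_prev, g_prev, grad_large, grad_diff,
                   p, eta, lam, rng):
    """One variance-reduced proximal gradient step with a PAGE estimator."""
    if rng.random() < p:
        g = grad_large(theta)                       # occasional large-batch gradient
    else:
        g = g_prev + grad_diff(theta, theta_prev)   # cheap recursive correction
    theta_next = prox_l1(theta - eta * g, eta * lam)
    return theta_next, g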

URL: https://openreview.net/forum?id=Ve4Puj2LVT

---

Title: Attention Normalization Impacts Cardinality Generalization in Slot Attention

Abstract: Object-centric scene decompositions are important representations for downstream tasks in fields such as computer vision and robotics. The recently proposed Slot Attention module, already leveraged by several derivative works for image segmentation and object tracking in videos, is a deep learning component which performs unsupervised object-centric scene decomposition on input images. It is based on an attention architecture in which latent slot vectors, which hold compressed information on objects, attend to localized perceptual features from the input image. In this paper, we show that design decisions on normalizing the aggregated values in the attention architecture have considerable impact on the capability of Slot Attention to generalize to a higher number of slots and objects than seen during training. We argue that the original Slot Attention normalization scheme discards information on the objects' sizes, which impairs its generalization capabilities. Based on these findings, we propose and investigate alternative normalization approaches which increase the generalization capabilities of Slot Attention to varying slot and object counts, resulting in performance gains on unsupervised image segmentation tasks.
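
The normalization question is easy to state in code. In Slot Attention the attention weights are softmaxed over slots, and the original scheme then divides each slot's aggregated values by its total attention mass (a weighted mean), which removes information about how many locations the slot claims. The "sum" branch below is one simple size-preserving alternative, shown purely for illustration; the paper investigates its own set of normalization schemes.

import torch

def slot_updates(attn, values, normalise="mean"):
    """attn: (B, N, K), softmaxed over the K slots; values: (B, N, D)."""
    if normalise == "mean":                    # original: weighted mean per slot
        weights = attn / attn.sum(dim=1, keepdim=True).clamp_min(1e-8)
    else:                                      # "sum": keeps per-slot attention mass
        weights = attn / attn.size(1)          # scale by input count only
    return torch.einsum("bnk,bnd->bkd", weights, values)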

URL: https://openreview.net/forum?id=llQXLfbGOq

---

Title: InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions

Abstract: Recent works have explored text-guided image editing using diffusion models, generating edited images based on text prompts. However, these models struggle to accurately locate the regions to be edited and to faithfully perform precise edits. In this work, we propose a framework termed InstructEdit that performs fine-grained editing based on user instructions. Our proposed framework has three components: a language processor, a segmenter, and an image editor. The first component, the language processor, processes the user instruction using a large language model; the goal is to parse the user instruction into prompts for the segmenter and captions for the image editor. We adopt ChatGPT and optionally BLIP2 for this step. The second component, the segmenter, uses the segmentation prompt provided by the language processor; we employ Grounded Segment Anything, a state-of-the-art segmentation framework, to automatically generate a high-quality mask based on the segmentation prompt. The third component, the image editor, uses the captions from the language processor and the masks from the segmenter to compute the edited image; we adopt Stable Diffusion and the mask-guided generation from DiffEdit for this purpose. Experiments show that our method outperforms previous editing methods in fine-grained editing applications where the input image contains a complex object or multiple objects. We improve the mask quality over DiffEdit and thus improve the quality of the edited images. We also show that our framework can be combined with NeRF or video editing pipelines to achieve fine-grained NeRF or video editing applications.
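
The three-stage flow can be summarised as glue code. Every function below is a hypothetical stub standing in for the actual components (ChatGPT/BLIP2, Grounded Segment Anything, and Stable Diffusion with DiffEdit-style mask-guided generation); none of these names come from the paper's implementation.

import numpy as np

def parse_instruction(instruction):
    # stub for the language processor (an LLM, optionally with BLIP2 captions)
    return "the red car", "a blue car parked on the street"

def ground_segment(image, seg_prompt):
    # stub for the segmenter; would return a text-grounded binary mask
    return image.mean(axis=-1) > image.mean()

def masked_edit(image, mask, caption):
    # stub for the mask-guided diffusion editor
    return image

def instruct_edit(image, instruction):
    seg_prompt, edit_caption = parse_instruction(instruction)  # 1) language processor
    mask = ground_segment(image, seg_prompt)                   # 2) segmenter
    return masked_edit(image, mask, edit_caption)              # 3) image editor

edited = instruct_edit(np.random.rand(64, 64, 3), "make the red car blue")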

URL: https://openreview.net/forum?id=O25Tahy6Ax

---

Title: PLUM: Improving Inference Efficiency By Leveraging Repetition-Sparsity Trade-Off

Abstract: Efficient inference of Deep Neural Networks (DNNs) on resource-constrained edge devices is essential. Quantization and sparsity are key techniques that translate to repetition and sparsity within tensors at the hardware-software interface. This paper introduces the concept of the repetition-sparsity trade-off, which helps explain computational efficiency during inference. We propose PLUM, a unified co-design framework that integrates DNN inference systems and quantization (forward and backward pass) to leverage the repetition-sparsity trade-off and improve inference efficiency. Our results demonstrate that PLUM's quantization method is more accurate than binary quantization with the same number of non-zero weights. Detailed analysis indicates that signed binarization generates a smaller distribution of effectual (non-zero) parameters nested within a larger distribution of total parameters of latent full-precision weights for a DNN block. Finally, the proposed PLUM framework achieves a 26% speedup on real hardware, doubles energy efficiency, and reduces density by 2.8× compared to binary methods while retaining top-1 accuracy when compared to prior-art methods for ResNets on ImageNet (achieving 66.2% top-1 accuracy), presenting an alternative solution for deploying efficient models in resource-limited environments.
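
The repetition-sparsity trade-off can be made concrete with a toy quantizer. The function below is an illustrative stand-in, not PLUM's actual method: it maps latent full-precision weights to the three values {-alpha, 0, +alpha}, so the tensor gains repetition (few distinct values) and sparsity (explicit zeros); the magnitude-quantile threshold and the mean-magnitude scale are assumptions.

import numpy as np

def toy_signed_binarize(w, sparsity=0.5):
    thresh = np.quantile(np.abs(w), sparsity)   # prune the smallest magnitudes
    mask = np.abs(w) >= thresh                  # effectual (non-zero) positions
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    q = alpha * np.sign(w) * mask               # values in {-alpha, 0, +alpha}
    return q, mask.mean()                       # quantized weights, density

w = np.random.randn(1000)
q, density = toy_signed_binarize(w)
print(len(np.unique(q)), density)               # 3 distinct values, ~0.5 density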

URL: https://openreview.net/forum?id=IEKtMMSblm

---

Title: Sample-efficient decoding of visual stimuli from fMRI through inter-individual functional alignment

Abstract: Deep learning is leading to major advances in the realm of brain decoding from functional Magnetic Resonance Imaging (fMRI). However, the large inter-individual variability in brain characteristics has constrained most studies to train models on one participant at a time. This limitation hampers the training of deep learning models, which typically require very large datasets. Here, we propose to boost brain decoding of videos and static images across participants by aligning the brain responses of training and left-out participants. Evaluated on a retrieval task, our method halves the median rank in out-of-subject setups compared to the anatomically-aligned baseline. It also outperforms classical within-subject approaches when fewer than 100 minutes of data are available for the tested participant. Furthermore, we show that our alignment framework handles multiple subjects, which improves accuracy over classical single-subject approaches. Finally, we show that this method aligns neural representations in accordance with brain anatomy. Overall, this study lays the foundations for leveraging extensive neuroimaging datasets and enhancing the decoding of individual brains when a limited amount of brain-imaging data is available.
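
As a sketch of inter-individual alignment, one can fit a linear map from a left-out participant's responses to a reference participant's responses on stimuli both saw, then reuse the decoder trained on the reference participant. Ridge regression is a generic choice here and not necessarily the paper's functional-alignment estimator.

import numpy as np
from sklearn.linear_model import Ridge

def align_subjects(X_new, X_ref, alpha=1.0):
    """X_new, X_ref: (n_shared_stimuli, n_voxels) responses to shared stimuli."""
    return Ridge(alpha=alpha).fit(X_new, X_ref)

# mapper.predict(X) then casts the new participant's data into the reference
# space, where a decoder trained on the reference participant can be applied.
mapper = align_subjects(np.random.randn(200, 50), np.random.randn(200, 60))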

URL: https://openreview.net/forum?id=qvJraN50DT

---

Title: Publicly-Detectable Watermarking for Language Models

Abstract: We present a highly detectable, trustless watermarking scheme for LLMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LLM output using rejection sampling. We prove that our scheme is cryptographically correct, sound, and distortion-free. We make novel uses of error-correction techniques to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and make empirical measurements over open models in the 2.7B to 70B parameter range. Our experiments suggest that our formal claims are met in practice.
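
A toy version of embedding bits by rejection sampling: resample the next token until a public, keyless hash of (context, token) equals the next bit of the signature. Real schemes must handle low-entropy stretches with error correction, which this sketch ignores; it is not the paper's construction.

import hashlib
import random

def hash_bit(context, token):
    digest = hashlib.sha256(f"{context}|{token}".encode()).digest()
    return digest[0] & 1                      # publicly computable bit

def sample_with_bit(sample_token, context, bit, max_tries=64):
    tok = sample_token(context)
    for _ in range(max_tries):
        if hash_bit(context, tok) == bit:
            return tok                        # this token now encodes the bit
        tok = sample_token(context)           # reject and resample
    return tok                                # low entropy: bit not embedded

# Toy usage with a uniform "language model" over a tiny vocabulary.
vocab = ["the", "a", "cat", "dog", "sat"]
token = sample_with_bit(lambda ctx: random.choice(vocab), "Once upon", bit=1)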

URL: https://openreview.net/forum?id=KUcPucDTSl

---

Title: Selective Classification Under Distribution Shifts

Abstract: In selective classification (SC), a classifier abstains from making predictions that are likely to be wrong in order to avoid excessive errors. To deploy imperfect classifiers---imperfect due to intrinsic statistical noise in the data, robustness issues of the classifier, or other causes---in high-stakes scenarios, SC appears to be an attractive and necessary path to follow. Despite decades of research in SC, most previous SC methods still focus only on the ideal statistical setting, i.e., assuming the data distribution at deployment is the same as that of training, although practical data can come from the wild. To bridge this gap, in this paper we propose an SC framework that takes distribution shifts into account, termed generalized selective classification, which covers label-shifted (or out-of-distribution) and covariate-shifted samples in addition to typical in-distribution samples; it is the first of its kind in the SC literature. We focus on non-training-based confidence-score functions for generalized SC on deep learning (DL) classifiers and propose two novel margin-based score functions. Through extensive analysis and experiments, we show that our proposed score functions are more effective and reliable than existing ones for generalized SC on a variety of classification tasks and DL classifiers.
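
A minimal example of a margin-based score and the resulting accept/abstain rule (the paper proposes its own two score functions, which may differ from this textbook margin): score each sample by the gap between its two largest logits and abstain below a threshold chosen for a target coverage.

import numpy as np

def margin_score(logits):
    top2 = np.sort(logits, axis=-1)[:, -2:]
    return top2[:, 1] - top2[:, 0]            # top-1 minus top-2 logit

def selective_predict(logits, coverage=0.8):
    scores = margin_score(logits)
    tau = np.quantile(scores, 1.0 - coverage) # accept the top coverage fraction
    accept = scores >= tau
    return logits.argmax(axis=-1), accept     # abstain where accept is False

preds, accept = selective_predict(np.random.randn(100, 10))
print(accept.mean())                          # roughly 0.8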

URL: https://openreview.net/forum?id=dmxMGW6J7N

---

Title: Incremental Spatial and Spectral Learning of Neural Operators for Solving Large-Scale PDEs

Abstract: Fourier Neural Operators (FNO) offer a principled approach to solving challenging partial differential equations (PDEs) such as turbulent flows. At the core of FNO is a spectral layer that leverages a discretization-convergent representation in the Fourier domain and learns weights over a fixed set of frequencies. However, training FNO presents two significant challenges, particularly in large-scale, high-resolution applications: (i) computing the Fourier transform on high-resolution inputs is computationally intensive but necessary, since fine-scale details are needed for solving many PDEs such as fluid flows; and (ii) selecting the relevant set of frequencies in the spectral layers is challenging, as too many modes can lead to overfitting, while too few can lead to underfitting. To address these issues, we introduce the Incremental Fourier Neural Operator (iFNO), which progressively increases both the number of frequency modes used by the model and the resolution of the training data. We empirically show that iFNO reduces total training time while maintaining or improving generalization performance across various datasets. Our method demonstrates a 38% lower testing error using 20% fewer frequency modes compared to the existing FNO, while also achieving up to 46% faster training and a 2.8x reduction in model size.
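
The incremental schedule can be sketched as a staged training loop that grows the number of retained Fourier modes alongside the training resolution. The set_active_modes hook and the fixed schedules below are assumptions for illustration; the paper's criteria for when and how to add modes are more involved.

def train_ifno(model, optimiser, loss_fn, loaders_by_resolution,
               mode_schedule=(8, 16, 32), res_schedule=(64, 128, 256),
               epochs_per_stage=10):
    for modes, res in zip(mode_schedule, res_schedule):
        model.set_active_modes(modes)          # hypothetical hook: unmask frequencies
        loader = loaders_by_resolution[res]    # move to higher-resolution samples
        for _ in range(epochs_per_stage):
            for x, y in loader:                # standard FNO training step
                optimiser.zero_grad()
                loss_fn(model(x), y).backward()
                optimiser.step()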

URL: https://openreview.net/forum?id=xI6cPQObp0

---

Title: FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

Abstract: Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs of other modalities like edge maps. Nevertheless, current controllable T2I methods commonly face challenges related to efficiency and faithfulness, especially when conditioning on multiple inputs from either the same or diverse modalities. In this paper, we propose a novel Flexible and Efficient method, FlexEControl, for controllable T2I generation. At the core of FlexEControl is a unique weight decomposition strategy, which allows for streamlined integration of various input types. This approach not only enhances the faithfulness of the generated image to the control, but also significantly reduces the computational overhead typically associated with multimodal conditioning. Our approach achieves a reduction of 41% in trainable parameters and 30% in memory usage compared with Uni-ControlNet. Moreover, it doubles data efficiency and can flexibly generate images under the guidance of multiple input conditions of various modalities.
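
One plausible reading of the weight decomposition, shown purely as an assumption about the mechanism rather than the paper's actual design: a weight shared across conditioning modalities plus a small per-modality low-rank update, so each extra modality adds few trainable parameters.

import torch
import torch.nn as nn

class SharedDecomposedLinear(nn.Module):
    def __init__(self, d_in, d_out, n_modalities, rank=4):
        super().__init__()
        self.shared = nn.Linear(d_in, d_out)   # weight shared across modalities
        self.A = nn.Parameter(torch.randn(n_modalities, d_in, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(n_modalities, rank, d_out))

    def forward(self, x, modality):
        return self.shared(x) + x @ self.A[modality] @ self.B[modality]

layer = SharedDecomposedLinear(d_in=32, d_out=64, n_modalities=3)
y = layer(torch.randn(5, 32), modality=1)      # condition-specific low-rank path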

URL: https://openreview.net/forum?id=y8DSGN5nuN

---
