Daily TMLR digest for Jun 14, 2024

TMLR

Jun 14, 2024, 12:00:06 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Estimating class separability of text embeddings with persistent homology.

Authors: Kostis Gourgoulias, Najah Ghalyan, Maxime Labonne, yash satsangi, Sean Moran, Joseph Sabelja

Abstract: This paper introduces an unsupervised method to estimate the class separability of text datasets from a topological point of view. Using persistent homology, we demonstrate how tracking the evolution of embedding manifolds during training can inform about class separability. More specifically, we show how this technique can be applied to detect when the training process stops improving the separability of the embeddings. Our results, validated across binary and multi-class text classification tasks, show that the proposed method’s estimates of class separability align with those obtained from supervised methods. This approach offers a novel perspective on monitoring and improving the fine-tuning of sentence transformers for classification tasks, particularly in scenarios where labeled data is scarce. We also discuss how tracking these quantities can provide additional insights into the properties of the trained classifier.
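
As a rough illustration of the kind of signal involved (not the authors' pipeline), the sketch below computes the 0-dimensional persistent homology of an embedding cloud from the edge lengths of its Euclidean minimum spanning tree and summarises the death times into a single separability score to be tracked across training checkpoints; the particular summary statistic is an assumption made here for illustration.

# Sketch: track a 0-dim persistent-homology summary of sentence embeddings
# across training checkpoints as an unsupervised proxy for class separability.
# The score below (largest death time relative to the median) is an
# illustrative choice, not necessarily the statistic used in the paper.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def h0_death_times(embeddings):
    """0-dim persistence death times equal the Euclidean MST edge lengths."""
    dists = squareform(pdist(embeddings))
    mst = minimum_spanning_tree(dists)
    return np.sort(mst.data)  # one death time per merged connected component

def separability_score(embeddings):
    """A large final merge relative to typical merges suggests separated clusters."""
    deaths = h0_death_times(embeddings)
    return float(deaths[-1] / (np.median(deaths) + 1e-12))

# Usage: evaluate on a fixed probe set after every epoch and stop fine-tuning
# once the score plateaus.
rng = np.random.default_rng(0)
two_blobs = np.vstack([rng.normal(c, 0.1, size=(50, 16)) for c in (0.0, 1.0)])
print(separability_score(two_blobs))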

URL: https://openreview.net/forum?id=8DWrIMuLya

---

Title: Exploring validation metrics for offline model-based optimisation with diffusion models

Authors: Christopher Beckham, Alexandre Piché, David Vazquez, Christopher Pal

Abstract: In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black-box function called the (ground truth) oracle, which is expensive to compute since it involves executing a real-world process. In offline MBO we wish to do so without assuming access to such an oracle during training or validation, which makes evaluation non-straightforward. While an approximation to the ground truth oracle can be trained and used in its place during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples. Measuring the mean reward of generated candidates over this approximation is one such 'validation metric', whereas we are interested in a more fundamental question: which validation metrics correlate most with the ground truth? Answering this involves proposing validation metrics and quantifying them over many datasets for which the ground truth is known, for instance simulated environments. This is encapsulated in our proposed evaluation framework, which is also designed to measure extrapolation, the ultimate goal behind leveraging generative models for MBO. While our evaluation framework is model-agnostic, we specifically evaluate denoising diffusion models due to their state-of-the-art performance, and we derive insights such as a ranking of the most effective validation metrics and a discussion of important hyperparameters.
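
To make the framing concrete, here is a toy sketch (with made-up oracles, not the paper's setup) of how one validation metric (mean reward under a learned proxy oracle) can be scored against the ground truth by rank-correlating the two across a set of model checkpoints in a simulated task.

# Toy sketch: rank-correlate a candidate validation metric with the true reward
# across checkpoints. Both oracles below are synthetic stand-ins.
import numpy as np
from scipy.stats import spearmanr

def true_oracle(x):                      # ground truth, known only in simulation
    return -np.sum((x - 1.0) ** 2, axis=-1)

def proxy_oracle(x, rng):                # learned approximation, deliberately noisy
    return true_oracle(x) + 0.5 * rng.normal(size=len(x))

rng = np.random.default_rng(0)
# Pretend each "checkpoint" generates its own batch of candidate designs.
checkpoints = [rng.normal(loc=mu, scale=0.3, size=(128, 8)) for mu in np.linspace(-1, 1, 10)]

metric_scores = [float(np.mean(proxy_oracle(c, rng))) for c in checkpoints]   # validation metric
true_scores = [float(np.mean(true_oracle(c))) for c in checkpoints]           # ground truth

rho, _ = spearmanr(metric_scores, true_scores)
print(f"rank correlation between metric and ground truth: {rho:.2f}")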

URL: https://openreview.net/forum?id=wC4ZID0H9a

---


New submissions
===============


Title: Video Diffusion Models: A Survey

Abstract: Diffusion generative models have recently become a robust technique for producing and modifying coherent, high-quality video. This survey offers a systematic overview of critical elements of diffusion models for video generation, covering applications, architectural choices, and the modeling of temporal dynamics. Recent advancements in the field are summarized and grouped into development trends. The survey concludes with an overview of remaining challenges and an outlook on the future of the field.

URL: https://openreview.net/forum?id=rJSHjhEYJx

---

Title: Learned feature representations are biased by complexity, learning order, position, and more

Abstract: Representation learning, and interpreting learned representations, are key areas of focus in machine learning and neuroscience. Both fields generally use representations as a means to understand or improve a system's computations. In this work, however, we explore surprising dissociations between representation and computation that may pose challenges for such efforts. We create datasets in which we attempt to match the computational role that different features play, while manipulating other properties of the features or the data. We train various deep learning architectures to compute these multiple abstract features about their inputs. We find that their learned feature representations are systematically biased towards representing some features more strongly than others, depending upon extraneous properties such as feature complexity, the order in which features are learned, and the distribution of features over the inputs. For example, features that are simpler to compute or learned first tend to be represented more strongly and densely than features that are more complex or learned later, even if all features are learned equally well. We also explore how these biases are affected by architectures, optimizers, and training regimes (e.g., in transformers, features decoded earlier in the output sequence also tend to be represented more strongly). Our results help to characterize the inductive biases of gradient-based representation learning. These results also highlight a key challenge for interpretability---or for comparing the representations of models and brains---disentangling extraneous biases from the computationally important aspects of a system's internal representations.
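
As a deliberately generic illustration of what "represented more strongly" can mean operationally, the sketch below fits a linear probe per target feature on a model's hidden activations and compares probe accuracies; the paper's own analyses may rely on different measures, so treat this as an assumption-laden example.

# Sketch: per-feature linear probes as a crude measure of representation strength.
import numpy as np
from sklearn.linear_model import LogisticRegression

def representation_strength(hidden, feature_labels):
    """hidden: (n, d) activations; feature_labels: dict of name -> (n,) binary labels."""
    strengths = {}
    for name, y in feature_labels.items():
        probe = LogisticRegression(max_iter=1000).fit(hidden, y)
        strengths[name] = probe.score(hidden, y)   # use held-out data in practice
    return strengths

rng = np.random.default_rng(0)
hidden = rng.normal(size=(500, 32))
labels = {
    "strongly_represented": (hidden[:, 0] > 0).astype(int),   # linearly present feature
    "weakly_represented": rng.integers(0, 2, size=500),       # unrelated to the activations
}
print(representation_strength(hidden, labels))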

URL: https://openreview.net/forum?id=aY2nsgE97a

---

Title: EHI: End-to-end Learning of Hierarchical Index for Efficient Dense Retrieval

Abstract: Dense embedding-based retrieval is widely used for semantic search and ranking. However, conventional two-stage approaches, involving contrastive embedding learning followed by approximate nearest neighbor search (ANNS), can suffer from misalignment between these stages. This mismatch degrades retrieval performance. We propose End-to-end Hierarchical Indexing (EHI), a novel method that directly addresses this issue by jointly optimizing embedding generation and ANNS structure. EHI leverages a dual encoder for embedding queries and documents while simultaneously learning an inverted file index (IVF)-style tree structure. To facilitate the effective learning of this discrete structure, EHI introduces dense path embeddings that encode the path traversed by queries and documents within the tree. Extensive evaluations on standard benchmarks, including MS MARCO (Dev set) and TREC DL19, demonstrate EHI's superiority over traditional ANNS indices. Under the same computational constraints, EHI outperforms existing state-of-the-art methods by +1.45% in MRR@10 on MS MARCO (Dev) and +8.2% in nDCG@10 on TREC DL19, highlighting the benefits of our end-to-end approach.
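
A minimal PyTorch sketch of the idea as described in the abstract follows: a shared dual encoder plus a small stack of learned routers whose per-level branch probabilities are concatenated into a dense path embedding. Depth, widths, and the toy loss are assumptions for illustration, not EHI's actual architecture or objective.

# Sketch: dual encoder + learned tree routing with dense path embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEHI(nn.Module):
    def __init__(self, in_dim=128, dim=64, branching=4, depth=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # One router per tree level; each outputs a distribution over children.
        self.routers = nn.ModuleList([nn.Linear(dim, branching) for _ in range(depth)])

    def forward(self, x):
        emb = F.normalize(self.encoder(x), dim=-1)
        branch_probs = [F.softmax(r(emb), dim=-1) for r in self.routers]
        path_emb = torch.cat(branch_probs, dim=-1)   # dense path embedding
        return emb, path_emb

model = TinyEHI()
queries, docs = torch.randn(8, 128), torch.randn(8, 128)
q_emb, q_path = model(queries)
d_emb, d_path = model(docs)
# Encourage matching pairs to agree both in embedding space and in where the
# tree routes them (a stand-in for EHI's joint training objective).
loss = -F.cosine_similarity(q_emb, d_emb).mean() - F.cosine_similarity(q_path, d_path).mean()
loss.backward()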

URL: https://openreview.net/forum?id=GeLLOGsHV9

---

Title: A Bag of Tricks for Few-Shot Class-Incremental Learning

Abstract: We present a bag of tricks framework for few-shot class-incremental learning (FSCIL), which is a challenging form of continual learning that involves continuous adaptation to new tasks with limited samples. FSCIL requires both stability and adaptability, i.e., preserving proficiency in previously learned tasks while learning new ones. Our proposed bag of tricks brings together eight key and highly influential techniques that improve stability, adaptability, and overall performance under a unified framework for FSCIL. We organize these tricks into three categories: stability tricks, adaptability tricks, and training tricks. Stability tricks aim to mitigate the forgetting of previously learned classes by enhancing the separation between the embeddings of learned classes and minimizing interference when learning new ones. On the other hand, adaptability tricks focus on the effective learning of new classes. Finally, training tricks improve the overall performance without compromising stability or adaptability. We perform extensive experiments on three benchmark datasets, CIFAR-100, CUB-200, and miniImageNet, to evaluate the impact of our proposed framework. Our detailed analysis shows that our approach substantially improves both stability and adaptability, establishing a new state-of-the-art by outperforming prior works in the area. We believe our method provides a go-to solution and establishes a robust baseline for future research in this area.
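
For readers unfamiliar with the setting, the sketch below illustrates the FSCIL protocol itself rather than any specific trick from the paper: a fixed embedding space with a prototype classifier that is extended session by session, so old-class prototypes are never overwritten. All shapes and class counts are placeholders.

# Sketch of the few-shot class-incremental setting with a prototype classifier.
import torch
import torch.nn.functional as F

class PrototypeClassifier:
    def __init__(self):
        self.prototypes = {}                       # class id -> mean embedding

    def add_session(self, embeddings, labels):
        """Register the classes of a new (possibly few-shot) session."""
        for c in labels.unique().tolist():
            self.prototypes[c] = embeddings[labels == c].mean(dim=0)

    def predict(self, embeddings):
        classes = sorted(self.prototypes)
        protos = F.normalize(torch.stack([self.prototypes[c] for c in classes]), dim=-1)
        sims = F.normalize(embeddings, dim=-1) @ protos.T
        return torch.tensor(classes)[sims.argmax(dim=-1)]

clf = PrototypeClassifier()
clf.add_session(torch.randn(100, 64), torch.randint(0, 10, (100,)))   # base session, many samples
clf.add_session(torch.randn(25, 64), torch.randint(10, 15, (25,)))    # incremental few-shot session
print(clf.predict(torch.randn(4, 64)))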

URL: https://openreview.net/forum?id=DiyYf1Kcdt

---

Title: Target-conditioned GFlowNet for Structure-based Drug Design

Abstract: Searching the vast chemical space for drug-like molecules that bind with a protein pocket is a challenging task in drug discovery. Recently, structure-based generative models have been introduced which promise to be more efficient by learning to generate molecules for any given protein structure. However, since they learn the distribution of a limited protein-ligand complex dataset, structure-based methods do not yet outperform optimization-based methods that generate binding molecules for just one pocket. To overcome limitations on data while leveraging learning across protein targets, we choose to model the reward distribution conditioned on pocket structure, instead of the training data distribution. We introduce TacoGFN, a Generative Flow Network conditioned on any protein pocket structure to generate molecules with probabilities proportional to their rewards. In the generative setting on the CrossDocked2020 benchmark, TacoGFN attains a state-of-the-art success rate of 56.0% and a median Vina Dock score of -8.44 kcal/mol while improving the generation time by multiple orders of magnitude. Fine-tuning TacoGFN further improves the median Vina Dock score to -10.93 kcal/mol and the success rate to 88.8%, outperforming all optimization-based methods.
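
The abstract does not spell out the training objective, but a common choice for GFlowNets is the trajectory-balance loss; the sketch below shows what that objective looks like when every term is additionally conditioned on a protein-pocket embedding. The tensors are random placeholders standing in for real policy outputs and docking-based rewards.

# Sketch: pocket-conditioned trajectory-balance loss for a GFlowNet.
import torch

def trajectory_balance_loss(log_Z_pocket, log_pf_steps, log_pb_steps, log_reward):
    """
    log_Z_pocket : (batch,)    learned log-partition function, conditioned on the pocket
    log_pf_steps : (batch, T)  log P_F(s_{t+1} | s_t, pocket) along each trajectory
    log_pb_steps : (batch, T)  log P_B(s_t | s_{t+1}, pocket)
    log_reward   : (batch,)    log reward of the final molecule for this pocket
    """
    lhs = log_Z_pocket + log_pf_steps.sum(dim=1)
    rhs = log_reward + log_pb_steps.sum(dim=1)
    return ((lhs - rhs) ** 2).mean()

batch, T = 16, 10
loss = trajectory_balance_loss(
    log_Z_pocket=torch.zeros(batch, requires_grad=True),
    log_pf_steps=torch.randn(batch, T, requires_grad=True),
    log_pb_steps=torch.randn(batch, T),
    log_reward=torch.randn(batch),
)
loss.backward()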

URL: https://openreview.net/forum?id=N8cPv95zOU

---

Title: Intrinsic Biologically-Plausible Adversarial Robustness

Abstract: Artificial Neural Networks (ANNs) trained with Backpropagation (BP) excel in different daily tasks but have a dangerous vulnerability: inputs with small targeted perturbations, also known as adversarial samples, can drastically disrupt their performance. Adversarial training, a technique in which the training dataset is augmented with exemplary adversarial samples, is proven to mitigate this problem but comes at a high computational cost. In contrast to ANNs, humans are not susceptible to misclassifying these same adversarial samples. Thus, one can postulate that biologically-plausible trained ANNs might be more robust against adversarial attacks. In this work, we chose the biologically-plausible learning algorithm Present the Error to Perturb the Input To modulate Activity (PEPITA) as a case study and investigated this question through a comparative analysis with BP-trained ANNs on various computer vision tasks. We observe that PEPITA has a higher intrinsic adversarial robustness and, when adversarially trained, also has a more favorable natural-vs-adversarial performance trade-off. In particular, for the same natural accuracies on the MNIST task, PEPITA's adversarial accuracies decrease on average only by 0.26% while BP's decrease by 8.05%.
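
For context on the case-study algorithm, the sketch below shows PEPITA's characteristic two-forward-pass structure: a clean pass, then a second pass on an input perturbed by the output error projected through a fixed random matrix, with weight updates driven by the difference between the two passes. The exact update equations and sign conventions are those of the original PEPITA paper; the version here is only schematic.

# Schematic numpy sketch of a PEPITA-style update on a one-hidden-layer network.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 784, 128, 10, 0.01
W1 = rng.normal(0, 0.05, (n_hid, n_in))
W2 = rng.normal(0, 0.05, (n_out, n_hid))
F_proj = rng.normal(0, 0.05, (n_in, n_out))      # fixed random feedback projection

def forward(x):
    h = np.maximum(0, W1 @ x)                    # ReLU hidden layer
    return h, W2 @ h

x = rng.normal(size=n_in)
target = np.eye(n_out)[3]                        # one-hot label

h, y = forward(x)                                # pass 1: clean input
e = y - target                                   # output error
x_mod = x - F_proj @ e                           # pass 2 input: error fed back to the input
h_mod, _ = forward(x_mod)

W1 -= lr * np.outer(h - h_mod, x_mod)            # hidden layer: difference of the two passes
W2 -= lr * np.outer(e, h_mod)                    # output layer: delta-rule-like update
print(float(np.abs(e).mean()))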

URL: https://openreview.net/forum?id=ESLVKKUBZq

---

Title: Set Features for Anomaly Detection

Abstract: This paper proposes to use set features for detecting anomalies in samples that consist of unusual combinations of normal elements. Many leading methods discover anomalies by detecting an unusual part of a sample. For example, state-of-the-art segmentation-based approaches first classify each element of the sample (e.g., image patch) as normal or anomalous and then classify the entire sample as anomalous if it contains anomalous elements. However, such approaches do not extend well to scenarios where the anomalies are expressed by an unusual combination of normal elements. In this paper, we overcome this limitation by proposing set features that model each sample by the distribution of its elements. We compute the anomaly score of each sample using a simple density estimation method over fixed features. Our approach outperforms the previous state-of-the-art in image-level logical anomaly detection and sequence-level time series anomaly detection.
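
One generic way to realize the set-features idea (not necessarily the paper's construction) is sketched below: each sample is described by the histogram of its elements' assignments to a fixed codebook, and anomalies are scored by a simple kNN density estimate over those histograms.

# Sketch: set descriptors as codebook histograms + kNN density scoring.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def set_descriptor(element_feats, codebook):
    """Histogram of codebook assignments for one sample's elements (e.g. patches)."""
    counts = np.bincount(codebook.predict(element_feats), minlength=codebook.n_clusters)
    return counts / counts.sum()

rng = np.random.default_rng(0)
train_sets = [rng.normal(0, 1, (50, 8)) for _ in range(100)]            # 100 normal samples
codebook = KMeans(n_clusters=16, n_init=10, random_state=0).fit(np.vstack(train_sets))

train_desc = np.stack([set_descriptor(s, codebook) for s in train_sets])
knn = NearestNeighbors(n_neighbors=5).fit(train_desc)

test_set = rng.normal(0, 1, (50, 8))                                     # a new sample to score
score = knn.kneighbors(set_descriptor(test_set, codebook)[None])[0].mean()
print(f"anomaly score (mean kNN distance over set descriptors): {score:.3f}")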

URL: https://openreview.net/forum?id=aukOnn7J4M

---

Title: An Online Caching Mechanism for Deep Neural Networks

Abstract: Deep Neural Networks (DNNs) have become an essential component in many application domains, including web-based services. A variety of these services require high throughput and (close to) real-time features, for instance, to respond or react to users' requests or to process a stream of incoming data on time. However, the trend in DNN design is towards larger models with many layers and parameters to achieve more accurate results. Although these models are often pre-trained, the computational complexity in such large models can still be relatively significant, hindering low inference latency.
In this paper, we propose an end-to-end automated caching solution to improve the performance of DNN-based services in terms of computational complexity and inference latency. Our method adopts the ideas of self-distillation of DNN models and early exits. The proposed solution is an automated online layer caching mechanism that allows early exiting of a large model during inference time if the cache model in one of the early exits is confident enough for final prediction. One of the main contributions of this paper is that we have implemented the idea as an online caching, meaning that the cache models do not need access to training data and perform solely based on the incoming data at run-time, making it suitable for applications using pre-trained models.
The results of our experiments on two downstream tasks (image classification and object detection) show that, on average, caching can reduce the computational complexity of these services by up to 58% (in terms of FLOP count) and improve their inference latency by up to 46% with low to zero reduction in accuracy. Our approach also outperforms existing approaches, particularly when applied to complex models and larger datasets. It achieves latency reductions of 51.6% and 30.4%, exceeding the Gati and BranchyNet methods on CIFAR100-ResNet50, accompanied by increases of 2.92% and 0.87% in mean precision, further highlighting the superiority of our approach in demanding scenarios.
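
The sketch below illustrates the inference-time behaviour described above under simplified assumptions: small exit heads attached to the stages of a frozen backbone, with the forward pass stopping as soon as one head's softmax confidence clears a threshold. The threshold value, head design, and the online training of the heads from incoming data are all simplified here.

# Sketch: confidence-thresholded early exits on a frozen backbone (batch size 1).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitWrapper(nn.Module):
    def __init__(self, blocks, feat_dim, num_classes, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)                 # stages of a pre-trained backbone
        self.exits = nn.ModuleList([nn.Linear(feat_dim, num_classes) for _ in blocks])
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):
        h = x
        for i, (block, exit_head) in enumerate(zip(self.blocks, self.exits)):
            h = block(h)
            probs = F.softmax(exit_head(h.flatten(1)), dim=-1)
            conf, pred = probs.max(dim=-1)
            if conf.item() >= self.threshold:               # confident cache hit: stop early
                return pred, i
        return pred, len(self.blocks) - 1                   # fell through to the last exit

backbone_stages = [nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(4)]
model = EarlyExitWrapper(backbone_stages, feat_dim=32, num_classes=10)
pred, exit_idx = model(torch.randn(1, 32))
print(f"predicted class {pred.item()} at exit {exit_idx}")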

URL: https://openreview.net/forum?id=h4rUKKfl5S

---
