Daily TMLR digest for Jul 03, 2024

TMLR

Jul 3, 2024, 12:00:08 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: BaSIS-Net: From Point Estimate to Predictive Distribution in Neural Networks - A Bayesian Sequential Importance Sampling Framework

Authors: Giuseppina Carannante, Nidhal Bouaynaya, Lyudmila Mihaylova, Ghulam Rasool

Abstract: Data-driven Deep Learning (DL) models have revolutionized autonomous systems, but ensuring their safety and reliability necessitates the assessment of predictive confidence or uncertainty. Bayesian DL provides a principled approach to quantify uncertainty via probability density functions defined over model parameters. However, the exact solution is intractable for most DL models, and the approximation methods, often based on heuristics, suffer from scalability issues and stringent distribution assumptions and may lack theoretical guarantees. This work develops a Sequential Importance Sampling framework that approximates the posterior probability density function through weighted samples (or particles), which can be used to find the mean, variance, or higher-order moments of the posterior distribution. We demonstrate that propagating particles, which capture information about the higher-order moments, through the layers of the DL model results in increased robustness to natural and malicious noise (adversarial attacks). The variance computed from these particles effectively quantifies the model’s decision uncertainty, demonstrating well-calibrated and accurate predictive confidence.

URL: https://openreview.net/forum?id=V92PnXQ7UW
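
The abstract above describes approximating a posterior with weighted samples (particles) and reading off the mean and variance from them. As a rough illustration of that one idea only, the sketch below computes self-normalized importance-sampling moments for a toy one-dimensional target. The target, the proposal, and the helper names are assumptions made for this example; it does not reproduce the paper's sequential propagation of particles through network layers.

```python
import math
import random


def log_normal_pdf(x, mu, sigma):
    """Log density of a univariate normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def weighted_moments(particles, log_weights):
    """Return the self-normalized weighted mean and variance of scalar particles."""
    # Subtract the largest log-weight before exponentiating for numerical stability.
    top = max(log_weights)
    unnormalized = [math.exp(lw - top) for lw in log_weights]
    total = sum(unnormalized)
    weights = [u / total for u in unnormalized]
    mean = sum(w * x for w, x in zip(weights, particles))
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, particles))
    return mean, var


random.seed(0)
# Toy setup (assumed): target N(1, 0.5^2), proposal N(0, 2^2).
particles = [random.gauss(0.0, 2.0) for _ in range(20000)]
log_w = [
    log_normal_pdf(x, 1.0, 0.5) - log_normal_pdf(x, 0.0, 2.0)
    for x in particles
]
mean, var = weighted_moments(particles, log_w)
print(f"weighted mean={mean:.3f}, weighted variance={var:.3f}")
```

With enough particles the weighted mean and variance should land close to the target's 1.0 and 0.25; the weighted variance is the kind of quantity the abstract uses as an uncertainty estimate.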

---

Title: Object-Centric Relational Representations for Image Generation

Authors: Luca Butera, Andrea Cini, Alberto Ferrante, Cesare Alippi

Abstract: Conditioning image generation on specific features of the desired output is a key ingredient of modern generative models. However, existing approaches lack a general and unified way of representing structural and semantic conditioning at diverse granularity levels. This paper explores a novel method to condition image generation, based on object-centric relational representations. In particular, we propose a methodology to condition the generation of objects in an image on the attributed graph representing their structure and the associated semantic information. We show that such architectural biases entail properties that facilitate the manipulation and conditioning of the generative process and allow for regularizing the training procedure. The proposed conditioning framework is implemented by means of a neural network that learns to generate a 2D, multi-channel, layout mask of the objects, which can be used as a soft inductive bias in the downstream generative task. To do so, we leverage both 2D and graph convolutional operators. We also propose a novel benchmark for image generation consisting of a synthetic dataset of images paired with their relational representation. Empirical results show that the proposed approach compares favorably against relevant baselines.

URL: https://openreview.net/forum?id=7kWjB9zW90

---

Title: A General-Purpose Multi-Modal OOD Detection Framework

Authors: Viet Quoc Duong, Qiong Wu, Zhengyi Zhou, Eric Zavesky, WenLing Hsu, Han Zhao, Huajie Shao

Abstract: Out-of-distribution (OOD) detection seeks to identify test samples that deviate from the training data, which is critical to ensuring the safety and reliability of machine learning (ML) systems. While a plethora of methods have been developed to detect uni-modal OOD samples, only a few have focused on multi-modal OOD detection. Current contrastive learning-based methods primarily address multi-modal OOD detection in a scenario where an image is not related to the class labels in training data. However, ML systems in real-world applications may encounter a broader spectrum of anomalies caused by different factors like systematic errors in labeling, environmental changes, and sensor malfunctions. Hence, we propose a new method that simultaneously detects anomalies from multiple different OOD scenarios, arising from fine-grained image features and textual descriptions, instead of large categorical information. To achieve this goal, we propose a general-purpose weakly-supervised OOD detection framework, called WOOD, that combines a binary classifier and a contrastive learning module to reap the benefits of both. In order to better distinguish in-distribution (ID) samples from OOD ones, we employ the hinge loss to constrain the similarity of their latent representations. Moreover, we devise a new scoring metric that fuses predictions from both the binary classifier and contrastive learning to enhance OOD detection. Extensive experimental results on multiple benchmarks demonstrate that the proposed WOOD significantly outperforms the state-of-the-art methods for multi-modal OOD detection. Importantly, our approach can achieve superior detection performance in a variety of OOD scenarios.

URL: https://openreview.net/forum?id=nYzws7sSzo
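
The abstract above mentions a hinge loss that constrains the similarity of latent representations for ID and OOD samples, without giving its exact form. The sketch below shows one plausible shape such a loss could take on image/text embedding pairs. The two thresholds (`id_margin`, `ood_margin`), the use of cosine similarity, and the function names are all assumptions for illustration, not the WOOD formulation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hinge_similarity_loss(image_vec, text_vec, is_ood,
                          id_margin=0.8, ood_margin=0.2):
    """Hinge penalty on similarity (illustrative form, thresholds are assumed).

    ID pairs are penalized when similarity drops below `id_margin`;
    OOD pairs are penalized when similarity rises above `ood_margin`.
    """
    sim = cosine_similarity(image_vec, text_vec)
    if is_ood:
        return max(0.0, sim - ood_margin)
    return max(0.0, id_margin - sim)


aligned = ([1.0, 0.0], [1.0, 0.0])      # similarity 1
orthogonal = ([1.0, 0.0], [0.0, 1.0])   # similarity 0

print(hinge_similarity_loss(*aligned, is_ood=False))     # no penalty
print(hinge_similarity_loss(*aligned, is_ood=True))      # penalized
print(hinge_similarity_loss(*orthogonal, is_ood=False))  # penalized
print(hinge_similarity_loss(*orthogonal, is_ood=True))   # no penalty
```

The point of the hinge shape is that pairs already on the correct side of their threshold contribute zero loss, so training effort goes to the pairs that are still confusable.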

---


New submissions
===============


Title: Reward Distance Comparisons Under Transition Sparsity

Abstract: Reward comparisons are vital for evaluating differences in agent behaviors induced by a set of reward functions. Most conventional techniques employ optimized policies to derive these behaviors; however, learning these policies can be computationally expensive and susceptible to safety concerns. Direct reward comparison techniques obviate policy learning but suffer from transition sparsity, where only a small subset of transitions are sampled due to data collection challenges and feasibility constraints. Existing state-of-the-art direct reward comparison methods are ill-suited for these sparse conditions since they require high transition coverage, where the majority of transitions from a given coverage distribution are sampled. When this requirement is not satisfied, a distribution mismatch between sampled and expected transitions can occur, introducing significant errors. This paper introduces the Sparsity Agnostic Reward Distance (SARD) pseudometric, designed to eliminate the need for high transition coverage by accommodating diverse sample distributions, likely common under transition sparsity. We provide theoretical justifications for SARD's robustness and conduct empirical studies to demonstrate its practical efficacy across various domains, namely Gridworld, Bouncing Balls, Drone Combat, and StarCraft 2.

URL: https://openreview.net/forum?id=kCONxY2AVT

---

Title: Recurrent Inertial Graph-Based Estimator (RING): A Single Pluripotent Inertial Motion Tracking Solution

Abstract: This paper introduces a novel ML-based method for Inertial Motion Tracking (IMT) that fundamentally changes the way this technology is used. The proposed method, named RING (Recurrent Inertial Graph-Based Estimator), provides a pluripotent, problem-unspecific plug-and-play IMT solution that, in contrast to conventional IMT solutions, eliminates the need for expert knowledge to identify, select, and parameterize the appropriate method. RING's pluripotency is enabled by a novel online-capable neural network architecture that uses a decentralized network of message-passing, parameter-sharing recurrent neural networks, which map local IMU measurements and nearest-neighbour messages to local orientations. This architecture enables RING to address a broad range of IMT problems that vary greatly in aspects such as the number of attached sensors, or the number of segments in the kinematic chain, and even generalize to previously unsolved IMT problems, including the challenging combination of magnetometer-free and sparse sensing with unknown sensor-to-segment parameters. Remarkably, RING is trained solely on simulated data, yet evaluated on experimental data, which indicates its exceptional ability to zero-shot generalize from simulation to experiment, while outperforming several state-of-the-art problem-specific solutions. For example, RING can, for the first time, accurately track a four-segment kinematic chain (which requires estimating four orientations) using only two magnetometer-free inertial measurement units. This research not only makes IMT more powerful and less restrictive in established domains ranging from biomechanics to autonomous systems, but also opens its application to new users and fields previously untapped by motion tracking technology. Code and data are available at https://github.com/anonymous-sup-material/ring_supplementary_material.

URL: https://openreview.net/forum?id=h2C3rkn0zR

---

Title: Graph-level Representation Learning with Joint-Embedding Predictive Architectures

Abstract: Joint-Embedding Predictive Architectures (JEPAs) have recently emerged as a novel and powerful technique for self-supervised representation learning. They aim to learn an energy-based model by predicting the latent representation of a target signal y from the latent representation of a context signal x. JEPAs bypass the need for negative and positive samples, traditionally required by contrastive learning, while avoiding the overfitting issues associated with generative pretraining. In this paper, we show that graph-level representations can be effectively modeled using this paradigm by proposing a Graph Joint-Embedding Predictive Architecture (Graph-JEPA). In particular, we employ masked modeling and focus on predicting the latent representations of masked subgraphs starting from the latent representation of a context subgraph. To endow the representations with the implicit hierarchy that is often present in graph-level concepts, we devise an alternative prediction objective that consists of predicting the coordinates of the encoded subgraphs on the unit hyperbola in the 2D plane. Through multiple experimental evaluations, we show that Graph-JEPA can learn highly semantic and expressive representations, as shown by the downstream performance in graph classification, regression, and distinguishing non-isomorphic graphs. The code will be made available upon acceptance.

URL: https://openreview.net/forum?id=v47f4DwYZb
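
The abstract above refers to predicting coordinates on the unit hyperbola in the 2D plane. For orientation, the right branch of the unit hyperbola x^2 - y^2 = 1 can be parameterized as (cosh t, sinh t). The sketch below only illustrates that parameterization; how Graph-JEPA maps an encoded subgraph to a position on the curve is not stated in the abstract, so the scalar `t` here is a hypothetical stand-in.

```python
import math


def hyperbola_point(t):
    """Map a scalar t to the point (cosh t, sinh t) on the unit hyperbola."""
    return math.cosh(t), math.sinh(t)


def on_unit_hyperbola(x, y, tol=1e-9):
    """Check x^2 - y^2 = 1 up to a tolerance scaled by the magnitude of x^2."""
    return abs(x * x - y * y - 1.0) <= tol * max(1.0, x * x)


# Hypothetical scalar codes; larger |t| lies farther from the vertex (1, 0).
points = [hyperbola_point(t) for t in (-2.0, -0.5, 0.0, 0.5, 2.0)]
for x, y in points:
    print(f"({x:.4f}, {y:.4f}) on hyperbola: {on_unit_hyperbola(x, y)}")
```

Because distance from the vertex grows exponentially with |t|, such a curve leaves progressively more room farther out, which is the usual intuition for why hyperbolic coordinates suit hierarchical structure.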

---

Title: DS2TA: Denoising Spiking Transformer with Attenuated Spatiotemporal Attention

Abstract: Vision Transformers (ViT) are the current high-performance models of choice for various vision applications. Recent developments have given rise to biologically inspired spiking transformers that thrive in ultra-low power operations on neuromorphic hardware, but they do not fully unlock the potential of spiking neural networks. We introduce DS2TA, a Denoising Spiking transformer with attenuated SpatioTemporal Attention, designed specifically for vision applications. DS2TA introduces a new spiking attenuated spatiotemporal attention mechanism that considers input firing correlations occurring in both time and space, thereby fully harnessing the computational power of spiking neurons at the core of the transformer architecture. Importantly, DS2TA facilitates parameter-efficient spatiotemporal attention computation without introducing extra weights. DS2TA employs efficient hashmap-based nonlinear spiking attention denoisers to enhance the robustness and expressive power of spiking attention maps. DS2TA demonstrates state-of-the-art performance on several widely adopted static image and dynamic neuromorphic datasets. Operated over 4 time steps, DS2TA achieves 94.92% top-1 accuracy on CIFAR10 and 77.47% top-1 accuracy on CIFAR100, as well as 79.1% and 94.44% on CIFAR10-DVS and DVS-Gesture using 10 time steps.

URL: https://openreview.net/forum?id=7GPDccWOZK

---

Title: Simple Steps to Success: A Method for Step-Based Counterfactual Explanations

Abstract: Algorithmic recourse is a process that leverages counterfactual explanations, going beyond understanding why a system produced a given classification, to providing a user with actions they can take to change their predicted outcome. Existing approaches to compute such interventions (known as recourse) identify a set of points that satisfy some desiderata, e.g., an intervention in the underlying causal graph or minimizing a cost function. Satisfying these criteria, however, requires extensive knowledge of the underlying model structure, an often unrealistic amount of information in several domains. We propose a data-driven and model-agnostic framework to compute counterfactual explanations. We introduce StEP, a computationally efficient method that offers incremental steps along the data manifold that direct users towards their desired outcome. We show that StEP uniquely satisfies a desirable set of axioms. Furthermore, via a thorough empirical and theoretical investigation, we show that StEP offers provable robustness and privacy guarantees while outperforming popular methods along important metrics.

URL: https://openreview.net/forum?id=R6ey5DKaoX

---
