Daily TMLR digest for Jun 05, 2025

5 views

Skip to first unread message

TMLR

unread,

Jun 5, 2025, 12:06:17 AM6/5/25

to tmlr-anno...@googlegroups.com

New certifications
==================

Reproducibility Certification: Reproducibility Study of ’SLICE: Stabilized LIME for Consistent Explanations for Image Classification’

Aritra Bandyopadhyay, Chiranjeev Bindra, Roan van Blanken, Arijit Ghosh

https://openreview.net/forum?id=vKUPXuEzj8

---

Accepted papers
===============

Title: Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations

Authors: Mohammed Baharoon, Jonathan Klein, Dominik Michels

Abstract: Vision-language contrastive learning frameworks such as CLIP enable learning representations from natural language supervision and provide strong zero-shot classification capabilities. However, due to the nature of the supervisory signal in these paradigms, they lack the ability to learn localized features, leading to degraded performance on dense prediction tasks such as segmentation and detection. On the other hand, self-supervised learning methods have shown the ability to learn granular representations, complementing the high-level features in vision-language training. In this work, we present Harmony, a framework that combines vision-language training with discriminative and generative self-supervision to learn visual features that can be generalized across different downstream vision tasks. Our framework is specifically designed to work on web-scraped data by not relying on negative examples in the self-supervised learning path and addressing the one-to-one correspondence issue using soft CLIP targets generated by an EMA model. Moreover, Harmony optimizes for five different objectives simultaneously, efficiently utilizing the supervision in each data example, making it even more suited in data-constrained settings. We comprehensively evaluate Harmony across various vision downstream tasks and find that it significantly outperforms the baseline CLIP and outperforms the previously leading joint self- and weakly supervised methods, SLIP, MaskCLIP, and DetailCLIP. Specifically, when compared against these methods, Harmony shows superior performance in linear-probing, fine-tuning, and zero-shot classification on ImageNet-1k, semantic segmentation on ADE20K, and both object detection and instance segmentation on MS-COCO, when pre-training a ViT-B on CC3M. We also show that Harmony outperforms SILC on detection, linear and fine-tuning classification, and outperforms other self-supervised learning methods like iBOT and MAE across all tasks evaluated. Our code is publicly available at https://github.com/MohammedSB/Harmony.

URL: https://openreview.net/forum?id=IcOBCufqFO

---

Title: Full-Rank Unsupervised Node Embeddings for Directed Graphs via Message Aggregation

Authors: Ciwan Ceylan, Kambiz Ghoorchian, Danica Kragic

Abstract: Linear message-passing models have emerged as compelling alternatives to non-linear graph neural networks for unsupervised node embedding learning, due to their scalability and competitive performance on downstream tasks. However, we identify a fundamental flaw in recently proposed linear models that combine embedding aggregation with concatenation during each message-passing iteration: rank deficiency. A rank-deficient embedding matrix contains column vectors which take arbitrary values, leading to ill-conditioning that degrades downstream task accuracy, particularly in unsupervised tasks such as graph alignment. We deduce that repeated embedding aggregation and concatenation introduces linearly dependent features, causing rank deficiency. To address this, we propose ACC (Aggregate, Compress, Concatenate), a novel model that avoids redundant feature computation by applying aggregation to the messages from the previous iteration, rather than the embeddings. Consequently, ACC generates full-rank embeddings, significantly improving graph alignment accuracy from 10% to 60% compared to rank-deficient embeddings, while also being faster to compute. Additionally, ACC employs directed message-passing and achieves node classification accuracies comparable to state-of-the-art self-supervised graph neural networks on directed graph benchmarks, while also being over 70 times faster on graphs with over 1 million edges.

URL: https://openreview.net/forum?id=3ECbEZg2If

---

Title: Prior Learning in Introspective VAEs

Authors: Ioannis Athanasiadis, Fredrik Lindsten, Michael Felsberg

Abstract: Variational Autoencoders (VAEs) are a popular framework for unsupervised learning and data generation. A plethora of methods have been proposed focusing on improving VAEs, with the incorporation of adversarial objectives and the integration of prior learning mechanisms being prominent directions. When it comes to the former, an indicative instance is the recently introduced family of Introspective VAEs aiming at ensuring that a low likelihood is assigned to unrealistic samples. In this study, we focus on the Soft-IntroVAE (S-IntroVAE), one of only two members of the Introspective VAE family, the other being the original IntroVAE. We select S-IntroVAE for its state-of-the-art status and its training stability. In particular, we investigate the implication of incorporating a multimodal and trainable prior into this S-IntroVAE. Namely, we formulate the prior as a third player and show that when trained in cooperation with the decoder constitutes an effective way for prior learning, which shares the Nash Equilibrium with the vanilla S-IntroVAE. Furthermore, based on a modified formulation of the optimal ELBO in S-IntroVAE, we develop theoretically motivated regularizations, namely (i) adaptive variance clipping to stabilize training when learning the prior and (ii) responsibility regularization to discourage the formation of inactive prior modes. Finally, we perform a series of targeted experiments on a 2D density estimation benchmark and in an image generation setting comprised of the (F)-MNIST and CIFAR-10 datasets demonstrating the effect of prior learning in S-IntroVAE in generation and representation learning.

URL: https://openreview.net/forum?id=u4YDVFodYX

---

Title: Learning Using a Single Forward Pass

Authors: Aditya Somasundaram, Pushkal Mishra, Ayon Borthakur

Abstract: We propose a learning algorithm to overcome the limitations of traditional backpropagation in resource-constrained environments: Solo Pass Embedded Learning Algorithm (SPELA). SPELA operates with local loss functions to update weights, significantly saving on resources
allocated to the propagation of gradients and storing computational graphs while being sufficiently accurate. Consequently, SPELA can closely match backpropagation using less memory. Moreover, SPELA can effectively fine-tune pre-trained image recognition models
for new tasks. Further, SPELA is extended with significant modifications to train CNN networks, which we evaluate on CIFAR-10, CIFAR-100, and SVHN 10 datasets, showing equivalent performance compared to backpropagation. Our results indicate that SPELA, with its features such as local learning and early exit, is a potential candidate for learning in resource-constrained edge AI applications.

URL: https://openreview.net/forum?id=EDQ8QOGqjr

---

Title: Reproducibility Study of ’SLICE: Stabilized LIME for Consistent Explanations for Image Classification’

Authors: Aritra Bandyopadhyay, Chiranjeev Bindra, Roan van Blanken, Arijit Ghosh

Abstract: This paper presents a reproducibility study of SLICE: Stabilized LIME for Consistent Explanations for Image Classification by Bora et al. (2024). SLICE enhances LIME by incorporating Sign Entropy-based Feature Elimination (SEFE) to remove unstable superpixels and an adaptive perturbation strategy using Gaussian blur to improve consistency in feature importance rankings. The original work claims that SLICE significantly improves explanation stability and fidelity. Our study systematically verifies these claims through extensive experimentation using the Oxford-IIIT Pets, PASCAL VOC, and MS COCO datasets. Our results confirm that SLICE achieves higher consistency than LIME, supporting its ability to reduce instability. However, our fidelity analysis challenges the claim of superior performance, as LIME often achieves higher Ground Truth Overlap (GTO) scores, indicating stronger alignment with object segmentations. To further investigate fidelity, we introduce an alternative AOPC evaluation to ensure a fair comparison across methods. Additionally, we propose GRID-LIME, a structured grid-based alternative to LIME, which improves stability while maintaining computational efficiency. Our findings highlight trade-offs in post-hoc explainability methods and emphasize the need for fairer fidelity evaluations. Our implementation is publicly available at our GitHub repository.

URL: https://openreview.net/forum?id=vKUPXuEzj8

---

Title: Multi-objective Bayesian optimization for Likelihood-Free inference in sequential sampling models of decision making

Authors: David Chen, Xinwei Li, Eui-Jin Kim, Prateek Bansal, David J Nott

Abstract: Statistical models are often defined by a generative process for simulating synthetic data, but this can lead to intractable likelihoods. Likelihood free inference (LFI) methods enable Bayesian inference to be performed in this case. Extending a popular approach to simulation-efficient LFI for single-source data, we propose Multi-objective Bayesian Optimization for Likelihood Free Inference (MOBOLFI) to perform LFI using multi-source data. MOBOLFI models a multi-dimensional discrepancy between observed and simulated data, using a separate discrepancy for each data source. The use of a multivariate discrepancy allows for approximations to individual data source likelihoods in addition to the joint likelihood, enabling detection of conflicting information and deeper understanding of the importance of different data sources in estimating individual parameters. The adaptive choice of simulation parameters using multi-objective Bayesian optimization ensures simulation efficient approximation of likelihood components for all data sources. We illustrate our approach in sequential sampling models (SSMs), which are widely used in psychology and consumer-behavior modeling. SSMs are often fitted using multi-source data, such as choice and response time. The advantages of our approach are illustrated in comparison with a single discrepancy for an SSM fitted to data assessing preferences of ride-hailing drivers in Singapore to rent electric vehicles.

URL: https://openreview.net/forum?id=hQjwDqfSzj

---

New submissions
===============

Title: Error Correction by Agreement Checking for Adversarial Robustness against Black-box Attacks

Abstract: Inspired by how the early stages of visual perception in humans and primates are vulnerable to adversarial attacks, we present a new defense method called Error Correction by Agreement Checking (ECAC). This strategy is designed to mitigate realistic black-box threats. We exploit the fact that natural and adversarially trained models rely on distinct feature sets for classification. Notably, naturally trained models retain commendable accuracy against adversarial examples generated using adversarially trained models. Leveraging this disparity, ECAC moves the input toward the prediction of the naturally trained model unless it leads to disagreement in prediction between the two models, before making the prediction. This simple error correction mechanism is highly effective against leading SQA (Score-based Query Attacks) as well as decision-based and transfer-based black-box attacks. We also verify that, unlike other black-box defenses, ECAC maintains significant robustness even when adversary has full access to the model. We demonstrate its effectiveness through comprehensive experiments across various datasets (CIFAR and ImageNet) and architectures (ResNet and ViT).

URL: https://openreview.net/forum?id=XgK05fssnx

---

Title: MATEY: multiscale adaptive transformer models for spatiotemporal physical systems

Abstract: Accurate representation of the multiscale features in spatiotemporal physical systems using vision transformer (ViT) architectures requires extremely long, computationally prohibitive token sequences. To address this issue, we propose two adaptive tokenization schemes that dynamically adjust patch sizes based on local features: one ensures convergent behavior to uniform patch refinement, while the other offers better computational efficiency.
Moreover, we present a set of spatiotemporal attention schemes, where the temporal or axial spatial dimensions are decoupled, and evaluate their computational and data efficiencies.
We assess the performance of the proposed multiscale adaptive model, MATEY, in a sequence of experiments.
The results show that adaptive tokenization schemes achieve improved accuracy without significantly increasing the length of the token sequence.
Compared to a full spatiotemporal attention scheme or a scheme that decouples only the temporal dimension, we find that fully decoupled axial attention is less efficient and expressive, requiring more training time and model weights to achieve the same accuracy.
Finally, we demonstrate in two fine-tuning tasks featuring different physics that models pretrained on PDEBench data outperform the ones trained from scratch, especially in the low data regime with frozen attention.

URL: https://openreview.net/forum?id=sIhbw5c1nG

---

Title: Improving Tabular Generative Models: Loss Functions, Benchmarks, and Improved Multi-objective Bayesian Optimization Approaches

Abstract: Access to extensive data is essential to improve model performance and generalization in deep learning (DL). When dealing with sparse datasets—those with limited samples relative to model complexity—a promising solution is to generate synthetic data using deep generative models (DGMs). However, these models often struggle to capture the complexities of real-world tabular data, including diverse variable types, imbalances, and intricate dependencies. Additionally, standard Bayesian optimization (SBO), commonly used for hyper-parameter tuning, struggles to optimize over aggregated metrics with different units, leading to unreliable averaging and suboptimal decisions. To address these gaps, we introduce a novel correlation- and distribution-aware loss function that regularizes DGMs, enhancing their ability to generate synthetic tabular data that faithfully represents the underlying data distributions. Theoretical guarantees for the proposed loss functions are provided, including stability and consistency analyses, ensuring their robustness. To enable principled hyperparameter search via Bayesian optimization (BO), we also propose a new multi-objective aggregation strategy based on iterative objective refinement Bayesian optimization (IORBO), along with a comprehensive statistical testing framework. We validate the proposed approach using a benchmarking framework with twenty real-world datasets and ten established tabular DGM baselines. The results demonstrate that the proposed loss function significantly improves the fidelity of the synthetic data generated with DGMs, leading to better performance in downstream machine learning (ML) tasks. Furthermore, the IORBO consistently outperformed SBO, yielding superior hyper-parameter results. This work advances synthetic data generation and optimization techniques, enabling more robust DL applications.

URL: https://openreview.net/forum?id=RPZ0EW0lz0

---

Title: Recursive SNE: Fast Prototype-Based t-SNE for Large-Scale and Online Data

Abstract: Dimensionality reduction techniques like t-SNE excel at visualizing structure in high-dimensional data but incur high computational costs that limit their use on large or streaming datasets. We introduce the Recursive SNE (RSNE) framework, which extends t-SNE with two complementary strategies: i-RSNE for real-time, point-wise updates and Bi-RSNE for efficient batch processing. Across diverse settings, including standard image benchmarks (CIFAR10/CIFAR100) with DINOv2 and CLIP features, domain-specific iROADS road scenes, and long-term climate records, RSNE delivers substantial speedups over Barnes–Hut t-SNE while maintaining or even improving cluster separability. By combining a lightweight prototype-based initialization with localized KL-divergence refinements, RSNE offers a scalable and adaptable framework for both large-scale offline embedding and on-the-fly visualization of streaming data.

URL: https://openreview.net/forum?id=7wCPAFMDWM

---

Title: Unifi3D: A Study on 3D Representations for Generation and Reconstruction in a Common Framework

Abstract: Following rapid advancements in text and image generation, research has increasingly shifted towards 3D generation. Unlike the well-established pixel-based representation in images, 3D representations remain diverse and fragmented, encompassing a wide variety of approaches such as voxel grids, neural radiance fields, signed distance functions, point clouds, or octrees, each offering distinct advantages and limitations.
In this work, we present a unified evaluation framework designed to assess the performance of 3D representations in reconstruction and generation. We compare these representations based on multiple criteria: quality, computational efficiency, and generalization performance. Beyond standard model benchmarking, our experiments aim to derive best practices over all steps involved in the 3D generation pipeline, including preprocessing, mesh reconstruction, compression with autoencoders, and generation. Our findings highlight that reconstruction errors significantly impact overall performance, underscoring the need to evaluate generation and reconstruction jointly.
We provide insights that can inform the selection of suitable 3D models for various applications, facilitating the development of more robust and application-specific solutions in 3D generation.
The code for our framework is available at https://anonymous.4open.science/r/unifi3d-39CD.

URL: https://openreview.net/forum?id=GQpTWpXILA

---

Title: Measurement Manipulation of the Matrix Sensing Problem to Improve Optimization Landscape

Abstract: This work studies the matrix sensing (MS) problem through the lens of the Restricted Isometry Property (RIP). It has been shown in several recent papers that two different techniques of convex relaxations and local search methods for the MS problem both require the RIP constant to be less than 0.5 while most real-world problems have their RIPs close to 1. The existing literature guarantees a small RIP constant only for sensing operators having an i.i.d. Gaussian distribution, and it is well-known that the MS problem could have a complicated landscape when the RIP is greater than 0.5. In this work, we address this issue and improve the optimization landscape by developing two results. First, we show that any sensing operator with a model not too distant from i.i.d. Gaussian has a slightly higher RIP than i.i.d. Gaussian. Second, we show that if the sensing operator has an arbitrary distribution, it can be modified in such a way that the resulting operator will act as a perturbed Gaussian with a lower RIP constant. Our approach is a preconditioning/mixing technique that replaces each sensing matrix with a weighted sum of all sensing matrices. This approach does not require taking new measurements (which is not possible in many applications) and relies only on mixing existing measurements. We numerically demonstrate that the RIP constants for different distributions can be reduced from almost 1 to less than 0.5 via the preconditioning of the sensing operator.

URL: https://openreview.net/forum?id=OR8JWLRUrM

---

Title: Dual Natural Gradient Descent for Scalable Training of Physics-Informed Neural Networks

Abstract: Natural-gradient methods markedly accelerate the training of Physics-Informed Neural Networks (PINNs), yet their Gauss–Newton update must normally be solved in the parameter space, incurring a prohibitive $\mathcal{O}(n^{3})$ time complexity, where $n$ is the number of network weights. We show that exactly the same step can instead be formulated in a generally smaller residual space of size $m=\sum_{\gamma}N_{\gamma}d_{\gamma}$, where each residual class $\gamma$ (e.g. PDE interior, boundary, initial data) contributes $N_{\gamma}$ collocation points of output dimension $d_{\gamma}$.

Building on this insight, we introduce Dual Natural Gradient Descent (D-NGD). D-NGD computes the Gauss–Newton step in residual space, augments it with a geodesic-acceleration correction at negligible extra cost, and provides both a dense direct solver for modest $m$ and a Nyström-preconditioned conjugate-gradient solver for larger $m$.

Experimentally, D-NGD scales second-order PINN optimisation to networks with up to 12.8 million parameters, delivers one- to three-order-of-magnitude lower final $L^{2}$ error than first-order (Adam, SGD) and quasi-Newton methods, and—crucially—enables full natural-gradient training of PINNs at this scale on a single GPU.

URL: https://openreview.net/forum?id=GDHVRy6SDd

---

Title: On the Universal Statistical Consistency of Expansive Hyperbolic Deep Convolutional Neural Networks

Abstract: The emergence of Deep Convolutional Neural Networks (DCNNs) has been a pervasive tool for accomplishing widespread applications in computer vision. Despite its potential capability to capture intricate patterns inside the data, the underlying embedding space remains Euclidean and primarily pursues contractive convolution. Several instances can serve as a precedent for the exacerbating performance of DCNNs. The recent advancement of neural networks in the hyperbolic spaces gained traction, incentivizing the development of convolutional deep neural networks in the hyperbolic space. In this work, we propose Hyperbolic DCNN based on the Poincar\'{e} Ball. The work predominantly revolves around analyzing the nature of expansive convolution in the context of the non-Euclidean domain. We further offer extensive theoretical insights about the universal consistency of the expansive convolution in the hyperbolic space. Several simulations were performed not only on the synthetic datasets but also on some real-world datasets. The experimental results reveal that the hyperbolic convolutional architecture outperforms the Euclidean ones by a commendable margin.

URL: https://openreview.net/forum?id=YWJeeK8y2k

---

Title: Two Is Better Than One: Aligned Representation Pairs for Anomaly Detection

Abstract: Anomaly detection focuses on identifying samples that deviate from the norm. Discovering informative representations of normal samples is crucial to detecting anomalies effectively. Recent self-supervised methods have successfully learned such representations by employing prior knowledge about anomalies to create synthetic outliers during training. However, we often do not know what to expect from unseen data in specialized real-world applications. In this work, we address this limitation with our new approach Con$_2$, which leverages symmetries in normal samples to observe the data in different contexts. Con$_2$ clusters representations according to their context and simultaneously aligns their positions to learn an informative representation space that is structured according to the properties of normal data. Anomalies do not adhere to the same structure as normal data, making their representations deviate from the learned context clusters. We demonstrate the benefit of this approach in extensive experiments on specialized medical datasets, outperforming competitive baselines based on self-supervised learning and pretrained models.

URL: https://openreview.net/forum?id=Bt0zdsnWYc

---

Title: Joint Diffusion for Universal Hand-Object Grasp Generation

Abstract: In this work, we focus on generating both the hand and objects in a grasp by a single diffusion model. Our proposed Joint Hand-Object Diffusion (JHOD) models the hand and object in a unified latent representation. It leverages large-scale object datasets to learn an inclusive object latent embedding. Also, it uses the hand-object grasping data to learn to accommodate hand and object embedding to form grasps. With or without a given object as an optional condition, the diffusion model can generate grasps unconditionally or conditional to the object. Compared to the usual practice of learning object-conditioned grasp generation from only hand-object grasp data, our method benefits from more diverse object data used for training to handle grasp generation more universally. According to both qualitative and quantitative experiments, both conditional and unconditional generation of hand grasp achieves good visual plausibility and diversity. The proposed method generalizes well to unseen object shapes. The code and weights will be made public upon acceptance.

URL: https://openreview.net/forum?id=TZ0ztsYR6x

---

Reply all

Reply to author

Forward

0 new messages