Daily TMLR digest for Apr 23, 2025


TMLR

Apr 23, 2025, 12:06:06 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: MaxCutBench: Revisiting and Benchmarking Graph Neural Networks for Maximum Cut

Authors: Ankur Nath, Alan Kuhnle

Abstract: Recently, there has been much work on designing general heuristics for graph-based, combinatorial optimization problems via the incorporation of Graph Neural Networks (GNNs) to learn distribution-specific solution structures. However, there is a lack of consistency in evaluating these heuristics in terms of the baselines and instances chosen, making it difficult to assess the relative performance of the algorithms. In this paper, we introduce \textbf{MaxCutBench}—an open-source benchmark suite dedicated to the NP-hard Maximum Cut problem. The suite offers a unified interface for $16$ algorithms, both traditional and machine-learning-based. Using our benchmark, we conduct an in-depth analysis of the implemented algorithms on a carefully selected set of hard instances from diverse graph datasets. Our main finding is that classical local search heuristics can outperform several highly cited learning-based approaches, including S2V-DQN (Khalil et al., 2017) and ECO-DQN (Barrett et al., 2020), among others, in terms of objective value, generalization, inference time, and scalability. Additionally, we find that the performance of ECO-DQN either remains the same or improves when the GNN is replaced by simple linear regression. We hope our benchmark will contribute to the efforts of the community to standardize the evaluation of learned heuristics for combinatorial optimization. Code, data, and pre-trained models are available at: \url{https://github.com/ankurnath/MaxCut-Bench}.

URL: https://openreview.net/forum?id=322PpCGAX8

---

Title: Future-aware Safe Active Learning of Time Varying Systems using Gaussian Processes

Authors: Markus Lange-Hegermann, Christoph Zimmer

Abstract: Experimental exploration of high-cost systems with safety constraints, common in engineering applications, is a challenging endeavor. Data-driven models offer a promising solution, but acquiring the requisite data remains expensive and is potentially unsafe. Safe active learning techniques prove essential, enabling the learning of high-quality models with minimal expensive data points and high safety. This paper introduces a safe active learning framework tailored for time-varying systems, addressing drift, seasonal changes, and complexities due to dynamic behavior. The proposed Time-aware Integrated Mean Squared Prediction Error (T-IMSPE) method minimizes posterior variance over current and future states, optimizing information gathering in the time domain as well. Empirical results on synthetic and real-world examples highlight T-IMSPE's advantages for model quality. T-IMSPE is compatible with state-of-the-art Gaussian processes. Our theoretical contributions include a clear delineation of which Gaussian process kernels, domains, and weighting measures are suitable for T-IMSPE, and even for its non-time-aware predecessor IMSPE.

URL: https://openreview.net/forum?id=YBPbMKJbLd
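As a rough illustration of the core idea in this abstract (averaging posterior variance over current and future states), the sketch below scores a candidate query by the mean GP posterior variance over a grid of future (x, t) states. The function name `t_imspe_score`, the RBF kernel, and the noise level are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    # Squared-exponential kernel between two sets of (x, t) points.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def t_imspe_score(candidate, X_train, future_grid, noise=1e-2):
    # Mean GP posterior variance over current and *future* states after
    # hypothetically adding `candidate`; lower is better for acquisition.
    X = np.vstack([X_train, candidate[None, :]])
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(future_grid, X)
    var = 1.0 - np.einsum('ij,jk,ik->i', Ks, np.linalg.inv(K), Ks)
    return float(var.mean())

X_train = np.array([[0.0, 0.0]])  # one past measurement at (x=0, t=0)
future_grid = np.array([[x, t] for x in np.linspace(-1, 1, 5)
                               for t in np.linspace(1.0, 2.0, 4)])
score_near = t_imspe_score(np.array([0.0, 1.5]), X_train, future_grid)
score_far = t_imspe_score(np.array([10.0, -10.0]), X_train, future_grid)
# A query near the future region of interest leaves less posterior variance.
```

A safe-active-learning loop would minimize this score over candidates that also satisfy the safety constraint.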

---

Title: VLM’s Eye Examination: Instruct and Inspect Visual Competency of Vision Language Models

Authors: Nam Hyeon-Woo, Moon Ye-Bin, Wonseok Choi, Lee Hyun, Tae-Hyun Oh

Abstract: Vision language models (VLMs) have shown promising reasoning capabilities across various benchmarks; however, our understanding of their visual perception remains limited. In this work, we propose an eye examination process to investigate how a VLM perceives images, focusing on key aspects of visual recognition, ranging from basic color and shape to semantic understanding. We introduce a dataset, LENS, to guide VLMs through the examination and check their readiness. Once the model is ready, we conduct the examination. We quantify and visualize VLMs' sensitivities to color and shape, and semantic matching. Our findings reveal that VLMs vary in their sensitivity to different colors while consistently showing insensitivity to green. We also find that shape sensitivity and semantic recognition vary with the capacity of the underlying LLM, despite the same fixed visual encoder. Our analyses and findings have the potential to inspire the design of VLMs and the pre-processing of visual input to VLMs for improving application performance.

URL: https://openreview.net/forum?id=CgWkVb2lHB

---

Title: Jet: A Modern Transformer-Based Normalizing Flow

Authors: Alexander Kolesnikov, André Susano Pinto, Michael Tschannen

Abstract: In the past, normalizing generative flows emerged as a promising class of generative models for natural images. This type of model has many modeling advantages: the ability to efficiently compute log-likelihood of the input data, fast generation, and simple overall structure. Normalizing flows remained a topic of active research but later fell out of favor, as the visual quality of their samples was not competitive with other model classes, such as GANs, VQ-VAE-based approaches, or diffusion models. In this paper we revisit the design of coupling-based normalizing flow models by carefully ablating prior design choices and using computational blocks based on the Vision Transformer architecture rather than convolutional neural networks. As a result, we achieve a much simpler architecture that matches existing normalizing flow models and improves over them when paired with pretraining. While the overall visual quality is still behind the current state-of-the-art models, we argue that strong normalizing flow models can help advance the research frontier by serving as building components of more powerful generative models.

URL: https://openreview.net/forum?id=jdvnaki7ZY

---


New submissions
===============


Title: YoooP: You Only Optimize One Prototype per Class for Non-Exemplar Incremental Learning

Abstract: Incremental learning (IL) usually addresses catastrophic forgetting of old tasks when learning new tasks by replaying old tasks' raw data stored in a memory, which can be limited by its size and the risk of privacy leakage. Recent non-exemplar IL methods store class centroids as prototypes and perturb them with high-dimensional Gaussian noise to generate synthetic data for replaying. Unfortunately, this approach has two major limitations. First, the boundary between embedding clusters around prototypes of different classes might be unclear, leading to serious catastrophic forgetting. Second, directly applying high-dimensional Gaussian noise produces nearly identical synthetic samples that fail to preserve the true data distribution, ultimately degrading performance. In this paper, we propose YoooP, a novel exemplar-free IL approach that can greatly outperform previous methods by only storing and replaying one prototype per class even without synthetic data replay. Instead of merely storing class centroids, YoooP optimizes each prototype by (1) shifting it to high-density regions within each class using an attentional mean-shift algorithm, and (2) optimizing its cosine similarity with class-specific embeddings to form compact, well-separated clusters. As a result, replaying only the optimized prototypes effectively reduces inter-class interference and maintains clear decision boundaries. Furthermore, we extend YoooP to YoooP+ by synthesizing replay data preserving the angular distribution between each class prototype and the class's real data in history, which cannot be obtained by high-dimensional Gaussian perturbation. YoooP+ effectively stabilizes and further improves YoooP without storing real data. Extensive experiments demonstrate the superiority of YoooP/YoooP+ over non-exemplar baselines in terms of different metrics. The source code will be released upon acceptance of the paper.

URL: https://openreview.net/forum?id=FYe66NLDkO
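To make the "attentional mean-shift" step in this abstract concrete, here is a minimal sketch: the prototype is pulled toward high-density regions by softmax-attention weights over cosine similarities to the class embeddings. The function name, temperature, and iteration count are assumptions for illustration, not the paper's actual hyperparameters.

```python
import numpy as np

def attentional_mean_shift(embeddings, init, tau=0.1, n_iters=30):
    # Iteratively move the prototype toward high-density regions: weight each
    # embedding by a softmax over its cosine similarity to the current prototype.
    p = init.astype(float).copy()
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    for _ in range(n_iters):
        sims = E @ (p / (np.linalg.norm(p) + 1e-12))
        w = np.exp(sims / tau)
        w /= w.sum()
        p = w @ embeddings
    return p

rng = np.random.default_rng(0)
dense = rng.normal([5.0, 0.0], 0.3, size=(20, 2))  # high-density cluster
outliers = np.array([[-5.0, 0.0], [-5.0, 1.0]])    # a few stray embeddings
X = np.vstack([dense, outliers])
centroid = X.mean(axis=0)                 # naive prototype, pulled by outliers
proto = attentional_mean_shift(X, centroid)  # lands near the dense mode
```

Compared with the plain class centroid, the shifted prototype sits inside the dense region, which is what makes replaying a single prototype per class viable.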

---

Title: Sortability of Time Series Data

Abstract: Evaluating the performance of causal discovery algorithms that aim to find causal relationships between time-dependent processes remains a challenging topic. In this paper, we show that certain characteristics of datasets, such as varsortability (Reisach et al., 2021) and R2-sortability (Reisach et al., 2023), also occur in datasets for autocorrelated stationary time series. We illustrate this empirically using four types of data: simulated data based on SVAR models and Erdős-Rényi graphs, the data used in the 2019 causality-for-climate challenge (Runge et al., 2019), real-world river stream datasets, and real-world data generated by the Causal Chamber (Gamella et al., 2024). To do this, we adapt var- and R2-sortability to time series data. We also investigate the extent to which the performance of continuous score-based causal discovery methods goes hand in hand with high sortability. Arguably, our most surprising finding is that the investigated real-world datasets exhibit high varsortability and low R2-sortability, indicating that scales may carry a significant amount of causal information.

URL: https://openreview.net/forum?id=OGvmCpcHdV
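For readers unfamiliar with varsortability, the sketch below computes a simplified, edge-level version on toy chain data: the fraction of directed edges whose effect has larger marginal variance than its cause. Note that Reisach et al. define varsortability over all directed paths, not just edges, so this is a deliberate simplification for illustration.

```python
import numpy as np

def edge_varsortability(X, edges):
    # Fraction of directed edges (i -> j) where the effect's marginal variance
    # exceeds the cause's (ties count 1/2). Simplified from the path-based
    # definition of Reisach et al.
    v = X.var(axis=0)
    wins = sum(1.0 if v[j] > v[i] else 0.5 if v[j] == v[i] else 0.0
               for i, j in edges)
    return wins / len(edges)

# Toy chain X0 -> X1 -> X2 with edge weights > 1, so variance accumulates
# along the causal order and sortability is maximal.
rng = np.random.default_rng(1)
n = 5000
x0 = rng.normal(size=n)
x1 = 1.5 * x0 + rng.normal(size=n)
x2 = 1.5 * x1 + rng.normal(size=n)
X = np.stack([x0, x1, x2], axis=1)
score = edge_varsortability(X, [(0, 1), (1, 2)])  # maximal here
```

On data like this, sorting variables by marginal variance already recovers the causal order, which is exactly the shortcut the sortability metrics quantify.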

---

Title: Algorithmic fairness with monotone likelihood ratios

Abstract: We show that inequalities of many commonly used fairness metrics (true/false positive/negative rates, predicted positive/negative rates, and positive/negative predictive values) are guaranteed for groups with different outcome rates under a monotonically calibrated model whose risk distributions have a monotone likelihood ratio, extending existing impossibility results. We further provide lower bounds on the FNR/FPR disparities and PPR/PNR disparities in the same setting, showing that either the FNR disparity or FPR disparity is at least as large as the positive outcome rate disparity (for FNR disparity) or negative outcome rate disparity (for FPR disparity), and either the PPR disparity or PNR disparity is at least as large as the positive outcome rate disparity (for PPR disparity) or negative outcome rate disparity (for PNR disparity). While incompatibilities of some combinations of these metrics have been demonstrated previously, we are unaware of any work that has demonstrated direct incompatibility of calibration with these individual equalities, equivalence of these inequalities, or lower bounds for the disparity in these values under distributional assumptions about a model's predictions.

URL: https://openreview.net/forum?id=mtoWa0gIKy
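A small worked example may help make the lower bounds tangible. Under a calibrated model, P(Y=1 | score) equals the score, so group outcome rates, FNRs, and FPRs follow directly from each group's score distribution. The two-point score distributions and the threshold below are my own illustrative assumptions; they satisfy a monotone likelihood ratio between the groups, and the computed FNR disparity indeed dominates the outcome-rate disparity, as the abstract's bound predicts.

```python
import numpy as np

def group_rates(scores, mass, threshold=0.5):
    # Under calibration, P(Y=1 | score) == score, so all rates follow from
    # the score distribution alone.
    scores, mass = np.asarray(scores, float), np.asarray(mass, float)
    pos = mass * scores          # joint P(score, Y=1)
    neg = mass * (1 - scores)    # joint P(score, Y=0)
    pred_pos = scores >= threshold
    outcome_rate = pos.sum()
    fnr = pos[~pred_pos].sum() / pos.sum()  # P(pred=0 | Y=1)
    fpr = neg[pred_pos].sum() / neg.sum()   # P(pred=1 | Y=0)
    return outcome_rate, fnr, fpr

# Two groups with different outcome rates; group B's scores stochastically
# dominate group A's, giving a monotone likelihood ratio.
rA, fnrA, fprA = group_rates([0.2, 0.4], [0.5, 0.5])  # outcome rate 0.3
rB, fnrB, fprB = group_rates([0.6, 0.8], [0.5, 0.5])  # outcome rate 0.7
# Check: max(FNR disparity, FPR disparity) >= outcome-rate disparity.
```

Here the 0.4 gap in outcome rates forces a large error-rate disparity at any single threshold, illustrating the abstract's claim that calibration plus MLR makes these equalities unattainable.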

---

Title: Trust Me, I’m Calibrated: Robustifying Deep Networks

Abstract: The tremendous success of deep neural networks (DNNs) in solving a wide range of complex computer vision tasks has paved the way for their deployment in real-world applications. However, challenges arise when these models are exposed to natural adversarial corruptions that can occur in unconstrained physical environments. Such corruptions are inherently present in the real world and can significantly degrade model performance by causing incorrect predictions. This vulnerability is further exacerbated by the miscalibration of modern DNNs, where models tend to output incorrect predictions with high confidence. To ensure safe and reliable deployment, it is crucial to calibrate these models correctly. While existing literature primarily focuses on calibrating DNNs, it often overlooks the impact of adversarial corruption. Thus, substantial scope remains to explore how calibration techniques interact with adversarial robustness and whether improving calibration can increase robustness to corrupted or adversarial data. In this work, we aim to address this gap by employing uncertainty quantification methods to improve the calibration and robustness of DNNs and Transformer-based models against adversarial data.

URL: https://openreview.net/forum?id=qcNrlJd2S7

---

Title: Utilising Gradient-Based Proposals Within Sequential Monte Carlo Samplers for Training of Partial Bayesian Neural Networks

Abstract: Partial Bayesian neural networks (pBNNs) have been shown to perform competitively with fully Bayesian neural networks while only having a subset of the parameters be stochastic. Using sequential Monte Carlo (SMC) samplers as the inference method for pBNNs gives a non-parametric probabilistic estimation of the stochastic parameters, and has shown improved performance over parametric methods. In this paper we introduce a new SMC-based training method for pBNNs by utilising a guided proposal and incorporating gradient-based Markov kernels, which gives us better scalability on high dimensional problems. We show that our new method outperforms the state-of-the-art in terms of predictive performance and optimal loss. We also show that pBNNs scale well with larger batch sizes, resulting in significantly reduced training times and often better performance.

URL: https://openreview.net/forum?id=miT5oN8YwX

---

Title: Pre-Training Representations of Binary Code Using Contrastive Learning

Abstract: Binary code analysis and comprehension is critical to applications in reverse engineering and computer security tasks where source code is not available. Unfortunately, unlike source code, binary code lacks semantics and is more difficult for human engineers to understand and analyze. In this paper, we present ContraBin, a contrastive learning technique that integrates source code and comment information along with binaries to create an embedding capable of aiding binary analysis and comprehension tasks. Specifically, we present three components in ContraBin: (1) a primary contrastive learning method for initial pre-training, (2) a simplex interpolation method to integrate source code, comments, and binary code, and (3) an intermediate representation learning algorithm to train a binary code embedding. We further analyze the impact of human-written and synthetic comments on binary code comprehension tasks, revealing a significant performance disparity. While synthetic comments provide substantial benefits, human-written comments are found to introduce noise, even resulting in performance drops compared to using no comments. These findings reshape the narrative around the role of comment types in binary code analysis. We evaluate the effectiveness of ContraBin through four indicative downstream tasks related to binary code: algorithmic functionality classification, function name recovery, code summarization, and reverse engineering. The results show that ContraBin considerably improves performance on all four tasks, measured by accuracy, mean of average precision, and BLEU scores as appropriate. ContraBin is the first language representation model to incorporate source code, binary code, and comments into contrastive code representation learning and is intended to contribute to the field of binary code analysis. The dataset used in this study is available for further research.

URL: https://openreview.net/forum?id=qmfUL6D0iz

---

Title: Task Diversity Shortens the In-Context Learning Plateau

Abstract: In-context learning (ICL) describes a language model's ability to generate outputs based on a set of input demonstrations and a subsequent query. To understand this remarkable capability, researchers have studied simplified, stylized models. These studies have consistently observed long loss plateaus, during which models exhibit minimal improvement, followed by a sudden, rapid surge of learning. In this work, we reveal that training on multiple diverse ICL tasks simultaneously shortens the loss plateaus, making each task easier to learn. This finding is surprising as it contradicts the natural intuition that the combined complexity of multiple ICL tasks would lengthen the learning process, not shorten it. Our result suggests that the recent success in large-scale training of language models may be attributed not only to the richness of the data at scale but also to the easier optimization (training) induced by the diversity of natural language training data.

URL: https://openreview.net/forum?id=7t5DzaJOdB

---

Title: Bayesian Optimization of Robustness Measures under Input Uncertainty: A Randomized Gaussian Process Upper Confidence Bound Approach

Abstract: Bayesian optimization based on the Gaussian process upper confidence bound (GP-UCB) offers a theoretical guarantee for optimizing black-box functions. In practice, however, black-box functions often involve input uncertainty. To handle such cases, GP-UCB can be extended to optimize evaluation criteria known as robustness measures. However, GP-UCB-based methods for robustness measures require a trade-off parameter, $\beta$, which, as in the original GP-UCB, must be set sufficiently large to ensure theoretical validity. In this study, we propose randomized robustness measure GP-UCB (RRGP-UCB), a novel method that samples $\beta$ from a chi-squared-based probability distribution. This approach eliminates the need to explicitly specify $\beta$. Notably, the expected value of $\beta$ under this distribution is not excessively large. Furthermore, we show that RRGP-UCB provides tight bounds on the expected regret between the optimal and estimated solutions. Numerical experiments demonstrate the effectiveness of the proposed method.

URL: https://openreview.net/forum?id=FDzojiLSia
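The key mechanism in this abstract, replacing a hand-set trade-off parameter $\beta$ with a random draw, can be sketched in a few lines. The degrees of freedom and the shift in the chi-squared sample below are illustrative assumptions; the paper derives the exact distribution that makes the regret bounds go through.

```python
import numpy as np

def rrgp_ucb_acquisition(mu, sigma, rng):
    # Randomized UCB: instead of fixing a conservative beta, draw it from a
    # shifted chi-squared distribution (df and shift are assumptions here).
    # Its expected value stays modest, avoiding over-exploration.
    beta = rng.chisquare(df=2) + 2.0
    return mu + np.sqrt(beta) * sigma

rng = np.random.default_rng(0)
mu = np.array([0.1, 0.4, 0.3])     # GP posterior means at 3 candidates
sigma = np.array([0.5, 0.1, 0.4])  # GP posterior standard deviations
scores = rrgp_ucb_acquisition(mu, sigma, rng)
next_query = int(np.argmax(scores))  # favors high mean and/or high uncertainty
```

For robustness measures, `mu` and `sigma` would be the posterior of the robustness criterion rather than of the black-box function itself.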

---

Title: Between Linear and Sinusoidal: Rethinking the Time Encoder in Dynamic Graph Learning

Abstract: Dynamic graph learning is essential for applications involving temporal networks and requires effective modeling of temporal relationships. Seminal attention-based models like TGAT and DyGFormer rely on sinusoidal time encoders to capture temporal relationships between edge events. In this paper, we study a simpler alternative: the linear time encoder, which avoids temporal information loss caused by sinusoidal functions and reduces the need for high dimensional time encoders. We show that the self-attention mechanism can effectively learn to compute time spans from linear time encodings and extract relevant temporal patterns. Through extensive experiments on six dynamic graph datasets, we demonstrate that the linear time encoder improves the performance of TGAT and DyGFormer in most cases. Moreover, the linear time encoder can lead to significant savings in model parameters with minimal performance loss. For example, compared to a 100-dimensional sinusoidal time encoder, TGAT with a 2-dimensional linear time encoder saves 43% of parameters and achieves higher average precision on five datasets. These results can be readily used to positively impact the design choices of a wide variety of dynamic graph learning architectures.

URL: https://openreview.net/forum?id=W6GQvdOGHg
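The contrast between the two encoders in this abstract is easy to show directly. Below is a minimal sketch: the frequency schedule of the sinusoidal encoder and the normalization constant of the linear encoder are illustrative assumptions, not the exact choices in TGAT or DyGFormer.

```python
import numpy as np

def sinusoidal_encoder(t, dim=100):
    # Cosines at log-spaced frequencies; many-to-one, so time spans can be
    # lost (the actual frequency schedule in TGAT is an assumption here).
    freqs = 1.0 / 10.0 ** np.linspace(0, 9, dim)
    return np.cos(np.outer(np.atleast_1d(t), freqs))

def linear_encoder(t, scale=1000.0):
    # 2-dimensional linear encoding [t / scale, 1]; time spans are recoverable
    # exactly as differences of the first coordinate.
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.stack([t / scale, np.ones_like(t)], axis=-1)

enc = linear_encoder([100.0, 350.0])
span = (enc[1, 0] - enc[0, 0]) * 1000.0  # recovers the 250-unit time span
```

The 50x gap in dimensionality (100 vs. 2) is where the reported 43% parameter saving in TGAT comes from, since the time encoding feeds the attention projections.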

---

Title: DIVINE: Diverse-Inconspicuous Feature Learning to Mitigate Abridge Learning

Abstract: Deep learning algorithms aim to minimize overall error and exhibit impressive performance on test datasets across various domains. However, they often struggle with out-of-distribution data samples. We posit that deep models primarily focus on capturing the prominent features beneficial for the task while neglecting other subtle yet discriminative features. This phenomenon is referred to as Abridge Learning. To address this issue and promote a more comprehensive learning process from data, we introduce a novel DIVerse and INconspicuous feature lEarning (DIVINE) approach aimed at counteracting Abridge Learning. DIVINE embodies a holistic learning methodology, effectively utilizing data by engaging with its diverse dominant features. Through experiments conducted on ten datasets, including MNIST, CIFAR10, CIFAR100, TinyImageNet, and their corrupted and perturbed counterparts (CIFAR10-C, CIFAR10-P, CIFAR100-C, CIFAR100-P, TinyImageNet-C, and TinyImageNet-P), we demonstrate that DIVINE encourages the learning of a rich set of features. This, in turn, boosts the model's robustness and its ability to generalize. On perturbed out-of-distribution datasets, DIVINE achieves mean Flip Rates (mFR) of 5.36%, 3.10%, and 21.85% on CIFAR10-P, CIFAR100-P, and TinyImageNet-P, respectively, compared to 6.53%, 11.75%, and 31.90% mFR under Abridge Learning. The proposed DIVINE algorithm achieves state-of-the-art results on the CIFAR100-P dataset when compared to existing algorithms.

URL: https://openreview.net/forum?id=8NGKGTAD6F

---

Title: A Unified Approach Towards Active Learning and Out-of-Distribution Detection

Abstract: In real-world applications of deep learning models, active learning (AL) strategies are essential for identifying label candidates from vast amounts of unlabeled data. In this context, robust out-of-distribution (OOD) detection mechanisms are crucial for handling data outside the target distribution during the application's operation. Usually, these problems have been addressed separately. In this work, we introduce SISOM as a unified solution designed explicitly for AL and OOD detection. By combining feature-space-based and uncertainty-based metrics, SISOM leverages the strengths of these currently independent tasks to solve both effectively, without requiring specific training schemes. We conducted extensive experiments showing the problems that arise when migrating between the two tasks. In our experiments, SISOM underlined its effectiveness by achieving first place in two of the commonly used OpenOOD benchmark settings and second place in the remaining one for near-OOD data. In AL, SISOM outperforms other methods and delivers top-1 performance in three benchmarks.

URL: https://openreview.net/forum?id=HL75La10FN

---
