Daily TMLR digest for Aug 14, 2025

TMLR

Aug 14, 2025, 12:06:06 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Beyond Grids: Multi-objective Bayesian Optimization With Adaptive Discretization

Authors: Andi Nika, Sepehr Elahi, Cagin Ararat, Cem Tekin

Abstract: We consider the problem of optimizing a vector-valued objective function $\boldsymbol{f}$ sampled from a Gaussian Process (GP) whose index set is a well-behaved, compact metric space $(\mathcal{X},d)$ of designs. We assume that $\boldsymbol{f}$ is not known beforehand and that evaluating $\boldsymbol{f}$ at design $x$ results in a noisy observation of $\boldsymbol{f}(x)$. Since identifying the Pareto optimal designs via exhaustive search is infeasible when the cardinality of $\mathcal{X}$ is large, we propose an algorithm, called Adaptive $\boldsymbol{\epsilon}$-PAL, that exploits the smoothness of the GP-sampled function and the structure of $(\mathcal{X},d)$ to learn fast. In essence, Adaptive $\boldsymbol{\epsilon}$-PAL employs a tree-based adaptive discretization technique to identify an $\boldsymbol{\epsilon}$-accurate Pareto set of designs in as few evaluations as possible. We provide both information-type and metric dimension-type bounds on the sample complexity of $\boldsymbol{\epsilon}$-accurate Pareto set identification. We also experimentally show that our algorithm outperforms other Pareto set identification methods.

URL: https://openreview.net/forum?id=Wq150HaRVE
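As a point of reference for what the algorithm computes, here is a naive brute-force (ε-)dominance check in Python (one common ε-dominance convention assumed; the paper's contribution is precisely avoiding this exhaustive pairwise comparison via tree-based adaptive discretization):

```python
import numpy as np

def pareto_set(Y, eps=0.0):
    """Return indices of (eps-)Pareto-optimal rows of Y (maximization).

    A row is kept unless some other row exceeds it by at least eps in
    every objective and strictly in at least one. Brute force, O(n^2).
    """
    n = Y.shape[0]
    keep = []
    for i in range(n):
        dominated = False
        for j in range(n):
            if j != i and np.all(Y[j] >= Y[i] + eps) and np.any(Y[j] > Y[i] + eps):
                dominated = True
                break
        if not dominated:
            keep.append(i)
    return keep

Y = np.array([[1.0, 2.0], [2.0, 1.0], [0.5, 0.5], [2.0, 2.0]])
print(pareto_set(Y))  # -> [3]; only [2.0, 2.0] is undominated
```

The cost of this exhaustive comparison grows with the cardinality of the design space, which is what makes the adaptive discretization in the paper attractive for large or continuous design spaces.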

---

Title: Controlling Statistical, Discretization, and Truncation Errors in Learning Fourier Linear Operators

Authors: Unique Subedi, Ambuj Tewari

Abstract: We study the learning-theoretic foundations of operator learning, using the linear layer of the Fourier Neural Operator architecture as a model problem. First, we identify three main errors that occur during the learning process: statistical error due to finite sample size, truncation error from the finite-rank approximation of the operator, and discretization error from handling functional data on a finite grid of domain points. We then analyze a Discrete Fourier Transform (DFT) based least squares estimator, establishing both upper and lower bounds on the aforementioned errors.

URL: https://openreview.net/forum?id=A2sHNGcjLO
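The estimator being analyzed can be illustrated with a toy sketch (our own illustrative setup, not the paper's experiments): learn the per-frequency multipliers of a diagonal Fourier operator by least squares, with the truncation rank and grid resolution as the knobs behind the truncation and discretization errors:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_grid, K = 50, 64, 8  # sample size, grid resolution, truncation rank

# Ground-truth diagonal Fourier multipliers (illustrative, decaying)
true_mult = 1.0 / (1.0 + np.arange(n_grid))

X = rng.standard_normal((n_samples, n_grid))   # input functions sampled on a grid
Xf = np.fft.fft(X, axis=1)
Yf = true_mult * Xf                            # apply the operator frequency-wise
Yf += 0.1 * (rng.standard_normal(Yf.shape) + 1j * rng.standard_normal(Yf.shape))

# Per-frequency least squares, keeping only the first K modes (rank truncation)
est = np.zeros(n_grid, dtype=complex)
for k in range(K):
    est[k] = (Xf[:, k].conj() @ Yf[:, k]) / (np.abs(Xf[:, k]) ** 2).sum()

print(np.abs(est[:K] - true_mult[:K]).max())  # small statistical error on kept modes
```

Modes beyond K are simply dropped, so their contribution to the operator shows up as truncation error rather than estimation error.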

---

Title: Table Foundation Models: on knowledge pre-training for tabular learning

Authors: Myung Jun Kim, Félix Lefebvre, Gaëtan Brison, Alexandre Perez-Lebel, Gaël Varoquaux

Abstract: Table foundation models bring high hopes to data science: pre-trained on tabular data to carry knowledge or priors, they should facilitate downstream tasks on tables. One specific challenge is that of data semantics: numerical entries take their meaning from context, *e.g.*, the column name. The traditional approach combines column-specific data preparation with tree-based models that adapt to column specificities. Pre-trained neural networks that jointly model column names and table entries have recently boosted prediction accuracy. While these models outline the promise of world knowledge to interpret table values, they lack the convenience of popular foundation models in text or vision. Indeed, they must be fine-tuned to bring benefits, come with sizeable computation costs, and cannot easily be reused or combined with other architectures. Here we introduce TARTE, a foundation model that transforms tables into knowledge-enhanced vector representations, using the strings to capture semantics. Pre-trained on large relational data, TARTE yields representations that facilitate subsequent learning with little additional cost. These representations can be fine-tuned or combined with other learners, yielding models that push state-of-the-art prediction performance and improve the prediction/computation performance trade-off. Specialized to a task or a domain, TARTE gives domain-specific representations that facilitate further learning. Our study demonstrates an effective approach to knowledge pre-training for tabular learning.

URL: https://openreview.net/forum?id=QV4P8Csw17

---

Title: ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models

Authors: Yaswanth Narsupalli, Abhranil Chandra, Sreevatsa Muppirala, Manish Gupta, Pawan Goyal

Abstract: Assessing the quality of generative model outputs from large language models (LLMs) or vision-language models (VLMs) poses significant challenges. Traditional evaluation methods either rely on human assessment, which is resource-intensive and not scalable, or on automatic metrics that often correlate poorly with human preferences. Another approach is to train dedicated neural evaluators, but this typically requires substantial training data and compute. In this study, we thus introduce ReFeR, a tuning-free framework for evaluating generative outputs including both text and images, using a two-level hierarchy of pre-trained LLM and VLM evaluators. This multi-agent hierarchical strategy leverages additional compute at inference time by orchestrating multiple models and utilizing the increased test-time reasoning to boost performance. By having models themselves provide feedback and final judgments, ReFeR reduces the dependence on human evaluation. We rigorously evaluate ReFeR on four diverse evaluation benchmarks, where it surpasses prior methods in accuracy while also generating constructive feedback useful for downstream distillation and self-improvement via finetuning. Interestingly, ReFeR is also applicable to reasoning tasks: experiments on four reasoning benchmarks show ReFeR’s superior collective reasoning abilities. We present two variants of the framework: ReFeR-Turbo, optimized for accelerated performance, and ReFeR-Lite, offering a more test-time compute efficient solution. ReFeR-Lite is $\sim12-14\times$ more compute efficient than previous works while being comparably accurate to ReFeR-Turbo.

URL: https://openreview.net/forum?id=otSHFe8wTf

---

Title: A stochastic gradient descent algorithm with random search directions

Authors: Eméric Gbaguidi

Abstract: Stochastic coordinate descent algorithms are efficient methods in which each iterate is obtained by fixing most coordinates at their values from the current iteration and approximately minimizing the objective with respect to the remaining coordinates. However, this approach is usually restricted to the canonical basis vectors of $\mathbb{R}^d$. In this paper, we develop the class of stochastic gradient descent algorithms with random search directions. These methods use the directional derivative of the gradient estimate along more general random vectors. We establish the almost sure convergence of these algorithms with decreasing step sizes. We further establish a central limit theorem and pay particular attention to the impact of the search distributions on the asymptotic covariance matrix. We also provide non-asymptotic $\mathbb{L}^p$ rates of convergence.

URL: https://openreview.net/forum?id=npER8AaLSb
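A minimal sketch of the idea (our illustration, with a spherical search distribution, an exact gradient oracle, and an assumed step schedule): each update moves along a random direction v, scaled by the directional derivative of the objective at the current point:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
A = np.diag(np.arange(1.0, d + 1))   # simple quadratic f(x) = 0.5 x^T A x

def grad(x):
    return A @ x

x = rng.standard_normal(d)
for t in range(1, 5001):
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)           # random unit search direction
    step = 1.0 / (t + 100)           # decreasing step size
    # scale by d so the update equals the full gradient step in expectation
    x = x - step * d * (grad(x) @ v) * v

print(np.linalg.norm(x))  # shrinks toward the minimizer at 0
```

Choosing v uniformly on the sphere recovers an isotropic search; restricting v to the canonical basis vectors would recover stochastic coordinate descent as a special case.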

---


New submissions
===============


Title: Eigenvector Phase Transitions under Anisotropic Noise

Abstract: Identifying latent structures in environmental data—such as habitat clusters or pollution sources—is a fundamental challenge in ecological and climate science. Spectral methods, which analyse the principal eigenvectors of affinity matrices, are powerful tools for this task. However, environmental systems are rarely isotropic; physical processes like river flows or prevailing winds create strong directional gradients, resulting in anisotropic noise. The effect of such anisotropy on the reliability of spectral methods is not yet well understood in the literature. In this work, we develop a rigorous theory for this scenario by analysing a spiked random matrix model subjected to anisotropic noise. We derive an exact, analytical expression for the critical signal-to-noise ratio required for signal detection, establishing a sharp phase transition. Our central result proves that this threshold depends critically on the geometric alignment between the signal and the dominant environmental gradient, formalising a "camouflage effect". We also uncover a critical failure mode where this environmental gradient can itself create a "phantom" structure that spectral methods can easily detect, posing a significant risk of misinterpretation for scientists. Furthermore, we show that in the detectable phase, the second eigenvector aligns with the primary noise direction, revealing a deeper reorganisation of the system's structure. We complete our analysis with a Central Limit Theorem for the alignment fluctuations. We validate our theoretical predictions with simulations of ecological systems, offering a fundamental understanding of when spectral methods succeed or fail in realistic environments. Code to reproduce all results in the paper is anonymously released at https://anonymous.4open.science/r/tmlr_ept

URL: https://openreview.net/forum?id=X1EqLH399m
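The setting can be simulated directly (an illustrative variance-profile noise model, not necessarily the paper's exact ensemble): plant a rank-one spike in symmetric noise whose variance varies across coordinates, then measure the top eigenvector's overlap with the signal below and above the detection threshold:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# Symmetric noise with an anisotropic variance profile (illustrative choice)
scales = np.linspace(0.5, 1.5, n)
W = rng.standard_normal((n, n))
H = (W + W.T) / np.sqrt(2 * n) * np.sqrt(np.outer(scales, scales))

v = np.zeros(n)
v[0] = 1.0                            # planted signal direction

def alignment(theta):
    M = theta * np.outer(v, v) + H    # spiked model: signal strength theta
    vals, vecs = np.linalg.eigh(M)
    return abs(vecs[:, -1] @ v)       # overlap of top eigenvector with signal

print(alignment(0.2), alignment(5.0))  # weak vs strong signal-to-noise
```

Below the threshold the overlap is of order n^{-1/2} (the eigenvector is delocalized); above it, the overlap jumps to order one, and under anisotropy the crossover point depends on where the signal sits relative to the noise profile.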

---

Title: Concept Siever: Towards Controllable Erasure of Concepts from Diffusion Models without Side-effect

Abstract: Diffusion models' unprecedented success with image generation can largely be attributed to their large-scale pretraining on massive datasets. Yet, the necessity of forgetting specific concepts for regulatory or copyright compliance poses a critical challenge. Existing approaches to concept forgetting, although reasonably successful in forgetting a given concept, frequently fail to preserve generation quality or demand extensive domain expertise for preservation. To alleviate such issues, we introduce Concept Siever, an end-to-end framework for targeted concept removal within pre-trained text-to-image diffusion models. The foundation of Concept Siever rests on \textit{two key innovations}: First, an automatic technique to create a paired dataset of a target concept and its negations by utilizing the diffusion model’s latent space. A key property of these pairs is that they differ only in the target concept, enabling forgetting with \textit{minimal side effects} and \textit{without requiring domain expertise}. Second, we present Concept Sieve, a localization method for identifying and isolating the model components most responsible for the target concept. By retraining only these localized components on our paired dataset for a target concept, Concept Siever accurately removes the concept with \textit{negligible side-effects, preserving neighboring and unrelated concepts}. Moreover, given the subjective nature of forgetting a concept like nudity, we propose Concept Sieve, which provides \textit{fine-grained control over the forgetting strength at inference time}, catering to diverse deployment needs without any need for finetuning. We report state-of-the-art performance on the I2P benchmark, surpassing previous domain-agnostic methods by over $33\%$ while showing superior structure preservation. We validate our results through extensive quantitative and qualitative evaluation along with a user study.

URL: https://openreview.net/forum?id=O7zTvlSBZ9

---

Title: Tex4D: Zero-shot 4D Character Texturing with Video Diffusion Models

Abstract: 3D meshes are widely used in movies, games, AR, and VR for their efficiency in animation and minimal memory footprint, leading to the creation of a large number of mesh sequences. However, creating dynamic textures for these mesh sequences to model the appearance transformations remains labor-intensive for professional artists. In this work, we present Tex4D, a zero-shot approach that creates multi-view and temporally consistent dynamic mesh textures by integrating the inherent 3D geometry knowledge with the expressiveness of video diffusion models. Given an untextured mesh sequence and a text prompt as inputs, our method enhances multi-view consistency by synchronizing the diffusion process across different views through latent aggregation in the UV space. To ensure temporal consistency, such as lighting changes, wrinkles, and appearance transformations, we leverage prior knowledge from a conditional video generation model for texture synthesis. Using the video diffusion model and the UV texture aggregation in a straightforward way leads to blurred results. We analyze the underlying causes and propose a simple yet effective modification to the DDIM sampling process to address this issue. Additionally, we introduce a reference latent texture to strengthen the correlation between frames during the denoising process. To the best of our knowledge, Tex4D is the first method specifically designed for 4D character texturing. Extensive experiments demonstrate its superiority in producing multi-view and multi-frame consistent dynamic textures for mesh sequences.

URL: https://openreview.net/forum?id=y1EBrNkzDa

---

Title: CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection

Abstract: Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types (e.g., local and global defects), and the scarcity of training data. As such, it necessitates a comprehensive model capable of capturing both low-level and high-level features, even with limited data. To address this, we propose CLIPFUSION, a method that leverages both discriminative and generative foundation models. Given the CLIP-based discriminative model's limited capacity to capture fine-grained local details, we incorporate a diffusion-based generative model to complement its features. This integration yields a synergistic solution for anomaly detection. To this end, we propose using diffusion models as feature extractors for anomaly detection, and introduce carefully designed strategies to extract informative cross-attention and feature maps. Experimental results on benchmark datasets (MVTec-AD, VisA) demonstrate that CLIPFUSION consistently outperforms baseline methods in both anomaly segmentation and classification under both zero-shot and few-shot settings. We believe that our method underscores the effectiveness of multi-modal and multi-model fusion in tackling the multifaceted challenges of anomaly detection, providing a scalable solution for real-world applications.

URL: https://openreview.net/forum?id=WpFzZNuQmg

---

Title: Exploring exploration with foundation agents in interactive environments

Abstract: While foundation models have recently shown exemplary progress solving difficult single-turn math and reasoning problems, many human endeavors---from conducting scientific research to developing new technologies---require multi-turn exploration in dynamic interactive environments. Crucial components of learning from experience in these settings, such as efficiently gathering information to test hypotheses, meta-learning a model of the world's dynamics, and adapting to unexpected changes, remain largely unexplored for these models. We first evaluate foundation models in Feature World, a setting that primarily tests information gathering about a static hidden reward function. In this initial setting, we show that state-of-the-art foundation models come close to optimal efficiency in selecting maximally informative actions in tasks with simple reward functions, with more recent and thinking models performing especially well. As a proof of concept, we also show a model can gather information efficiently in a 3D embodied version of this task, though errors in vision limit some aspects of performance. In order to test exploration across multiple dependent turns and trials, we implement a custom, text-based version of the Alchemy environment, a benchmark designed for meta-learning. Here, agents must deduce a latent causal structure governing object interactions by integrating information gathered over a sequence of trials where actions modify the state relevant to future outcomes. In this more complex setting, we find that recent foundation models struggle to meta-learn strategies that enable improved performance over time. However, prompting the models to summarize their observations at regular intervals enables an emergent meta-learning process, allowing them to improve across trials. Notably, in some models, summarization also enabled adaptive re-learning of this information when the environment's rules change unexpectedly. 
While most models performed reasonably well on simple Feature World tasks, evaluations in Alchemy reveal stark differences in robustness among the models, with Gemini 2.5 performing best, followed by Claude 3.7, and ChatGPT-4o and o4-mini struggling the most. These results underscore Alchemy's value as a benchmark for meta-learning and strategy adaptation in foundation models. By moving beyond simple discovery to complex, stateful environments, we demonstrate that the most significant challenge for foundation agents is not selecting informative actions in the moment, but rather seeking and integrating knowledge through adaptive strategies over time. Intriguingly, we find there is likely no intrinsic barrier to future generations of foundation agents more fully mastering these abilities.

URL: https://openreview.net/forum?id=wOrkUTr0W5

---

Title: Bregman Centroid Guided Cross-Entropy Method

Abstract: The Cross-Entropy Method (CEM) is a widely adopted trajectory optimizer in model-based reinforcement learning (MBRL), but its unimodal sampling strategy often leads to premature convergence in multimodal landscapes. In this work, we propose \textbf{$\mathcal B$regman-$\mathcal C$entroid Guided CEM ($\mathcal{BC}$-EvoCEM)}, a lightweight enhancement to ensemble CEM that leverages \emph{Bregman centroids} for principled information aggregation and diversity control. BC-EvoCEM computes a performance-weighted Bregman centroid across CEM workers and updates the least contributing ones by sampling within a trust region around the centroid. Leveraging the duality between Bregman divergences and exponential family distributions, we show that BC-EvoCEM integrates seamlessly into standard CEM pipelines with negligible overhead. Empirical results on synthetic benchmarks, a cluttered navigation task, full MBRL pipelines, and a real-world quadruped robot demonstrate that BC-EvoCEM enhances both convergence and solution quality, providing a simple yet effective upgrade for CEM.

URL: https://openreview.net/forum?id=7949RzOul6
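A rough sketch of the ensemble idea on a 1D toy objective (our simplification: for Gaussian sampling distributions the KL-based Bregman centroid reduces to moment averaging, and the trust region is just Gaussian noise around the centroid): run several CEM workers, form a performance-weighted centroid of their means, and respawn the weakest worker near it:

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):  # multimodal objective (illustrative); global optimum near x = 3
    return np.exp(-(x - 3) ** 2) + 0.8 * np.exp(-(x + 2) ** 2)

n_workers, n_elite, n_samp = 4, 10, 50
mu = rng.uniform(-5, 5, n_workers)      # each worker keeps a Gaussian N(mu, sigma^2)
sigma = np.full(n_workers, 2.0)

for it in range(30):
    scores = np.empty(n_workers)
    for w in range(n_workers):
        xs = rng.normal(mu[w], sigma[w], n_samp)
        elite = xs[np.argsort(f(xs))[-n_elite:]]   # standard CEM elite update
        mu[w], sigma[w] = elite.mean(), elite.std() + 1e-3
        scores[w] = f(xs).max()
    # Performance-weighted centroid of the worker means (moment averaging)
    wts = np.exp(scores) / np.exp(scores).sum()
    c_mu = wts @ mu
    worst = scores.argmin()
    mu[worst] = c_mu + rng.normal(0, 0.5)          # respawn weakest worker near centroid
    sigma[worst] = 1.0                             # reset its exploration scale

best = mu[np.argmax(f(mu))]
print(best)  # best worker mean; typically near a mode of f
```

The respawn step is what counteracts the premature convergence of a single unimodal CEM: diversity is maintained across workers while poor ones are recycled toward promising regions.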

---

Title: Training-Conditional Coverage Bounds under Covariate Shift

Abstract: Conformal prediction methodology has recently been extended to the covariate shift setting, where the distribution of covariates differs between training and test data. While existing results ensure that the prediction sets from these methods achieve marginal coverage above a nominal level, their coverage rate conditional on the training dataset—referred to as training-conditional coverage—remains unexplored. In this paper, we address this gap by deriving upper bounds on the tail of the training-conditional coverage distribution, offering probably approximately correct (PAC) guarantees for these methods. Our results characterize the reliability of the prediction sets in terms of the severity of distributional changes and the size of the training dataset.

URL: https://openreview.net/forum?id=F6hHT3qWxT
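For context, the covariate-shift conformal methods these bounds apply to follow the weighted split-conformal recipe, which can be sketched as follows (illustrative setup with a known likelihood ratio and a fixed predictor; the paper's contribution is the PAC analysis of the resulting coverage, not this construction):

```python
import numpy as np

rng = np.random.default_rng(4)

# Covariate shift: training X ~ N(0,1), test X ~ N(1,1); data follow Y = 2X + noise
def lik_ratio(x):  # q(x)/p(x), assumed known here for illustration
    return np.exp(-(x - 1) ** 2 / 2) / np.exp(-x ** 2 / 2)

n_cal = 2000
x_cal = rng.normal(0, 1, n_cal)
y_cal = 2 * x_cal + rng.normal(0, 0.5, n_cal)
scores = np.abs(y_cal - 2 * x_cal)      # residual scores of a fixed predictor

x_test = 1.2
alpha = 0.1
p = np.append(lik_ratio(x_cal), lik_ratio(x_test))
p = p / p.sum()                          # normalized likelihood-ratio weights

# Weighted (1 - alpha)-quantile of scores, with a conservative +inf at the test point
scores_ext = np.append(scores, np.inf)
order = np.argsort(scores_ext)
cum = np.cumsum(p[order])
qhat = scores_ext[order][np.searchsorted(cum, 1 - alpha)]

pred = 2 * x_test
print((pred - qhat, pred + qhat))        # prediction interval at x_test
```

Marginal coverage of such intervals is guaranteed on average over the calibration draw; the training-conditional question studied in the paper is how this coverage fluctuates for a fixed calibration set, as a function of the shift severity and n_cal.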

---

Title: Augmented Vision-Language Models: A Systematic Review

Abstract: Recent advances in vision-language machine learning models have demonstrated an exceptional ability to use natural language and understand visual scenes by training on large, unstructured datasets. However, this training paradigm cannot produce interpretable explanations for its outputs, requires retraining to integrate new information, is highly resource-intensive, and struggles with certain forms of logical reasoning. One promising solution involves integrating neural networks with external symbolic information systems, forming neural-symbolic systems that can enhance reasoning and memory abilities. These neural-symbolic systems provide more interpretable explanations for their outputs and the capacity to assimilate new information without extensive retraining. Utilizing powerful pre-trained Vision-Language Models (VLMs) as the core neural component, augmented by external systems, offers a pragmatic approach to realizing the benefits of neural-symbolic integration. This systematic literature review aims to categorize techniques through which vision-language understanding can be improved by interacting with external symbolic information systems.

URL: https://openreview.net/forum?id=DFnPi77v6J

---
