Daily TMLR digest for Jan 16, 2026


TMLR

Jan 16, 2026, 12:30:08 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Generalization Bound for a Shallow Transformer Trained Using Gradient Descent

Authors: Brian Mwigo, Anirban Dasgupta

Abstract: In this work, we establish a norm-based generalization bound for a shallow Transformer model trained via gradient descent under the bounded-drift (lazy training) regime, where model parameters remain close to their initialization throughout training. Our analysis proceeds in three stages: (a) we formally define a hypothesis class of Transformer models constrained to remain within a small neighborhood of their initialization; (b) we derive an upper bound on the Rademacher complexity of this class, quantifying its effective capacity; and (c) we establish an upper bound on the empirical loss achieved by gradient descent under suitable assumptions on model width, learning rate, and data structure. Combining these results, we obtain a high-probability bound on the true loss that decays sublinearly with the number of training samples $N$ and depends explicitly on model and data parameters. The resulting bound demonstrates that, in the lazy regime, wide and shallow Transformers generalize similarly to their linearized (NTK) counterparts. Empirical evaluations on both text and image datasets support the theoretical findings.

URL: https://openreview.net/forum?id=t3iUeMOT8Z
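
For orientation, the generic template behind this kind of result (not the paper's exact statement; the Transformer-specific capacity term, abbreviated $C$ below, and the assumptions on width, learning rate, and data are what the paper supplies) is the standard Rademacher-complexity bound for a loss bounded in $[0,1]$: with probability at least $1-\delta$ over the $N$ samples,

    $L(f) \le \widehat{L}_N(f) + 2\,\widehat{\mathfrak{R}}_N(\mathcal{F}) + 3\sqrt{\log(2/\delta)/(2N)}$,  where  $\widehat{\mathfrak{R}}_N(\mathcal{F}) \lesssim C/\sqrt{N}$,

which is why the resulting bound on the true loss decays sublinearly in $N$.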

---

Title: Mechanism-Aware Prediction of Tissue-Specific Drug Activity via Multi-Modal Biological Graphs

Authors: Sally Turutov, Kira Radinsky

Abstract: Predicting how small molecules behave across human tissues is essential for targeted therapy development. While some existing models incorporate tissue identity, they treat it as a label—ignoring the underlying biological mechanisms that differentiate tissues. We present Expresso, a multi-modal architecture that predicts tissue-contextual molecular activity, as measured by the assay, by modeling how compounds interact with transcriptomic and pathway-level tissue context. Expresso constructs heterogeneous graphs from GTEx data, linking samples, genes, and pathways to reflect expression profiles and curated biological relationships. These graphs are encoded using a hierarchical GNN and fused with frozen molecular embeddings to produce context-aware predictions. A multi-task pretraining strategy—spanning gene recovery, tissue classification, and pathway-level contrastive learning—guides the model to learn mechanistically grounded representations. On nine tissues, Expresso improves mean AUC by up to 27.9 points over molecule-only baselines. Our results demonstrate that incorporating biological structure—as defined by the assay—yields more accurate and interpretable models for tissue-specific drug behavior in human cell-based in vitro assay systems.

URL: https://openreview.net/forum?id=UDW8m9iQeC
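
As a rough illustration of the fusion step described above, here is a minimal late-fusion head in Python/PyTorch. The class and argument names are ours, and the actual Expresso components (hierarchical GNN encoder, multi-task pretraining) are not reproduced here:

    import torch
    import torch.nn as nn

    class ContextFusionHead(nn.Module):
        """Illustrative sketch: concatenate a frozen molecular embedding with a
        tissue-context embedding from a graph encoder and predict assay activity."""
        def __init__(self, mol_dim, ctx_dim, hidden=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(mol_dim + ctx_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, mol_emb, ctx_emb):
            # mol_emb: output of a frozen molecular encoder; ctx_emb: tissue graph encoding
            return self.mlp(torch.cat([mol_emb, ctx_emb], dim=-1))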

---

Title: AC$\oplus$DC search: behind the winning solution to the FlyWire graph-matching challenge

Authors: Daniel Lee, Arie Matsliah, Lawrence K. Saul

Abstract: This paper describes the Alternating Continuous and Discrete Combinatorial (AC$\oplus$DC) optimizations behind the winning solution to the FlyWire Ventral Nerve Cord Matching Challenge. The challenge was organized by the Princeton Neuroscience Institute and held over three months, ending on January 31, 2025. During this period, the challenge attracted teams of researchers with expertise in machine learning, high-performance computing, graph data mining, biological network analysis, and quadratic assignment problems. The goal of the challenge was to align the connectomes of a male and female fruit fly, and more specifically, to determine a one-to-one correspondence between the neurons in their ventral nerve cords. The connectomes were represented as sparse weighted graphs with thousands of nodes and millions of edges, and the challenge was to find the permutation that best maps the nodes and edges of one graph onto those of the other. The winning solution to the challenge alternated between two complementary approaches to graph matching---the first, a combinatorial optimization over the symmetric group of permutations, and the second, a continuous relaxation of this problem to the space of doubly stochastic matrices. For the latter, the doubly stochastic matrices were optimized by combining Frank-Wolfe methods with a fast preconditioner to solve the linear assignment problem at each iteration. We provide a complete implementation of these methods with a few hundred lines of code in MATLAB. Notably, this implementation obtains a winning score to the challenge in less than 10 minutes on a laptop computer.

URL: https://openreview.net/forum?id=8MjCOMyaDf
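
The continuous half of the alternation is compact enough to sketch. The fragment below is an illustrative restatement in Python (the authors' released implementation is in MATLAB, and the fixed step size here stands in for a proper line search): one Frank-Wolfe step on the doubly stochastic relaxation, with the linear-assignment subproblem solved by SciPy.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def frank_wolfe_step(A, B, P, step=0.1):
        # Maximize tr(A P B^T P^T) over doubly stochastic P; the gradient is
        # A P B^T + A^T P B.
        G = A @ P @ B.T + A.T @ P @ B
        # Best permutation-matrix direction: a linear assignment problem on G.
        rows, cols = linear_sum_assignment(-G)   # maximizes <G, Q>
        Q = np.zeros_like(P)
        Q[rows, cols] = 1.0
        # Move toward that extreme point (a line search would choose `step`).
        return (1.0 - step) * P + step * Q

A natural starting point is the uniform doubly stochastic matrix P = np.full((n, n), 1.0 / n), with the discrete phase rounding the final P back to a permutation.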

---

Title: VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction

Authors: Yadi Cao, Yuxuan Liu, Liu Yang, Rose Yu, Hayden Schaeffer, Stanley Osher

Abstract: In-Context Operator Networks (ICONs) have demonstrated the ability to learn operators across diverse partial differential equations using few-shot, in-context learning. However, existing ICONs process each spatial point as an individual token, severely limiting computational efficiency when handling dense data in higher spatial dimensions. We propose \textit{Vision In-Context Operator Networks} (VICON), which integrate vision transformer architectures to efficiently process 2D data through patch-wise operations while preserving ICON's adaptability to multi-physics systems and varying timesteps. Evaluated across three fluid dynamics benchmarks, VICON significantly outperforms state-of-the-art baselines DPOT and MPP, reducing the average last-step rollout error by 37.9\% compared to DPOT and 44.7\% compared to MPP, while requiring only 72.5\% and 34.8\% of their respective inference times. VICON naturally supports flexible rollout strategies with varying timestep strides, enabling immediate deployment in \textit{imperfect measurement systems} where sampling frequencies may differ or frames might be dropped—common challenges in real-world settings—without requiring retraining or interpolation. In these realistic scenarios, VICON exhibits remarkable robustness, experiencing only 24.41\% relative performance degradation compared to 71.37\%-74.49\% degradation in baseline methods, demonstrating its versatility for deployment in realistic applications. Our scripts for processing datasets and code are publicly available at https://github.com/Eydcao/VICON.

URL: https://openreview.net/forum?id=6V3YmHULQ3
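
The efficiency gain comes from tokenizing whole patches rather than individual points. A minimal ViT-style tokenizer for a 2D field might look like the following (illustrative Python/PyTorch, not the released VICON code):

    import torch

    def patchify(field, patch=16):
        # field: (batch, channels, H, W), with H and W divisible by `patch`
        B, C, H, W = field.shape
        x = field.unfold(2, patch, patch).unfold(3, patch, patch)  # (B, C, H/p, W/p, p, p)
        x = x.permute(0, 2, 3, 1, 4, 5)                            # group the patch grid first
        return x.reshape(B, (H // patch) * (W // patch), C * patch * patch)  # one token per patch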

---

Title: The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs

Authors: Jierun Chen, Tiezheng YU, Haoli Bai, Lewei Yao, Jiannan Wu, Kaican Li, Fei Mi, Chaofan Tao, Lei Zhu, Manyi Zhang, Xiao-Hui Li, Lu Hou, Lifeng Shang, Qun Liu

Abstract: Large vision-language models (VLMs) increasingly adopt post-training techniques such as long chain-of-thought (CoT) supervised fine-tuning (SFT) and reinforcement learning (RL) to elicit sophisticated reasoning. While these methods exhibit synergy in language-only models, their joint effectiveness in VLMs remains uncertain. We present a systematic investigation into the distinct roles and interplay of long-CoT SFT and RL across multiple multimodal reasoning benchmarks. We find that SFT improves performance on difficult questions through in-depth, structured reasoning, but introduces verbosity and degrades performance on simpler ones. In contrast, RL promotes generalization and brevity, yielding consistent improvements across all difficulty levels, though the improvements on the hardest questions are less prominent compared to SFT. Surprisingly, combining them through two-stage, interleaved, or progressive training strategies, as well as data mixing and model merging, all fail to produce additive benefits, instead leading to trade-offs in accuracy, reasoning style, and response length. This "synergy dilemma" highlights the need for more seamless and adaptive approaches to unlock the full potential of combined post-training techniques for reasoning VLMs. Code, dataset, and fine-tuned models are available at https://github.com/JierunChen/SFT-RL-SynergyDilemma.

URL: https://openreview.net/forum?id=XPML8UGI04

---

Title: Learning object representations through amortized inference over probabilistic programs

Authors: Francisco Silva, Hélder P. Oliveira, Tania Pereira

Abstract: Recent developments in modern probabilistic programming languages have enabled pattern-recognition engines implemented by neural networks to guide inference over explanatory factors written as symbols in probabilistic programs. We argue that learning to invert fixed generative programs, instead of learned ones, places stronger restrictions on the representations learned by feature extraction networks, which reduces the space of latent hypotheses and enhances training efficiency. To empirically demonstrate this, we investigate a neurosymbolic object-centric representation learning approach in which a slot-based neural module, optimized via inference compilation, learns to invert a prior generative program of scene generation. By amortizing the search over posterior hypotheses, we demonstrate that approximate inference using data-driven sequential Monte Carlo methods achieves competitive results when compared to state-of-the-art fully neural baselines while requiring several times fewer training steps.

URL: https://openreview.net/forum?id=nUFSrlJaUr
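
To make the amortization concrete, a single data-driven importance-sampling step of the kind used inside sequential Monte Carlo can be written as below. This is an illustrative Python sketch: `propose` stands in for the trained slot-based proposal network and `log_joint` for the fixed generative program, both assumptions of this example.

    import numpy as np

    def amortized_importance_resample(obs, propose, log_joint, n_particles=128, seed=0):
        rng = np.random.default_rng(seed)
        z, log_q = propose(obs, n_particles)            # samples and their log proposal density
        log_w = log_joint(z, obs) - log_q               # importance weights against the program
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)   # multinomial resampling
        return z[idx]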

---


New submissions
===============


Title: Out-of-distribution generalization of deep-learning surrogates for 2D PDE-generated dynamics in the small-data regime

Abstract: Partial differential equations (PDEs) are a central tool for modeling the dynamics of physical, engineering, and materials systems, but high-fidelity simulations are often computationally expensive. At the same time, many scientific applications can be viewed as the evolution of spatially distributed fields, making data-driven forecasting of such fields a core task in scientific machine learning. In this work we study autoregressive deep-learning surrogates for two-dimensional PDE dynamics on periodic domains, focusing on generalization to out-of-distribution initial conditions within a fixed PDE and parameter regime and on strict small-data settings with at most $\mathcal{O}(10^2)$ simulated trajectories per system. We introduce a multi-channel U-Net with enforced periodic padding (me-UNet) that takes short sequences of past solution fields of a single representative scalar variable as input and predicts the next time increment. We evaluate me-UNet on five qualitatively different PDE families---linear advection, diffusion, continuum dislocation dynamics, Kolmogorov flow, and Gray--Scott reaction--diffusion---and compare it to ViT, AFNO, PDE-Transformer, and KAN-UNet under a common training setup. Across all datasets, me-UNet matches or outperforms these more complex architectures in terms of field-space error, spectral similarity, and physics-based metrics for in-distribution rollouts, while requiring substantially less training time. It also generalizes qualitatively to unseen initial conditions and, for example, reaches comparable performance on continuum dislocation dynamics with as few as $\approx 20$ training simulations. A data-efficiency study and Grad-CAM analysis further suggest that, in small-data periodic 2D PDE settings, convolutional architectures with inductive biases aligned to locality and periodic boundary conditions remain strong contenders for accurate and moderately out-of-distribution-robust surrogate modeling.

URL: https://openreview.net/forum?id=TyW6Ar3wcD
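
Since the enforced periodic padding is the architectural point of me-UNet, a building block of this kind typically looks as follows in Python/PyTorch (a generic sketch under that assumption, not the authors' code):

    import torch.nn as nn

    class PeriodicConvBlock(nn.Module):
        """Conv -> GroupNorm -> GELU with circular padding, so the receptive field
        wraps around the periodic domain instead of seeing zero-padded edges."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, padding_mode="circular"),
                nn.GroupNorm(1, out_ch),
                nn.GELU(),
            )

        def forward(self, x):
            return self.block(x)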

---

Title: Asymptotically and Minimax Optimal Regret Bounds for Multi-Armed Bandits with Abstention

Abstract: We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic innovation: \emph{abstention}. In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to {\em abstain} from accepting the stochastic instantaneous reward before observing it. When opting for abstention, the agent either suffers a fixed regret or gains a guaranteed reward. This added layer of complexity naturally prompts the key question: can we develop algorithms that are both computationally efficient and asymptotically and minimax optimal in this setting? We answer this question in the affirmative by designing and analyzing algorithms whose regrets meet their corresponding information-theoretic lower bounds. Our results offer valuable quantitative insights into the benefits of the abstention option, laying the groundwork for further exploration in other online decision-making problems with such an option. Extensive numerical experiments validate our theoretical results, demonstrating that our approach not only advances theory but also has the potential to deliver significant practical benefits.

URL: https://openreview.net/forum?id=AYp5zOcFdA
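
To fix ideas about the interaction protocol, here is a naive illustration in Python with Bernoulli arms and a simple "abstain when the chosen arm's index drops below the guaranteed reward" rule; this only sketches the setting and is not one of the optimal algorithms analyzed in the paper.

    import numpy as np

    def ucb_with_abstention(means, horizon, safe_reward=0.5, seed=0):
        rng = np.random.default_rng(seed)
        K = len(means)
        counts, sums, total = np.zeros(K), np.zeros(K), 0.0
        for t in range(1, horizon + 1):
            pulls = np.maximum(counts, 1)
            ucb = np.where(counts > 0, sums / pulls + np.sqrt(2 * np.log(t) / pulls), np.inf)
            arm = int(np.argmax(ucb))
            if ucb[arm] < safe_reward:
                total += safe_reward               # abstain: guaranteed reward, nothing observed
            else:
                r = rng.binomial(1, means[arm])    # accept: observe the stochastic reward
                counts[arm] += 1
                sums[arm] += r
                total += r
        return total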

---

Title: From SQL to Knowledge Graphs: An LLM-Driven MultiAgent Approach with Data Schema Improvement

Abstract: Relational database management systems (RDBMS) face several limitations, including slow execution with multi-hop queries and a lack of explainability through graphical interpretations. In contrast, graph databases offer a more intuitive and efficient data schema that enables faster execution on large datasets. Most existing RDBMS conversion pipelines focus on running traditional loading commands and rely on Cypher queries. However, the efficiency of using an LLM to generate an effective graph data schema that significantly reduces the ambiguity of the graph database remains underexplored in the current research literature. This paper presents a novel algorithm that bridges RDBMS and graph databases by using an LLM-powered ETL agent to standardize table and column names before saving them to the Data Mart. A Multi-Agent System generates a looping discussion between ETL, Analyzer, and Graph agents to optimize the final design through an iterative process of suggesting and scoring the graph database schema. We ensure that the final graph database meets three criteria before being accepted for data conversion: Accuracy, Groundedness, and Faithfulness. This system demonstrates an effective pipeline to automatically convert a tabular database into a graph database through a comprehensive end-to-end process. Our study highlights notable efficiency gains from using the converted graph database, evaluated on 1,081 samples of a BFSI dataset across three levels of complexity (easy, medium, and hard). Specifically, CypherAgent achieves 85.6% accuracy for Q&A tasks using the graph database, which is 12.12% higher than the accuracy achieved by an SQLAgent on the PostgreSQL RDBMS across all queries. Additionally, the graph database demonstrates faster performance, reducing latency by approximately a factor of three. For easy, medium, and hard queries, the graph database attains accuracies of 90.43%, 81.98%, and 80.06%, respectively, surpassing the RDBMS by 17.8%, 4.2%, and 11.0%.

URL: https://openreview.net/forum?id=HYu0dGmj5x

---

Title: The 2025 Foundation Model Transparency Index

Abstract: Foundation model developers are among the world’s most important companies. As these companies become increasingly consequential, how do their transparency practices evolve? The 2025 Foundation Model Transparency Index is the third edition of an annual effort to characterize and quantify the transparency of foundation model developers. The 2025 FMTI introduces new indicators related to data acquisition, usage data, and monitoring and evaluates companies like Alibaba, DeepSeek, and xAI for the first time. The 2024 FMTI reported that transparency was improving, but the 2025 FMTI finds this progress has deteriorated: the average score out of 100 fell from 58 in 2024 to 40 in 2025. Companies are most opaque about their training data and training compute as well as the post-deployment usage and impact of their flagship models. While companies tend to disclose evaluations of model capabilities and risks, limited methodological transparency, third-party involvement, reproducibility, and reporting of train-test overlap pose challenges. In spite of this general trend, IBM stands out as a positive outlier, scoring 95, in contrast to the lowest scorers, xAI and Midjourney, at just 14. Several groups of companies score higher than the mean: open model developers, enterprise-focused B2B companies, companies that prepare their own transparency reports, and signatories to the EU AI Act's General-Purpose AI Code of Practice. The five members of the Frontier Model Forum we score end up in the middle of the Index: we posit that major companies aim to avoid particularly low rankings but also lack incentives to be highly transparent. As policymakers around the world increasingly mandate certain types of transparency, this work reveals the current state of transparency for foundation model developers, how it may change given newly enacted policy, and where more aggressive policy interventions are necessary to address critical information deficits.

URL: https://openreview.net/forum?id=1jT253Xtyf

---

Title: Hierarchical Geometry of Cognitive States in Transformer Embedding Spaces

Abstract: Recent work has shown that transformer-based language models learn rich geometric structure in their embedding spaces, yet the presence of higher-level cognitive organization within these representations remains underexplored. In this work, we investigate whether sentence embeddings encode a graded, hierarchical structure aligned with human-interpretable cognitive or psychological attributes. We construct a dataset of 480 natural-language sentences annotated with both continuous energy scores (ranging from -5 to 5) and discrete tier labels spanning seven ordered consciousness-related cognitive categories. Using fixed sentence embeddings from multiple transformer models, we evaluate the recoverability of these annotations via linear and shallow nonlinear probes. Across models, both continuous energy scores and tier labels are reliably decodable by both linear and nonlinear probes, with nonlinear probes outperforming linear counterparts. To assess statistical significance, we conduct nonparametric permutation tests that randomize labels while preserving embedding geometry, finding that observed probe performance significantly exceeds chance under both regression and classification null hypotheses (p < 0.005). Qualitative analyses using UMAP visualizations and tier-level confusion matrices are consistent with these findings, illustrating a coherent low-to-high gradient and predominantly local (adjacent-tier) confusions in embedding space. Taken together, these results provide evidence that transformer embedding spaces exhibit a hierarchical geometric organization statistically aligned with our human-defined cognitive structure; while this work does not claim internal awareness or phenomenology, it demonstrates a systematic alignment between learned representation geometry and interpretable cognitive and psychological attributes, with potential implications for representation analysis, safety modeling, and geometry-based generation steering.

URL: https://openreview.net/forum?id=qKKqZAOJig
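
The probing-plus-permutation recipe is simple enough to state in a few lines of Python. This is a sketch of the general procedure with hypothetical variable names, using a ridge probe for the continuous energy scores; the paper's exact probes and statistics may differ.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    def permutation_pvalue(X, y, n_perm=199, seed=0):
        """X: (n_sentences, dim) frozen embeddings; y: continuous energy scores.
        Returns the observed cross-validated R^2 of a linear probe and a permutation p-value."""
        rng = np.random.default_rng(seed)
        observed = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2").mean()
        null = np.array([
            cross_val_score(Ridge(alpha=1.0), X, rng.permutation(y), cv=5, scoring="r2").mean()
            for _ in range(n_perm)
        ])
        p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
        return observed, p_value

Shuffling the labels while leaving the embeddings untouched preserves the embedding geometry under the null, which is the point of the test.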

---

Title: TCSurv: Time-based Clustering for Reliable Survival Analysis

Abstract: Survival analysis is critical in healthcare for predicting time-to-event outcomes such as disease progression or patient survival. While deep learning excels at capturing meaningful representations from complex clinical data and has improved performance in deep survival models, it inherently struggles with reliability and robustness, challenges that are especially significant when deploying these models in real-world clinical practice. Out-of-distribution (OOD) detection, designed to identify or flag samples that deviate from the training distribution, has become a key method for evaluating AI reliability across fields. This capability is especially important in clinical applications, where noisy or heterogeneous patient data can lead to incorrect assessments; yet, OOD detection remains underexplored and challenging in deep survival analysis due to the need to handle both censored and observed samples, which are unique to this domain. In this study, we address this critical gap by introducing TCSurv, a novel time-based clustering approach for survival analysis that handles both observed and censored samples for robust OOD detection. TCSurv initializes cluster centers using in-distribution data, creating time-specific clusters that anchor model predictions for both observed and censored samples. Experiments on real-world clinical data, including Alzheimer’s dementia progression, and benchmark medical imaging datasets demonstrate that TCSurv effectively distinguishes OOD samples without compromising survival performance compared to existing deep survival analysis frameworks.

URL: https://openreview.net/forum?id=d2zkcIC69b
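
As a rough illustration of how time-specific cluster centers can drive an OOD score, here is a minimal Python sketch of our own; it ignores censoring, which is precisely the part TCSurv is designed to handle, so it is not the paper's method.

    import numpy as np
    from sklearn.cluster import KMeans

    def ood_score_from_time_clusters(train_emb, train_time_bin, test_emb, k_per_bin=3):
        centers = []
        for b in np.unique(train_time_bin):
            km = KMeans(n_clusters=k_per_bin, n_init=10).fit(train_emb[train_time_bin == b])
            centers.append(km.cluster_centers_)          # time-specific cluster centers
        centers = np.vstack(centers)
        d = np.linalg.norm(test_emb[:, None, :] - centers[None, :, :], axis=-1)
        return d.min(axis=1)                             # larger distance => more likely OOD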

---

Title: Temporal Variational Implicit Neural Representations

Abstract: We introduce Temporal Variational Implicit Neural Representations (TV-INRs), a probabilistic framework for modeling irregular multivariate time series that enables efficient and accurate individualized imputation and forecasting. By integrating implicit neural representations with latent variable models, TV-INRs learn distributions over time-continuous generator functions conditioned on signal-specific covariates.
Unlike existing approaches that require extensive training, fine-tuning, or meta-learning, our method achieves accurate individualized predictions through a single forward pass. Our experiments demonstrate that with a single TV-INR instance, we can accurately solve diverse imputation and forecasting tasks, offering a computationally efficient and scalable solution for real-world applications.
TV-INRs perform particularly well in low-data regimes, where on several datasets they achieve substantially lower imputation error, including order-of-magnitude improvements.

URL: https://openreview.net/forum?id=1CGfvw4ySe
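
A conditional implicit neural representation of this kind can be sketched in a few lines of Python/PyTorch; the variational machinery that infers the per-signal latent code is omitted, so this is illustrative rather than the TV-INRs model itself.

    import torch
    import torch.nn as nn

    class ConditionalINR(nn.Module):
        """Map a query time t and a per-signal latent code z to a predicted value,
        so imputation and forecasting are both just evaluations at new times."""
        def __init__(self, latent_dim=32, hidden=128, out_dim=1):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(1 + latent_dim, hidden), nn.SiLU(),
                nn.Linear(hidden, hidden), nn.SiLU(),
                nn.Linear(hidden, out_dim),
            )

        def forward(self, t, z):
            # t: (batch, n_queries, 1) query times; z: (batch, latent_dim) signal code
            z = z.unsqueeze(1).expand(-1, t.shape[1], -1)
            return self.net(torch.cat([t, z], dim=-1))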

---

Title: Automatic Selection of the Nugget for Linear System Solves in Machine Learning

Abstract: Rapid prototyping of algorithms is a critical step in modern machine learning. Most algorithms exploit linear algebra, creating a need for lightweight numerical routines which -- while potentially sub-optimal for the task at hand -- can be rapidly implemented. For the numerical solution of ill-conditioned linear systems of equations, the standard solution for prototyping is Tikhonov-regularised inversion using a nugget. However, selection of the size of the nugget is often difficult, and the use of data-adaptive procedures precludes automatic differentiation, introducing instabilities into end-to-end training. Further, while data-adaptive procedures perform multiple linear solves to select the size of the nugget, only the result of one such solve is returned, which we argue is wasteful. This paper aims to resolve the above difficulties, presenting `autonugget`, a `Python` package for automatic and stable numerical solution of linear systems suitable for rapid prototyping, and fully compatible with automatic differentiation using `JAX`. A distinguishing feature of `autonugget` is the ability to combine multiple linear solves using Richardson extrapolation, improving in accuracy over approximations based on a single nugget.

URL: https://openreview.net/forum?id=fqbkenUpRa
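
The idea of combining solves is easy to illustrate. The sketch below, in plain Python/NumPy, shows the extrapolation trick itself rather than the `autonugget` API: for an invertible but ill-conditioned matrix, the Tikhonov solution satisfies x(lam) = x* - lam * A^{-1} x* + O(lam^2), so two solves with nuggets lam and lam/2 can cancel the leading-order bias.

    import numpy as np

    def tikhonov_solve(A, b, lam):
        # Solve the nugget-regularised system (A + lam * I) x = b.
        return np.linalg.solve(A + lam * np.eye(A.shape[0]), b)

    def richardson_solve(A, b, lam):
        # Richardson extrapolation: 2 x(lam/2) - x(lam) = x* + O(lam^2).
        x_full = tikhonov_solve(A, b, lam)
        x_half = tikhonov_solve(A, b, lam / 2)
        return 2.0 * x_half - x_full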

---
