Daily TMLR digest for Jan 28, 2026


TMLR

Jan 28, 2026, 12:30:10 AM
to tmlr-anno...@googlegroups.com


New certifications
==================

Survey Certification: A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Qihan Ren, Yiran Wu, Hongru WANG, Han Xiao, Yuhang Zhou, Shaokun Zhang, Jiayi Zhang, Jinyu Xiang, Yixiong Fang, Qiwen Zhao, Dongrui Liu, Cheng Qian, Zhenhailong Wang, Minda Hu, Huazheng Wang, Qingyun Wu, Heng Ji, Mengdi Wang

https://openreview.net/forum?id=CTr3bovS5F

---


Accepted papers
===============


Title: A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

Authors: Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Qihan Ren, Yiran Wu, Hongru WANG, Han Xiao, Yuhang Zhou, Shaokun Zhang, Jiayi Zhang, Jinyu Xiang, Yixiong Fang, Qiwen Zhao, Dongrui Liu, Cheng Qian, Zhenhailong Wang, Minda Hu, Huazheng Wang, Qingyun Wu, Heng Ji, Mengdi Wang

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks but remain fundamentally static, unable to adapt their internal parameters to novel tasks, evolving knowledge domains, or dynamic interaction contexts. As LLMs are increasingly deployed in open-ended, interactive environments, this static nature has become a critical bottleneck, necessitating agents that can adaptively reason, act, and evolve in real time. This paradigm shift ---from scaling static models to developing self-evolving agents --- has sparked growing interest in architectures and methods enabling continual learning and adaptation from data, interactions, and experiences. This survey provides the first systematic and comprehensive review of self-evolving agents, organizing the field around three foundational dimensions --- what to evolve, when to evolve, and how to evolve. We examine evolutionary mechanisms across agent components (e.g., models, memory, tools, architecture), categorize adaptation methods by stages (e.g., intra-test-time, inter-test-time), and analyze the algorithmic and architectural designs that guide evolutionary adaptation (e.g., scalar rewards, textual feedback, single-agent and multi-agent systems). Additionally, we analyze evaluation metrics and benchmarks tailored for self-evolving agents, highlight applications in domains such as coding, education, and healthcare, and identify critical challenges and research directions in safety, scalability, and co-evolutionary dynamics. By providing a structured framework for understanding and designing self-evolving agents, this survey establishes a roadmap for advancing more adaptive, capable, robust, and versatile agentic systems in both research and real-world deployments, and ultimately sheds light on the realization of Artificial Super Intelligence (ASI) where agents evolve autonomously and perform beyond human-level intelligence across a wide array of tasks.

URL: https://openreview.net/forum?id=CTr3bovS5F
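
Illustrative sketch (not from the paper; all class and function names are hypothetical): the survey's what/when/how framing can be pictured as an agent that stores scalar feedback in its own memory between episodes and reuses strategies that worked before.

    import random

    class SelfEvolvingAgent:
        def __init__(self):
            self.memory = []                      # "what to evolve": episodic memory

        def act(self, task):
            # Reuse a previously successful strategy for this task, else explore.
            for past_task, strategy, reward in self.memory:
                if past_task == task and reward > 0:
                    return strategy
            return random.choice(["strategy_a", "strategy_b"])

        def evolve(self, task, strategy, reward):
            # "how to evolve": store scalar feedback; "when": between episodes.
            self.memory.append((task, strategy, reward))

    def environment(task, strategy):
        # Hypothetical environment returning a scalar reward.
        return 1.0 if strategy == "strategy_a" else -1.0

    agent = SelfEvolvingAgent()
    for _ in range(5):
        strategy = agent.act("sort_list")
        agent.evolve("sort_list", strategy, environment("sort_list", strategy))
    print(agent.memory)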

---

Title: On the (linear) convergence of Generalized Newton Inexact ADMM

Authors: Zachary Frangella, Theo Diamandis, Bartolomeo Stellato, Madeleine Udell

Abstract: This paper presents GeNI-ADMM, a framework for large-scale composite convex optimization that facilitates theoretical analysis of both existing and new approximate ADMM schemes. GeNI-ADMM encompasses any ADMM algorithm that solves a first- or second-order approximation to the ADMM subproblem inexactly. GeNI-ADMM exhibits the usual O(1/t)-convergence rate under standard hypotheses and converges linearly under additional hypotheses such as strong convexity. Further, the GeNI-ADMM framework provides explicit convergence rates for ADMM variants accelerated with randomized linear algebra, such as NysADMM and sketch-and-solve ADMM, resolving an important open question on the convergence of these methods. This analysis quantifies the benefit of improved approximations and can aid in the design of new ADMM variants with faster convergence.

URL: https://openreview.net/forum?id=GT3naIXBxK
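
Illustrative sketch (not the authors' GeNI-ADMM code): the kind of inexactness the paper analyzes can be seen in a lasso-style ADMM split where the x-subproblem is solved only approximately with a few gradient steps, while the z-update and dual update remain exact.

    # Inexact ADMM for  min 0.5||Ax-b||^2 + lam||z||_1  subject to  x = z.
    import numpy as np

    def soft_threshold(v, kappa):
        return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

    def inexact_admm(A, b, lam=0.1, rho=1.0, outer_iters=200, inner_iters=5):
        m, n = A.shape
        x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)   # u: scaled dual variable
        step = 1.0 / (np.linalg.norm(A, 2) ** 2 + rho)    # safe gradient step size
        for _ in range(outer_iters):
            # Inexact x-update: a few gradient steps on the augmented Lagrangian in x.
            for _ in range(inner_iters):
                grad = A.T @ (A @ x - b) + rho * (x - z + u)
                x = x - step * grad
            z = soft_threshold(x + u, lam / rho)          # exact z-update (prox of l1)
            u = u + x - z                                 # dual ascent step
        return z

    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    x_true = np.zeros(20); x_true[:3] = 1.0
    b = A @ x_true + 0.01 * rng.standard_normal(50)
    print(np.round(inexact_admm(A, b), 2))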

---

Title: Video Prediction Transformers without Recurrence or Convolution

Authors: Yujin Tang, Lu Qi, Xiangtai Li, Chao Ma, Ming-Hsuan Yang

Abstract: Video prediction has witnessed the emergence of RNN-based models led by ConvLSTM, and CNN-based models led by SimVP. Following the significant success of ViT, recent works have integrated ViT into both RNN and CNN frameworks, achieving improved performance. While we appreciate these prior approaches, we raise a fundamental question: Is there a simpler yet more effective solution that can eliminate the high computational cost of RNNs while addressing the limited receptive fields and poor generalization of CNNs? How far can it go with a simple pure transformer model for video prediction? In this paper, we propose PredFormer, a framework entirely based on Gated Transformers. We provide a comprehensive analysis of 3D Attention in the context of video prediction. Extensive experiments demonstrate that PredFormer delivers state-of-the-art performance across four standard benchmarks. The significant improvements in both accuracy and efficiency highlight the potential of PredFormer as a strong baseline for real-world video prediction applications. The source code and trained models will be released to the public.

URL: https://openreview.net/forum?id=Afvhu9Id8m
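
Illustrative sketch (the paper's exact gating and 3D-attention factorization may differ): a gated Transformer block applied to flattened spatio-temporal patch tokens, with no recurrence or convolution, assuming PyTorch.

    import torch
    import torch.nn as nn

    class GatedTransformerBlock(nn.Module):
        def __init__(self, dim, heads=4, hidden=256):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm2 = nn.LayerNorm(dim)
            # Gated feed-forward (GLU-style): value branch modulated by a gate branch.
            self.value = nn.Linear(dim, hidden)
            self.gate = nn.Linear(dim, hidden)
            self.proj = nn.Linear(hidden, dim)

        def forward(self, x):                      # x: (batch, tokens, dim)
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            h = self.norm2(x)
            x = x + self.proj(self.value(h) * torch.sigmoid(self.gate(h)))
            return x

    # Toy usage: 8 frames of a 4x4 grid of patch embeddings, flattened to T*H*W tokens.
    tokens = torch.randn(2, 8 * 4 * 4, 64)
    print(GatedTransformerBlock(64)(tokens).shape)   # torch.Size([2, 128, 64])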

---

Title: Prescribe-then-Select: Adaptive Policy Selection for Contextual Stochastic Optimization

Authors: Caio de Próspero Iglesias, Kimberly Villalobos Carballo, Dimitris Bertsimas

Abstract: We address the problem of policy selection in contextual stochastic optimization (CSO), where covariates are available as contextual information and decisions must satisfy hard feasibility constraints. In many CSO settings, multiple candidate policies—arising from different modeling paradigms—exhibit heterogeneous performance across the covariate space, with no single policy uniformly dominating. We propose Prescribe-then-Select (PS), a modular framework that first constructs a library of feasible candidate policies and then learns a meta-policy to select the best policy for the observed covariates. We implement the meta-policy using ensembles of Optimal Policy Trees trained via cross-validation on the training set, making policy choice entirely data-driven. Across two benchmark CSO problems—single-stage newsvendor and two-stage shipment planning—PS consistently outperforms the best single policy in heterogeneous regimes of the covariate space and converges to the dominant policy when such heterogeneity is absent. All the code to reproduce the results can be found at https://anonymous.4open.science/r/Prescribe-then-Select-TMLR.

URL: https://openreview.net/forum?id=lFEsAF2I7C
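
Illustrative sketch (not the authors' implementation): the Prescribe-then-Select idea on a toy newsvendor-style problem, with a plain decision tree standing in for the paper's Optimal Policy Trees and two deliberately simple candidate policies.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(500, 2))                     # covariates
    demand = np.where(X[:, 0] > 0.5, 80, 20) + rng.normal(0, 5, 500)

    # Step 1: a library of feasible candidate policies (order quantities).
    policies = {"low_order": lambda x: 25.0, "high_order": lambda x: 85.0}

    def cost(order, d, overage=1.0, underage=3.0):
        return overage * max(order - d, 0.0) + underage * max(d - order, 0.0)

    # Step 2: label each training point with the policy that achieves the lowest cost.
    names = list(policies)
    costs = np.array([[cost(policies[p](x), d) for p in names] for x, d in zip(X, demand)])
    best = np.array(names)[costs.argmin(axis=1)]

    # Step 3: fit a meta-policy that maps covariates to the selected policy.
    meta = DecisionTreeClassifier(max_depth=2).fit(X, best)
    print(meta.predict([[0.9, 0.3], [0.1, 0.7]]))            # -> high_order, low_order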

---

Title: Language Models are Symbolic Learners in Arithmetic

Authors: Chunyuan Deng, Zhiqi Li, Roy Xie, Ruidi Chang, Hanjie Chen

Abstract: The prevailing question in LMs performing arithmetic is whether these models learn to truly compute or simply master superficial pattern matching. In this paper, we argue for the latter, presenting evidence that LMs act as greedy symbolic learners, prioritizing the simplest possible shortcuts that fit the statistics of the dataset to solve arithmetic tasks. To investigate this, we introduce \textbf{subgroup induction}, a practical framework adapted from Solomonoff Induction (SI), one of the most powerful universal predictors. Our framework analyzes arithmetic problems by breaking them down into ``subgroups''—minimal mappings between a few input digits and a single output digit. Our primary metric, subgroup quality, measures the viability of these shortcuts. Experiments reveal a distinct U-shaped accuracy pattern in multi-digit multiplication: LMs quickly master the first and last output digits while struggling with those in the middle. We demonstrate this U-shape is not coincidental; it perfectly mirrors the quality of the simplest possible subgroups, those requiring the fewest input tokens. This alignment suggests a core learning mechanism: LMs first learn easy, low-token shortcuts and only incorporate more complex, multi-token patterns as training progresses. They do not learn the algorithm of multiplication but rather a hierarchy of increasingly complex symbol-to-symbol mappings. Ultimately, our findings suggest that the path to arithmetic mastery for LMs is not paved with algorithms, but with a cascade of simple, hierarchically-learned symbolic shortcuts. The code is at https://github.com/chili-lab/Symbolic-Arithmetic.

URL: https://openreview.net/forum?id=QSblPg1xUM
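
Illustrative sketch (the "model" here is a stub, not an LM, and the paper's subgroup-quality metric is not reproduced): measuring per-output-digit accuracy in multi-digit multiplication, the quantity behind the U-shaped pattern described above.

    import random

    def stub_model(a, b):
        # Stand-in predictor that gets boundary digits right and guesses in the middle.
        true = str(a * b).zfill(8)
        pred = list(true)
        for i in range(2, 6):                       # corrupt the middle digit positions
            if random.random() < 0.5:
                pred[i] = str(random.randint(0, 9))
        return "".join(pred)

    random.seed(0)
    correct, trials = [0] * 8, 2000
    for _ in range(trials):
        a, b = random.randint(1000, 9999), random.randint(1000, 9999)
        true, pred = str(a * b).zfill(8), stub_model(a, b)
        for i in range(8):
            correct[i] += (true[i] == pred[i])

    # High accuracy at the first and last positions, lower in the middle (a U-shape).
    print([round(c / trials, 2) for c in correct])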

---

Title: High-Layer Attention Pruning with Rescaling

Authors: Songtao Liu, Peng Liu

Abstract: Pruning is a highly effective approach for compressing large language models (LLMs), significantly reducing inference latency. However, conventional training-free structured pruning methods often employ a heuristic metric that indiscriminately removes some attention heads across all pruning layers, without considering their positions within the network architecture. In this work, we propose a novel pruning algorithm that strategically prunes attention heads in the model's higher layers. Since the removal of attention heads can alter the magnitude of token representations, we introduce an adaptive rescaling parameter that calibrates the representation scale post-pruning to counteract this effect. We conduct comprehensive experiments on a wide range of LLMs, including LLaMA3.1-8B, Mistral-7B-v0.3, Qwen2-7B, and Gemma2-9B. Our evaluation includes both generation and discriminative tasks across 27 datasets. The results consistently demonstrate that our method outperforms existing structured pruning methods. This improvement is particularly notable in generation tasks, where our approach significantly outperforms existing baselines. Code is available at \url{https://github.com/SongtaoLiu0823/HARP}.

URL: https://openreview.net/forum?id=jkPBIxYmWE
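
Illustrative sketch (the paper's head-selection criterion and rescaling parameterization may differ): dropping selected heads in a higher layer and rescaling the remaining per-head outputs so the representation magnitude stays comparable, assuming PyTorch.

    import torch

    def prune_and_rescale(attn_out_per_head, heads_to_drop, learnable_scale=None):
        """attn_out_per_head: (batch, heads, seq, head_dim)."""
        n_heads = attn_out_per_head.shape[1]
        keep = [h for h in range(n_heads) if h not in heads_to_drop]
        kept = attn_out_per_head[:, keep]                 # drop the selected heads
        # Rescale so the aggregated representation magnitude stays roughly comparable.
        scale = learnable_scale if learnable_scale is not None else n_heads / len(keep)
        return kept * scale

    x = torch.randn(1, 8, 16, 64)                         # 8 heads in a high layer
    out = prune_and_rescale(x, heads_to_drop={6, 7})      # prune 2 of them
    print(out.shape, out.abs().mean().item() / x.abs().mean().item())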

---

Title: Still Competitive: Revisiting Recurrent Models for Irregular Time Series Prediction

Authors: Ankitkumar Joshi, Milos Hauskrecht

Abstract: Modeling irregularly sampled multivariate time series is a persistent challenge in domains like healthcare and sensor networks. While recent works have explored a variety of complex learning architectures to solve the prediction problems for irregularly sampled time series, it remains unclear what the true benefits of some of these architectures are, and whether clever modifications of simpler and more efficient RNN-based algorithms are still competitive, i.e. they are on par with or even superior to these methods. In this work, we propose and study GRUwE: Gated Recurrent Unit with Exponential basis functions, that builds upon RNN-based architectures for observations made at irregular times. GRUwE supports both regression-based and event-based predictions in continuous time. GRUwE works by maintaining a Markov state representation of the time series that updates with the arrival of irregular observations. The Markov state update relies on two reset mechanisms: (i) observation-triggered reset to account for the new observation, and (ii) time-triggered reset that relies on learnable exponential decays, to support the predictions in continuous time. Our empirical evaluations across several real-world benchmarks on next-observation and next-event prediction tasks demonstrate that GRUwE can indeed achieve competitive or superior performance compared to the recent state-of-the-art (SOTA) methods. Thanks to its simplicity, GRUwE offers compelling advantages: it is easy to implement, requires minimal hyper-parameter tuning efforts, and significantly reduces the computational overhead in the online deployment.

URL: https://openreview.net/forum?id=YLoZA77QzR
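
Illustrative sketch (not the authors' GRUwE implementation; the exponential basis functions may be parameterized differently): a GRU cell whose hidden state decays with learnable exponential rates over the elapsed time between irregular observations (time-triggered reset) and is updated by a standard GRU step when an observation arrives (observation-triggered reset).

    import torch
    import torch.nn as nn

    class DecayingGRUCell(nn.Module):
        def __init__(self, input_dim, hidden_dim):
            super().__init__()
            self.cell = nn.GRUCell(input_dim, hidden_dim)
            self.log_decay = nn.Parameter(torch.zeros(hidden_dim))   # learnable rates

        def forward(self, x, h, dt):
            # Time-triggered reset: exponential decay toward zero over the gap dt.
            decay = torch.exp(-torch.exp(self.log_decay) * dt)
            h = h * decay
            # Observation-triggered reset: incorporate the new observation.
            return self.cell(x, h)

    cell = DecayingGRUCell(input_dim=3, hidden_dim=16)
    h = torch.zeros(1, 16)
    for dt in [0.1, 2.5, 0.4]:                                        # irregular gaps
        x = torch.randn(1, 3)
        h = cell(x, h, torch.tensor(dt))
    print(h.shape)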

---


New submissions
===============


Title: Advancing Model Refinement: Muon-Optimized Distillation and Quantization for LLM Deployment

Abstract: Large Language Models (LLMs) enable advanced natural language processing but face deployment challenges on resource-constrained edge devices due to high computational, memory, and energy demands. Optimizing these models requires addressing three key challenges: acquiring task-specific data, fine-tuning for performance, and compressing models to accelerate inference while reducing resource demands. We propose an integrated framework combining GPTQ-based quantization, low-rank adaptation (LoRA), and a specialized data distillation process to significantly reduce model size and complexity while preserving or enhancing task-specific performance. By leveraging data distillation, knowledge distillation via Kullback-Leibler divergence, Bayesian hyperparameter optimization, and the Muon optimizer, we achieve up to 2× memory compression (e.g., reducing a 6GB model to 3GB) and enable efficient inference for specialized tasks. Empirical results demonstrate superior performance on standard LLM benchmarks compared to GPTQ quantization alone, with the Muon optimizer notably enhancing fine-tuned models’ resistance to accuracy decay during quantization.

URL: https://openreview.net/forum?id=tuY7MoLyDG
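
Illustrative sketch (only the knowledge-distillation piece; GPTQ quantization, LoRA, Bayesian hyperparameter optimization, and the Muon optimizer are not shown, and the temperature and weighting below are arbitrary): a temperature-scaled Kullback-Leibler distillation loss combined with a cross-entropy term, assuming PyTorch.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft-target term: KL(teacher || student) on temperature-softened distributions.
        kl = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard-target term: ordinary cross-entropy on the ground-truth labels.
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kl + (1.0 - alpha) * ce

    student = torch.randn(4, 10, requires_grad=True)
    teacher = torch.randn(4, 10)
    labels = torch.randint(0, 10, (4,))
    print(distillation_loss(student, teacher, labels).item())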

---

Title: Optimal Pattern Detection Tree for Symbolic Rule-Based Classification

Abstract: Pattern discovery in data plays a crucial role across diverse domains, including healthcare, risk assessment, and machinery maintenance. In contrast to black-box deep learning models, symbolic rule discovery emerges as a key data mining task, generating human-interpretable rules that offer both transparency and intuitive explainability. This paper introduces the optimal pattern detection tree (OPDT) for binary classification, a rule-based machine learning model based on novel mixed integer programming to extract an optimal pattern in data. This optimization-based approach discovers a hidden underlying pattern in datasets, when it exists, by identifying an optimal rule that maximizes coverage while minimizing the false positive rate due to misclassification. Our computational experiments show that OPDT discovers a pattern with optimality guarantees on moderately sized datasets within reasonable runtime.

URL: https://openreview.net/forum?id=RJ6eMDcDCv
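
Illustrative sketch (the paper formulates this as a mixed integer program with optimality guarantees; the brute-force search below is only a stand-in for the same coverage-versus-false-positive trade-off on a toy dataset):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(300, 3))                    # three numeric features
    y = (X[:, 1] > 6).astype(int)                            # hidden pattern on feature 1

    best_rule, best_score = None, -np.inf
    for feature in range(X.shape[1]):
        for threshold in np.unique(np.round(X[:, feature], 1)):
            pred = X[:, feature] > threshold
            coverage = np.sum(pred & (y == 1))               # positives captured by the rule
            false_pos = np.sum(pred & (y == 0))
            score = coverage - 2.0 * false_pos               # weighted trade-off
            if score > best_score:
                best_rule, best_score = (feature, threshold), score

    print(f"rule: feature {best_rule[0]} > {best_rule[1]}, score {best_score}")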

---

Title: CF-HPO: Counterfactual Explanations for Hyperparameter Optimization

Abstract: Hyperparameter optimization (HPO) is a fundamental component of studies that use technologies such as machine learning and deep learning. Regardless of the field, almost every study requires hyperparameter optimization at some level. In general, applying HPO to a developed system improves its performance by optimizing multiple parameters. However, extant HPO methods do not provide information on why specific configurations are successful, what should not be done, or what could be improved. The present study proposes a novel approach to address this gap in the literature by introducing CF-HPO, a modular framework that generates counterfactual explanations for HPO results. CF-HPO answers questions such as “what potential improvements could be made” and “what settings should be avoided,” and supports what-if analysis. These outputs can serve as a guide, especially for those who are not optimization experts. The proposed system has a modular design that supports different search strategies (UCB-driven, random, restart), allowing it to perform well during optimization and to provide counterfactual explanations once optimization ends. Experiments conducted on the YAHPO benchmark package yielded validation rates of 92.2% for neural networks and 60.4% for random forests. These findings reveal that counterfactual generability depends on the geometry of the performance surface rather than on dimensionality.

URL: https://openreview.net/forum?id=f4eQmsYiNN
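
Illustrative sketch (a generic counterfactual query via a surrogate model, not the CF-HPO framework or its UCB-driven/random/restart search strategies; all parameters below are arbitrary): fit a surrogate on an HPO history, then look for the smallest nearby configuration change that the surrogate predicts would beat a target score.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    configs = rng.uniform(0, 1, size=(200, 2))                    # two scaled hyperparameters
    scores = 1 - (configs[:, 0] - 0.3) ** 2 - (configs[:, 1] - 0.1) ** 2  # synthetic accuracy
    surrogate = RandomForestRegressor(random_state=0).fit(configs, scores)

    query = np.array([0.8, 0.6])                                  # a poorly performing config
    target = surrogate.predict([query])[0] + 0.2                  # "what would make it better?"

    candidates = np.clip(query + rng.normal(0, 0.15, size=(2000, 2)), 0, 1)
    preds = surrogate.predict(candidates)                         # local perturbations
    valid = candidates[preds >= target]
    if len(valid):
        counterfactual = valid[np.linalg.norm(valid - query, axis=1).argmin()]
        print("smallest change found:", np.round(counterfactual - query, 3))
    else:
        print("no counterfactual within the local search radius")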

---
