Daily TMLR digest for Nov 28, 2025

0 views

Skip to first unread message

TMLR

unread,

Nov 28, 2025, 12:30:07 AMNov 28

to tmlr-anno...@googlegroups.com

New certifications
==================

Featured Certification: FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance

Lingjiao Chen, Matei Zaharia, James Zou

https://openreview.net/forum?id=cSimKw5p6R

---

Accepted papers
===============

Title: Triple Preference Optimization: Achieving Better Alignment using a Single Step Optimization

Authors: Amir Saeidi, Shivanshu Verma, Kashif Rasul, Aswin RRV, Chitta Baral

Abstract: Reinforcement Learning with Human Feedback (RLHF) enhances the alignment of Large Language Models (LLMs). However, its limitations have led to the development of Direct Preference Optimization (DPO), an RL-free approach designed to overcome these shortcomings. While studies have shown that DPO improves instruction-following capabilities, it negatively impacts the reasoning ability of LLMs. Additionally, DPO is highly sensitive to judgment noise in preference datasets and the size of the training set. Although several modifications to DPO have been proposed, they still fail to fully resolve these issues. To address these limitations, we propose Triple Preference Optimization (TPO), a new preference learning method designed to enhance both reasoning and instruction-following abilities through one-step optimization. We compare TPO against DPO and its recent variants using state-of-the-art training setups, including both base and instruction-tuned models such as Mistral and Llama 3. Our evaluation covers a comprehensive range of chat-based and reasoning benchmarks. The results demonstrate that TPO achieves significant improvements over existing methods without substantially increasing response length across different dataset sizes. Specifically, TPO outperforms DPO and SimPO by up to 7.0% and 7.3% points on Arena-Hard, 12.2% and 13.3% points on MixEval-Hard, 10.4% and 10.1% points on MMLU-Pro, and 19.0% and 19.2% points on GSM8K, respectively. Furthermore, TPO achieves these improvements while requiring less data than DPO.

URL: https://openreview.net/forum?id=A4jyaZheE8

---

Title: Estimating the Event-Related Potential from Few EEG Trials

Authors: Anders Vestergaard Nørskov, Kasper Jørgensen, Alexander Neergaard Zahid, Morten Mørup

Abstract: Event-related potentials (ERP) are measurements of brain activity with wide applications in basic and clinical neuroscience, that are typically estimated using the average of many trials of electroencephalography signals (EEG) to sufficiently reduce noise and signal variability.
We introduce EEG2ERP, a novel uncertainty-aware autoencoder approach that maps an arbitrary number of EEG trials to their associated ERP. To account for the ERP uncertainty we use bootstrapped training targets and introduce a separate variance decoder to model the uncertainty of the estimated ERP.
We evaluate our approach in the challenging zero-shot scenario of generalizing to new subjects considering three different publicly available data sources; i) the comprehensive ERP CORE dataset that includes over 50,000 EEG trials across six ERP paradigms from 40 subjects, ii) the large P300 Speller BCI dataset, and iii) a neuroimaging dataset on face perception consisting of both EEG and magnetoencephalography (MEG) data. We consistently find that our method in the few trial regime provides substantially better ERP estimates than commonly used conventional and robust averaging procedures.
EEG2ERP is the first deep learning approach to map EEG signals to their associated ERP, moving toward reducing the number of trials necessary for ERP research. Code is available at https://github.com/andersxa/EEG2ERP.

URL: https://openreview.net/forum?id=c6LgqDhpH0

---

New submissions
===============

Title: InfoScale: Unleashing Training-free Variable-scaled Image Generation via Effective Utilization of Information

Abstract: Diffusion models (DMs) have become dominant in visual generation but suffer a performance drop when tested on resolutions that differ from the training scale, whether lower or higher.
Current training-free methods for DMs have shown promising results, primarily focusing on higher-resolution generation. However, most methods lack a unified analytical perspective for variable-scale generation, leading to suboptimal results.
In fact, the key challenge in generating variable-scale images lies in the differing amounts of information across resolutions, which requires information conversion procedures to be varied for generating variable-scaled images.
In this paper, we investigate the issues of three critical aspects in DMs for a unified analysis in variable-scaled generation: dilated convolution, attention mechanisms, and initial noise.
Specifically, 1) dilated convolution in DMs for the higher-resolution generation loses high-frequency information.
2) Attention for variable-scaled image generation struggles to adjust the information aggregation adaptively.
3) The spatial distribution of information in the initial noise is misaligned with the variable-scaled image.
To solve the above problems, we propose $\textbf{InfoScale}$, an information-centric framework for variable-scaled image generation by effectively utilizing information from three aspects correspondingly.
For information loss in 1), we introduce a Progressive Frequency Compensation module to compensate for high-frequency information lost by dilated convolution in higher-resolution generation.
For information aggregation inflexibility in 2), we introduce an Adaptive Information Aggregation module to adaptively aggregate information in lower-resolution generation and achieve an effective balance between local and global information in higher-resolution generation.
For information distribution misalignment in 3), we design a Noise Adaptation module to re-distribute information in initial noise for variable-scaled generation.
Our method is plug-and-play, and extensive experiments demonstrate its effectiveness in variable-scaled image generation.

URL: https://openreview.net/forum?id=iB8PzxdpMf

---

Title: Nondeterministic Polynomial-time Problem Challenge: An Ever-Scaling Reasoning Benchmark for LLMs

Abstract: Reasoning is the fundamental capability of large language models (LLMs). Due to the rapid progress of LLMs, there are two main issues of current benchmarks: i) these benchmarks can be crushed in a short time (less than 1 year), and ii) these benchmarks may be easily hacked. To handle these issues, we propose the ever-scalingness for building the benchmarks which are scaling over complexity, instance, oversight and coverage. This paper presents Nondeterministic Polynomial-time Problem Challenge (NPPC) , an ever-scaling reasoning benchmark for LLMs. Specifically, the NPPC has three main modules: i) npgym, which provides a unified interface of 25 well-known NP-complete problems and can generate any number of instances with any levels of complexities, ii) npsolver, which provides a unified interface to evaluate the problem instances with both online and offline models via APIs and local deployments, respectively, and iii) npeval, which provides the comprehensive and ready-to-use tools to analyze the performances of LLMs over different problems, the number of tokens, the aha moments, the reasoning errors and the solution errors. Extensive experiments over widely-used LLMs demonstrate: i) NPPC can successfully decrease the performances of advanced LLMs to below 10%, demonstrating that NPPC is not crushed by current models, ii) DeepSeek-R1, Claude-3.7-Sonnet, and o1/o3-mini are the most powerful LLMs, where DeepSeek-R1 can outperform Claude-3.7-Sonnet and o1/o3-mini in most NP-complete problems considered, and iii) the numbers of tokens, aha moments in the advanced LLMs, e.g., Claude-3.7-Sonnet and DeepSeek-R1, are observed first to increase and then decrease when the problem instances become more and more difficult. Through continuously scaling analysis, NPPC can provide critical insights into LLMs' reasoning capabilities, exposing fundamental limitations and suggesting future directions for further improvements.

URL: https://openreview.net/forum?id=Xb6d5lGLb2

---

Title: Memento No More: Coaching AI Agents to Master Multiple Tasks via Hints Internalization

Abstract: As the general capabilities of artificial intelligence (AI) agents continue to evolve, their ability to learn to master multiple complex tasks through experience remains a key challenge. Current LLM agents, particularly those based on proprietary language models, typically rely on prompts to incorporate knowledge about the target tasks. This approach does not allow the agent to *internalize* this information and instead relies on ever-expanding prompts to sustain its functionality in diverse scenarios. This resembles a system of notes used by a person affected by anterograde amnesia, the inability to form new memories. In this paper, we propose a novel method to train AI agents to incorporate knowledge and skills for multiple tasks without the need for either cumbersome note systems or prior high-quality demonstration data. Our approach employs an iterative process where the agent collects new experiences, receives corrective feedback from humans in the form of hints, and integrates this feedback into its weights via a context distillation training procedure. We demonstrate the efficacy of our approach by implementing it in a Llama-3-based agent that, after only a few rounds of feedback, outperforms advanced models GPT-4o and DeepSeek-V3 in tasksets requiring correct sequencing of information retrieval, tool use, and question answering.

URL: https://openreview.net/forum?id=AtCURrC3XA

---

Reply all

Reply to author

Forward

0 new messages