🤗 Daily Paper Newsletter

Hope you found some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.

An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications
Published at 2025-09-23

#ML

The study examines testing practices in open-source AI agent frameworks and applications, identifying ten testing patterns. It reveals that traditional testing methods are widely used to manage uncertainty in AI agents, while novel methods are rarely applied. The study highlights a need for improved support for novel testing methods, adoption of prompt regression testing, and further research into barriers to adoption to build more robust AI agents.
Read More

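Among the practices the study recommends is prompt regression testing: pinning an agent's behavior with assertions so that prompt edits cannot silently break it. A minimal pytest-style sketch, assuming a hypothetical `generate` wrapper around the agent's LLM call (the canned reply only keeps the example runnable):

```python
# Minimal prompt regression test in the spirit of the study's
# recommendations. `generate` is a hypothetical stand-in for the
# agent's LLM call; swap in your real model client.
def generate(prompt: str) -> str:
    return "You can request a refund within 30 days via your order page."

def test_refund_prompt_regression():
    reply = generate("Customer asks: 'How do I request a refund?'")
    # Assert stable properties, not exact wording, since LLM output
    # is non-deterministic.
    assert "refund" in reply.lower()
    assert len(reply) < 2000  # guard against runaway generations

test_refund_prompt_regression()
print("prompt regression test passed")
```
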
Infusing Theory of Mind into Socially Intelligent LLM Agents
Published at 2025-09-26

#ML

This study shows that incorporating Theory of Mind, the ability to understand others' mental states, into LLM-based social agents like chatbots can improve their dialogue skills and goal achievement. The researchers developed a ToM-focused dialogue agent called ToMA, which outperforms other baselines in social intelligence tests by exhibiting strategic, goal-oriented behaviors and maintaining better relationships.
Read More

SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
Published at 2025-09-26

#ML

This research presents a new method called SINQ to improve the accuracy of low-precision language models by addressing the issue of precision loss in outlier parameters. The method introduces an additional scale factor and a fast algorithm to normalize variances, resulting in significantly better performance on various models compared to traditional quantization techniques.
Read More

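As a rough illustration of the dual-scaling idea (not SINQ's exact algorithm), one can alternately normalize row and column spreads of a weight matrix Sinkhorn-style before round-to-nearest quantization, so a single outlier no longer dominates a whole row's scale:

```python
import numpy as np

def dual_scale_normalize(W, iters=10):
    """Sinkhorn-style alternation: repeatedly divide rows and columns
    by their standard deviations, accumulating the two scale vectors.
    Illustrative only; SINQ's actual normalization is more refined."""
    W = W.astype(np.float64).copy()
    r = np.ones(W.shape[0])
    c = np.ones(W.shape[1])
    for _ in range(iters):
        rs = W.std(axis=1) + 1e-12
        W /= rs[:, None]; r *= rs
        cs = W.std(axis=0) + 1e-12
        W /= cs[None, :]; c *= cs
    return W, r, c  # W_original == r[:, None] * W * c[None, :]

def quantize_rtn(W, bits=4):
    """Plain symmetric round-to-nearest quantization."""
    qmax = 2 ** (bits - 1) - 1
    s = np.abs(W).max() / qmax
    return np.clip(np.round(W / s), -qmax, qmax), s

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W[3, 7] = 25.0                              # inject an outlier weight
Wn, r, c = dual_scale_normalize(W)
Q, s = quantize_rtn(Wn)
W_hat = r[:, None] * (Q * s) * c[None, :]   # dequantize with both scales
print("MSE with dual scaling:", np.mean((W - W_hat) ** 2))
```
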
Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned
Published at 2025-09-27

#ML

This study explores ways to improve Vision-Language Process Reward Models (VL-PRMs) for multimodal reasoning by introducing a hybrid data synthesis framework, perception-focused supervision, and evaluating test-time scaling strategies. Experiments on five benchmarks reveal that smaller VL-PRMs can be as effective as larger ones, perception-level supervision significantly improves performance, and VL-PRMs can uncover hidden reasoning abilities in VLMs.
Read More

Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
Published at 2025-09-29

#ML

The authors propose a method to enhance image generation in diffusion models by aligning pretrained visual encoders as tokenizers, capturing both high-level semantics and perceptual details through a three-stage alignment strategy. This approach accelerates model convergence and achieves state-of-the-art results on the ImageNet and LAION datasets, providing a simple and scalable solution for continuous tokenizer design.
Read More

Boolean Satisfiability via Imitation Learning
Published at 2025-09-29

#ML

The authors present ImitSAT, a new approach for the Boolean satisfiability problem that learns from expert decisions to improve the performance of CDCL solvers. By using dense decision-level supervision, ImitSAT reduces propagation counts and runtime, outperforming other learned methods in extensive experiments.
Read More

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Published at 2025-09-29

#ML

The DeepSearch framework combats the limitations of current reinforcement learning with verifiable rewards (RLVR) by integrating Monte Carlo Tree Search into the training process, leading to more efficient exploration and better performance on mathematical reasoning benchmarks.
Read More

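At the heart of any MCTS integration is a selection rule that balances exploitation against exploration. A generic UCT sketch, not DeepSearch's specific variant:

```python
import math

def uct_select(children, c=1.4):
    """Pick the child maximizing mean value plus an exploration bonus.
    Generic MCTS selection, not DeepSearch's exact rule."""
    total = sum(ch["visits"] for ch in children)
    def score(ch):
        if ch["visits"] == 0:
            return float("inf")            # always try unvisited nodes first
        exploit = ch["value"] / ch["visits"]
        explore = c * math.sqrt(math.log(total) / ch["visits"])
        return exploit + explore
    return max(children, key=score)

children = [
    {"visits": 10, "value": 7.0},   # promising and well explored
    {"visits": 2,  "value": 1.0},   # barely explored
    {"visits": 0,  "value": 0.0},   # never tried
]
print(uct_select(children))         # -> the unvisited child
```
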
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
Published at 2025-09-29

#ML

The authors present Flash-Searcher, a framework that makes LLM-based web agents faster on complex reasoning tasks by modeling workflows as directed acyclic graphs and executing independent subtasks in parallel, yielding faster execution and better performance than existing frameworks.
Read More

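The DAG idea itself is easy to picture: once subtasks declare their dependencies, everything whose inputs are ready can run concurrently. A minimal sketch with Python's standard library, with a stub in place of real tool calls:

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Toy workflow DAG: each task lists its prerequisites.
dag = {
    "search_a": set(),
    "search_b": set(),
    "compare": {"search_a", "search_b"},   # waits for both searches
    "report": {"compare"},
}

def run(task: str) -> str:
    return f"done:{task}"   # stand-in for a real tool or subagent call

ts = TopologicalSorter(dag)
ts.prepare()
results = {}
with ThreadPoolExecutor() as pool:
    while ts.is_active():
        ready = list(ts.get_ready())       # every task whose deps are met
        for task, out in zip(ready, pool.map(run, ready)):
            results[task] = out
            ts.done(task)                  # unblocks dependents
print(results)   # search_a and search_b ran in the same parallel wave
```
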
Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures
Published at 2025-09-29

#ML

The research presents a new method called Hyperdimensional Probe that improves the understanding of Large Language Models' internal representations by combining symbolic representations and neural probing. This technique effectively extracts meaningful concepts across various LLMs, embedding sizes, and input domains, and helps identify LLM failures, enhancing the interpretability of these models.
Read More

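Vector symbolic architectures, on which the probe builds, represent structure by binding and bundling random high-dimensional vectors. A tiny self-contained demo of those operations (a demo of VSA itself, not of the probe):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000

def hv():
    """Random bipolar hypervector."""
    return rng.choice([-1, 1], size=D)

# Bind role/filler pairs with elementwise multiply, bundle by summing.
role_subj, role_obj = hv(), hv()
paris, france = hv(), hv()
memory = np.sign(role_subj * paris + role_obj * france)

# Unbind: multiplication is its own inverse for bipolar vectors.
query = memory * role_subj
vocab = {"paris": paris, "france": france}
print(max(vocab, key=lambda k: vocab[k] @ query))   # -> 'paris'
```
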
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources
Published at 2025-09-29

#ML

MixtureVitae is a new, open pretraining dataset that offers strong model performance while minimizing legal risk. It uses a mix of public-domain, permissively licensed, and low-risk texts, along with instruction, reasoning, and synthetic data. Experiments show that models trained on MixtureVitae perform well on various benchmarks, especially in math, code, and QA tasks, making it a practical choice for training capable language models while keeping legal risk low.
Read More

PIPer: On-Device Environment Setup via Online Reinforcement Learning
Published at 2025-09-29

#ML

This research presents a new method for automatically setting up software environments on personal devices using advanced machine learning techniques. By training a model to create Bash scripts and adapting it for environment setup, the method enables a model that runs on regular hardware to perform as well as much larger models, making it easier for developers and researchers to configure software environments without manual effort.
Read More

BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs
Published at 2025-09-30

#ML

The study presents BatonVoice, a new framework that enhances speech synthesis by utilizing the linguistic intelligence of Large Language Models (LLMs). It separates instruction understanding from speech generation: an LLM creates a textual plan of vocal features, which a separate TTS model then uses to generate speech. The approach improves controllable and emotional speech synthesis, outperforming existing models and enabling zero-shot cross-lingual generalization.
Read More

BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
Published at 2025-09-30

#ML

The authors present BiasFreeBench, a comprehensive benchmark for evaluating bias mitigation techniques in large language models. It compares eight popular methods across two scenarios and introduces a new metric, Bias-Free Score, to measure fairness, safety, and anti-stereotypical responses in a unified format.
Read More

BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration
Published at 2025-09-30

#ML

The paper presents BindWeave, a new framework that improves subject-consistent video generation by using a pretrained multimodal large language model to interpret complex prompts and ground entities. This approach results in more coherent and detailed videos compared to existing models, as demonstrated by experiments on the OpenS2V benchmark.
Read More

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
Published at 2025-09-30

#ML

The authors present a new reward model, EditReward, for instruction-guided image editing, which outperforms existing models in alignment with human preferences. EditReward is trained on a large-scale human preference dataset and can be used to scale up high-quality training data for image editing, as well as for advanced applications like reinforcement learning-based post-training.
Read More

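Reward models of this kind are typically trained on pairwise human preferences with a Bradley-Terry objective: the preferred edit should score higher than the rejected one. A sketch of that standard recipe (EditReward's exact objective may differ):

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor):
    """Pairwise preference loss: push the reward of the human-preferred
    edit above the rejected one. Standard recipe for preference-trained
    reward models, not necessarily EditReward's exact formulation."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy scores a reward model might assign to (prompt, edited image) pairs.
r_chosen = torch.tensor([1.2, 0.3, 2.0])
r_rejected = torch.tensor([0.7, 0.5, -0.1])
print(bradley_terry_loss(r_chosen, r_rejected))  # lower = better separated
```
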
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
Published at 2025-09-30

#ML

The study proposes a method to optimize the allocation of exploration budgets in large language models during reinforcement learning, addressing the inefficiency of uniform allocation. By formulating the problem as a knapsack problem, the method adaptively distributes resources based on the model's learning status, resulting in improved performance on challenging tasks and significant gains on mathematical reasoning benchmarks.
Read More

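The flavor of the allocation problem: given a fixed rollout budget, spend more rollouts on prompts where exploration is most valuable, e.g., where the model's pass rate sits near 50%. A toy greedy heuristic in that spirit, not the paper's exact formulation:

```python
def allocate_rollouts(prompts, budget):
    """Greedy knapsack-style allocation: give every prompt one rollout,
    then spend the rest where exploration value is highest. A toy
    heuristic, not the paper's formulation."""
    alloc = {p["id"]: 1 for p in prompts}
    budget -= len(prompts)
    # Exploration value: variance of a Bernoulli success, p * (1 - p),
    # which peaks at a 50% pass rate.
    for p in sorted(prompts, key=lambda p: p["p"] * (1 - p["p"]), reverse=True):
        if budget <= 0:
            break
        extra = min(budget, 7)   # cap per-prompt spending
        alloc[p["id"]] += extra
        budget -= extra
    return alloc

prompts = [{"id": "easy", "p": 0.95},
           {"id": "hard", "p": 0.05},
           {"id": "frontier", "p": 0.50}]
print(allocate_rollouts(prompts, budget=12))
# -> the 'frontier' prompt soaks up most of the extra rollouts
```
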
TGPO: Temporal Grounded Policy Optimization for Signal Temporal Logic Tasks
Published at 2025-09-30

#ML

The authors present a new method called TGPO for solving complex, long-horizon robot tasks specified in Signal Temporal Logic (STL). TGPO breaks tasks into smaller, manageable subgoals and uses a two-level system to plan and execute them efficiently, outperforming existing methods across a range of tests.
Read More

VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
Published at 2025-09-30

#ML

The authors present a framework that uses a simulator trained on real interaction data to improve Vision-Language-Action models. This method reduces the need for samples, outperforms supervised learning, and is more robust to changes in conditions compared to traditional reinforcement learning methods.
Read More

VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs
Published at 2025-09-30

#ML

This study presents a new method called VLM-FO1 that improves the ability of Vision-Language Models to accurately locate and understand specific visual details, which they previously struggled with. The method works by converting the challenging task of generating precise coordinates into a more manageable one, and it can be easily added to existing models, leading to better performance on various perception tasks without affecting their overall visual understanding.
Read More

Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls
Published at 2025-09-30

#ML

This study investigates why transformer models struggle with multi-digit multiplication and finds that they fail to learn the long-range dependencies the task requires. The researchers reverse-engineer a model that can perform multiplication and find that it uses attention to create a graph for storing and retrieving partial products, as well as efficient representations like Minkowski sums and a Fourier basis. They then introduce an auxiliary loss to help standard models learn these dependencies, improving their ability to learn multi-digit multiplication.
Read More

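A worked example makes the long-range dependency concrete: in schoolbook multiplication, each output digit depends on a sum of partial products plus a carry that can ripple in from far away:

```python
def long_multiply(a_digits, b_digits):
    """Schoolbook multiplication over digit lists (least-significant
    digit first), with the partial-product and carry structure
    made explicit."""
    out = [0] * (len(a_digits) + len(b_digits))
    for i, a in enumerate(a_digits):
        for j, b in enumerate(b_digits):
            out[i + j] += a * b              # cache partial products
    carry = 0
    for k in range(len(out)):
        total = out[k] + carry               # a carry can ripple in from
        carry, out[k] = divmod(total, 10)    # arbitrarily far away: the
    return out                               # long-range dependency at issue

# 96 * 87 = 8352; digit lists are least-significant first.
print(long_multiply([6, 9], [7, 8]))         # -> [2, 5, 3, 8]
```
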
ACON: Optimizing Context Compression for Long-horizon LLM Agents
Published at 2025-10-01

#ML

The study presents a framework called ACON that compresses an agent's growing context of observations and interaction history into concise summaries for long-horizon tasks, improving memory usage and performance without sacrificing accuracy.
Read More

Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum
Published at 2025-10-01

#ML

The study explores alternative probability-based objectives to the standard negative log likelihood (NLL) used in supervised fine-tuning of large language models. Through extensive experiments, the authors identify a "model-capability continuum" that influences which objective performs best, offering a new way to tailor objectives to a model's strengths and weaknesses.
Read More

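For intuition: standard SFT minimizes the negative log probability of each target token, while a probability-based alternative such as maximizing the raw probability reshapes which tokens dominate the gradient. An illustrative comparison (the paper studies a family of such objectives, not necessarily these two):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)               # toy model outputs (tokens, vocab)
targets = torch.tensor([1, 3, 5, 7])

logp = F.log_softmax(logits, dim=-1)
p_tgt = logp.gather(1, targets[:, None]).squeeze(1).exp()

nll = -p_tgt.log().mean()                 # standard SFT objective
neg_prob = -p_tgt.mean()                  # maximize probability directly:
                                          # very hard tokens get smaller gradients
print(f"NLL: {nll.item():.3f}   -p: {neg_prob.item():.3f}")
```
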
BroRL: Scaling Reinforcement Learning via Broadened Exploration
Published at 2025-10-01

#ML

This study proposes a new method called BroRL that improves reinforcement learning by increasing the number of rollouts per example, which leads to better performance without the diminishing returns seen in previous methods. The method is based on a mathematical analysis showing that correct answers become more likely as the number of rollouts increases, and experimental results show that it outperforms existing methods on various benchmarks.
Read More

Code2Video: A Code-centric Paradigm for Educational Video Generation
Published at 2025-10-01

#ML

The researchers present a new method called Code2Video that uses Python code to create professional educational videos. The approach involves three agents, Planner, Coder, and Critic, which work together to structure content, generate code, and refine visuals, respectively. The resulting videos are of high quality, with a 40% improvement over direct code generation, and the code and datasets are available for public use.
Read More

CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs
Published at 2025-10-01

#ML

This study improves the training efficiency of large language models (LLMs) on reasoning tasks by focusing on prompt selection and rollout quantity allocation, resulting in a new method called CurES that outperforms existing methods and achieves faster convergence.
Read More

Eliciting Secret Knowledge from Language Models
Published at 2025-10-01

#ML

The study focuses on secret elicitation from AI, specifically large language models (LLMs), where the models are trained to hold specific knowledge they do not directly reveal. The researchers develop and test various techniques, both black-box and white-box, to uncover this hidden knowledge, with prefill attacks and logit lens/sparse autoencoders being the most effective in different settings. They make their models and code publicly available for further research.
Read More

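The logit lens, one of the white-box techniques used, decodes intermediate hidden states through the model's own unembedding to see what it is "thinking" layer by layer. A minimal sketch with GPT-2:

```python
# Minimal logit-lens sketch: decode each layer's hidden state through
# the unembedding. One of the white-box tools in the elicitation
# toolbox (alongside prefill attacks and sparse autoencoders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

for layer, h in enumerate(out.hidden_states):
    # Apply the final layer norm and unembedding to the last position.
    logits = model.lm_head(model.transformer.ln_f(h[:, -1]))
    print(layer, tok.decode(logits.argmax(-1)))
```
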
GEM: A Gym for Agentic LLMs
Published at 2025-10-01

#ML

The authors present GEM, a new open-source environment simulator for agentic LLMs that learn from experience, analogous to OpenAI-Gym for traditional reinforcement learning. GEM offers a standardized interface, high-throughput execution, and various tools and examples, and it also serves as a benchmarking toolkit for evaluating different algorithms in both single- and multi-turn settings.
Read More

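The OpenAI-Gym pattern GEM follows is a reset/step loop. A toy text environment in that style (names are illustrative, not GEM's actual API):

```python
class EchoEnv:
    """Toy single-turn text environment: the agent must repeat a string.
    Illustrates the Gym-style reset/step loop, not GEM's real interface."""
    def reset(self, seed: int = 0):
        self.target = f"echo-{seed}"
        return f"Please repeat exactly: {self.target}"   # observation

    def step(self, action: str):
        reward = 1.0 if action.strip() == self.target else 0.0
        done = True                                      # single turn
        return "episode over", reward, done, {}

env = EchoEnv()
obs = env.reset(seed=42)
action = "echo-42"          # a real action would come from an LLM policy
obs, reward, done, info = env.step(action)
print(reward)               # -> 1.0
```
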
GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness
Published at 2025-10-01

#ML

This study presents GUI-KV, a method to improve the efficiency of graphical user interface agents by reducing the memory usage of key-value caching. It introduces new techniques to better preserve important visual information and eliminate redundant data, resulting in faster decoding and higher accuracy for these agents.
Read More

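Most KV-cache compression schemes score cached entries and evict the low scorers. A generic attention-mass eviction sketch; GUI-KV layers spatial (UI layout) and temporal (cross-frame redundancy) signals on top of baseline ideas like this:

```python
import torch

def evict_kv(keys, values, attn_weights, keep: int):
    """Keep the `keep` cache entries that received the most attention
    mass. A generic eviction baseline, not GUI-KV's spatio-temporal
    scoring scheme."""
    # attn_weights: (heads, queries, kv_len) -> importance per kv slot
    importance = attn_weights.sum(dim=(0, 1))
    idx = importance.topk(keep).indices.sort().values  # preserve order
    return keys[:, idx], values[:, idx]

heads, q_len, kv_len, d = 4, 8, 32, 16
keys = torch.randn(heads, kv_len, d)
values = torch.randn(heads, kv_len, d)
attn = torch.rand(heads, q_len, kv_len).softmax(-1)
k2, v2 = evict_kv(keys, values, attn, keep=8)
print(k2.shape)  # -> torch.Size([4, 8, 16])
```
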
In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning
Published at 2025-10-01

#ML

The study presents a new method called "in-place feedback," where users edit the language model's previous response directly, which leads to more accurate and efficient improvements in multi-turn reasoning tasks compared to traditional feedback methods.
Read More

It Takes Two: Your GRPO Is Secretly DPO
Published at 2025-10-01

#ML

This research explores a new approach to Group Relative Policy Optimization (GRPO), a method used to train large language models, by connecting it to Direct Preference Optimization (DPO) and investigating a smaller group size. The findings show that this new method, 2-GRPO, performs just as well as the traditional GRPO while using fewer computational resources and cutting training time significantly.
Read More

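The GRPO-to-DPO connection is easy to see numerically: with a group of two, group-standardized advantages collapse to a symmetric winner/loser pair. A quick demo (using the population standard deviation; the sample-std variant rescales the constant but not the sign structure):

```python
import statistics

def grpo_advantages(rewards):
    """Group-standardized advantages, GRPO-style (population std here;
    variants differ only by a constant factor for a group of two)."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0   # guard identical rewards
    return [(r - mu) / sigma for r in rewards]

print(grpo_advantages([1.0, 0.0]))   # -> [1.0, -1.0]
print(grpo_advantages([5.0, 2.0]))   # -> [1.0, -1.0]
# Any two distinct rewards yield exactly +1 / -1: a pure winner-vs-loser
# pairwise signal, which is the bridge the paper draws to DPO.
```
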
JoyAgent-JDGenie: Technical Report on the GAIA
Published at 2025-10-01

#ML

The authors present a new AI agent architecture that combines planning, memory, and tools for complex tasks. This architecture, tested on various tasks, outperforms open-source systems and performs well against proprietary ones, showing the benefits of integrating different AI components for robust and adaptable AI assistants.
Read More

Making, not Taking, the Best of N
Published at 2025-10-01

#ML

This study introduces a new method called Fusion-of-N (FusioN) that combines the best elements from multiple language model generations, outperforming the traditional Best-of-N approach. The researchers demonstrate FusioN's effectiveness in two settings, test-time scaling and synthetic data generation, across various languages and tasks, suggesting a shift from evaluating language models on a single generation to embracing their diverse strengths.
Read More

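The contrast in one picture: Best-of-N selects, FusioN synthesizes. A sketch with hypothetical `judge` and `llm` callables standing in for real scorers and generators:

```python
def best_of_n(candidates, judge):
    """Best-of-N: score every candidate and take the single winner."""
    return max(candidates, key=judge)

def fusion_of_n(candidates, llm):
    """FusioN-style: ask a model to synthesize one improved answer
    from all candidates instead of discarding N-1 of them. Sketch
    only; the paper's prompt and setup differ."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates, 1))
    prompt = ("Combine the strongest elements of these candidate answers "
              f"into a single improved answer:\n{numbered}")
    return llm(prompt)

candidates = [
    "Paris is the capital of France.",
    "France's capital, Paris, has about 2.1 million residents.",
]
print(best_of_n(candidates, judge=len))          # toy judge: longest wins
print(fusion_of_n(candidates, llm=lambda p: p))  # stub LLM echoes the prompt
```
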
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Published at 2025-10-01

#ML

This research finds that the improvements in large language models during reinforcement learning training come mainly from a single direction of parameter updates, which evolves in a predictable way. Using these insights, the authors developed a method to speed up training by up to 2.5 times without sacrificing performance, making it a useful tool for training large models efficiently.
Read More

Pay-Per-Search Models are Abstention Models
Published at 2025-10-01

#ML

The authors present MASH, a training framework that uses reinforcement learning to improve large language models' ability to recognize when they need external help, eliciting human-like abstention behavior. Experiments show that MASH significantly enhances answer accuracy and demonstrates strong off-the-shelf abstention capabilities without requiring pre-determined knowledge boundaries.
Read More

QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL
Published at 2025-10-01

#ML

The QUASAR framework uses agentic reinforcement learning to improve LLMs' generation and optimization of quantum circuits, addressing challenges such as producing precise parameter values and the models' lack of quantum domain-specific knowledge; the resulting circuits achieve high validity, outperforming other models.
Read More

ReSWD: ReSTIR'd, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction
Published at 2025-10-01

#ML

The researchers present a new method called Reservoir SWD (ReSWD) that improves the Sliced Wasserstein Distance (SWD) by reducing its high variance, which leads to more stable gradients and faster convergence. They achieve this by incorporating Weighted Reservoir Sampling, and their method is shown to outperform standard SWD and other variance reduction techniques in various tasks, from synthetic benchmarks to real-world applications like color correction and diffusion guidance.
Read More

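Both ingredients are classical: sliced Wasserstein compares distributions along random 1-D projections, and weighted reservoir sampling keeps a fixed-size, importance-weighted subset. A sketch of each and of how they might be combined, under assumptions that simplify the paper's actual algorithm:

```python
import torch

def sliced_w2(x, y, dirs):
    """Sliced Wasserstein-2 between equal-size point clouds x, y (n, d)
    along the given unit directions (k, d)."""
    px = (x @ dirs.T).sort(dim=0).values      # sorted 1-D projections
    py = (y @ dirs.T).sort(dim=0).values
    return ((px - py) ** 2).mean(dim=0)       # per-direction W2^2

def weighted_reservoir(weights, k):
    """A-Res weighted reservoir sampling: keep the k items with the
    largest u_i ** (1 / w_i) keys."""
    u = torch.rand_like(weights)
    keys = u ** (1.0 / weights.clamp_min(1e-12))
    return keys.topk(k).indices

# Toy use: keep the slice directions that contribute the most distance,
# so later steps can reuse informative slices instead of resampling
# blindly. A sketch of the combination, not ReSWD's exact algorithm.
torch.manual_seed(0)
x, y = torch.randn(256, 3), torch.randn(256, 3) + 0.5
dirs = torch.nn.functional.normalize(torch.randn(64, 3), dim=1)
per_slice = sliced_w2(x, y, dirs)
keep = weighted_reservoir(per_slice, k=16)
print(per_slice.mean().item(), dirs[keep].shape)  # SWD estimate + reservoir
```
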
Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full papers are translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit Developer's Social Media