🤗 Daily Paper (2025-10-02)


deep.di...@gmail.com

Oct 2, 2025, 4:08:06 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page
🤗 daily paper

An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications

Published at 2025-09-23

#ML

The study examines testing practices in open-source AI agent frameworks and agentic applications, identifying ten testing patterns. It finds that traditional testing methods are widely used to manage the uncertainty of AI agents, while novel agent-specific methods are rarely applied. The study highlights the need for better support for these novel methods, broader adoption of prompt regression testing, and further research into adoption barriers in order to build more robust AI agents....

Read More

Infusing Theory of Mind into Socially Intelligent LLM Agents

Published at 2025-09-26

#ML

This study shows that incorporating Theory of Mind, which is the ability to understand others' mental states, into LLM-based social agents like chatbots can improve their dialogue skills and goal achievement. The researchers developed a ToM-focused dialogue agent called ToMA, which outperforms other baselines in social intelligence tests by exhibiting strategic, goal-oriented behaviors and maintaining better relationships....

Read More

SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights

Published at 2025-09-26

#ML

This research presents a new method called SINQ to improve the accuracy of low-precision language models by addressing the issue of precision loss in outlier parameters. The method introduces an additional scale factor and a fast algorithm to normalize variances, resulting in significantly better performance on various models compared to traditional quantization techniques....
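
As a rough illustration of the dual-scaling idea, here is a minimal NumPy sketch (not the paper's exact algorithm) that alternately normalizes the row and column standard deviations of a weight matrix, Sinkhorn-style, before plain round-to-nearest quantization; the function names and toy matrix are illustrative:

```python
import numpy as np

def sinkhorn_dual_scale(W, n_iter=16, eps=1e-8):
    """Alternately normalize row/column std-devs of W (Sinkhorn-style).

    Returns the balanced matrix plus per-row and per-column scales,
    so that W == diag(r) @ W_balanced @ diag(c) throughout.
    """
    r = np.ones(W.shape[0])
    c = np.ones(W.shape[1])
    Wb = W.copy()
    for _ in range(n_iter):
        row_std = Wb.std(axis=1) + eps
        Wb /= row_std[:, None]
        r *= row_std
        col_std = Wb.std(axis=0) + eps
        Wb /= col_std[None, :]
        c *= col_std
    return Wb, r, c

def quantize_rtn(W, bits=4):
    """Plain round-to-nearest uniform quantization to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.round(W / scale).clip(-qmax - 1, qmax) * scale

# Balancing variances first tames outlier rows/columns; then quantize
# the balanced matrix and fold the scales back in.
W = np.random.randn(64, 64) * np.outer(np.random.rand(64) * 3 + 0.1,
                                       np.random.rand(64) * 3 + 0.1)
Wb, r, c = sinkhorn_dual_scale(W)
W_hat = r[:, None] * quantize_rtn(Wb) * c[None, :]
print(f"relative error: {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.4f}")
```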

Read More

Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned

Published at 2025-09-27

#ML

This study explores ways to improve Vision-Language Process Reward Models (VL-PRMs) for multimodal reasoning by introducing a hybrid data synthesis framework, perception-focused supervision, and evaluating test-time scaling strategies. Experiments on five benchmarks reveal that smaller VL-PRMs can be as effective as larger ones, perception-level supervision significantly improves performance, and VL-PRMs can uncover hidden reasoning abilities in VLMs....

Read More

Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models

Published at 2025-09-29

#ML

The authors propose a method to enhance image generation in diffusion models by aligning pretrained visual encoders as tokenizers, which captures both high-level semantics and perceptual details through a three-stage alignment strategy. This approach accelerates model convergence, achieving state-of-the-art results on ImageNet and LAION datasets, providing a simple and scalable solution for continuous tokenizer design....

Read More

Boolean Satisfiability via Imitation Learning

Published at 2025-09-29

#ML

The authors present ImitSAT, a new approach for the Boolean satisfiability problem that learns from expert decisions to improve the performance of CDCL solvers. By using dense decision-level supervision, ImitSAT reduces propagation counts and runtime, outperforming other learned methods in extensive experiments....

Read More

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Published at 2025-09-29

#ML

The DeepSearch framework combats the limitations of current reinforcement learning with verifiable rewards (RLVR) by integrating Monte Carlo Tree Search into the training process, leading to more efficient exploration and better performance on mathematical reasoning benchmarks....

Read More

Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution

Published at 2025-09-29

#ML

The authors present a new framework called Flash-Searcher that improves the efficiency of language models in complex reasoning tasks by using directed acyclic graphs to execute tasks in parallel, which leads to faster execution and better performance compared to existing frameworks....
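
As a rough sketch of the DAG-based parallel execution pattern (the scheduling idea only, not the paper's actual agent stack), the following Python launches each sub-task as soon as its dependencies complete; the task names and thread-pool executor are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_dag(tasks, deps):
    """Execute `tasks` (name -> callable taking a dict of dep results)
    in parallel, launching each node as soon as its deps are done."""
    results, pending = {}, {}
    remaining = set(tasks)
    with ThreadPoolExecutor() as pool:
        while remaining or pending:
            # Launch every task whose dependencies are all satisfied.
            ready = [t for t in remaining
                     if all(d in results for d in deps.get(t, ()))]
            if not ready and not pending:
                raise ValueError("dependency cycle detected")
            for t in ready:
                remaining.discard(t)
                dep_results = {d: results[d] for d in deps.get(t, ())}
                pending[pool.submit(tasks[t], dep_results)] = t
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                results[pending.pop(fut)] = fut.result()
    return results

# Toy plan: two independent searches run in parallel, then feed a synthesis step.
plan = {
    "search_a": lambda deps: "facts about A",
    "search_b": lambda deps: "facts about B",
    "answer":   lambda deps: f"combined: {deps['search_a']} + {deps['search_b']}",
}
edges = {"answer": ["search_a", "search_b"]}
print(run_dag(plan, edges)["answer"])
```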

Read More

Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures

Published at 2025-09-29

#ML

The research presents a new method called Hyperdimensional Probe that improves the understanding of Large Language Models' internal representations by combining symbolic representations and neural probing. This technique effectively extracts meaningful concepts from various LLMs, embedding sizes, and input domains, and helps identify LLM failures, enhancing the interpretability of these models....

Read More

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources

Published at 2025-09-29

#ML

MixtureVitae is a new, open pretraining dataset that offers strong model performance while minimizing legal risks. It uses a mix of public-domain, permissively licensed, and low-risk texts, along with instruction, reasoning, and synthetic data. Experiments show that models trained on MixtureVitae perform well on various benchmarks, especially math, code, and QA tasks, making it a practical, lower-risk choice for training capable language models....

Read More

PIPer: On-Device Environment Setup via Online Reinforcement Learning

Published at 2025-09-29

#ML

This research presents a new method for automatically setting up software environments on personal devices using advanced machine learning techniques. By training a model to create Bash scripts and adapting it for environment setup, the method enables a model that runs on regular hardware to perform as well as much larger models, making it easier for developers and researchers to configure software environments without manual effort....

Read More

BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs

Published at 2025-09-30

#ML

The study presents BatonVoice, a new framework that enhances speech synthesis by utilizing the linguistic intelligence of Large Language Models (LLMs). It separates instruction understanding from speech generation, where an LLM creates a textual plan of vocal features, which a separate TTS model then uses to generate speech. The approach improves controllable and emotional speech synthesis, outperforming existing models and enabling zero-shot cross-lingual generalization....

Read More

BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses

Published at 2025-09-30

#ML

The authors present BiasFreeBench, a comprehensive benchmark for evaluating bias mitigation techniques in large language models. It compares eight popular methods across two scenarios and introduces a new metric, Bias-Free Score, to measure fairness, safety, and anti-stereotypical responses in a unified format....

Read More

BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration

Published at 2025-09-30

#ML

The paper presents BindWeave, a new framework that improves subject-consistent video generation by using a pretrained multimodal large language model to interpret complex prompts and ground entities. This approach results in more coherent and detailed videos compared to existing models, as demonstrated by experiments on the OpenS2V benchmark....

Read More

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing

Published at 2025-09-30

#ML

The authors present a new reward model, EditReward, for instruction-guided image editing, which outperforms existing models in alignment with human preferences. EditReward is trained on a large-scale human preference dataset and can be used to scale up high-quality training data for image editing, as well as for advanced applications like reinforcement learning-based post-training....

Read More

Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

Published at 2025-09-30

#ML

The study proposes a method to optimize the allocation of exploration budgets in large language models during reinforcement learning, addressing the issue of uniform allocation that leads to inefficient use of resources. By formulating the problem as a knapsack problem, the method adaptively distributes resources based on the model's learning status, resulting in improved performance on challenging tasks and significant gains on mathematical reasoning benchmarks....
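
A minimal sketch of the allocation idea, under the assumption that a prompt's marginal learning signal looks like p(1-p) of its current pass rate p (prompts the model always or never solves yield little gradient); the greedy heap, discount factor, and caps are illustrative, not the paper's exact formulation:

```python
import heapq

def allocate_rollouts(pass_rates, total_budget, min_k=1, max_k=32):
    """Greedy knapsack-style allocation of rollouts across prompts."""
    n = len(pass_rates)
    alloc = [min_k] * n
    budget = total_budget - min_k * n
    # Max-heap of (negated marginal value, prompt index).
    heap = [(-p * (1 - p), i) for i, p in enumerate(pass_rates)]
    heapq.heapify(heap)
    while budget > 0 and heap:
        neg_v, i = heapq.heappop(heap)
        if alloc[i] < max_k:
            alloc[i] += 1
            budget -= 1
            # Simple diminishing-returns discount on repeated picks.
            heapq.heappush(heap, (neg_v * 0.9, i))
    return alloc

# Easy (p=0.95) and saturated (p=0.0) prompts get few rollouts;
# uncertain ones (p near 0.5) soak up most of the budget.
print(allocate_rollouts([0.95, 0.5, 0.4, 0.0], total_budget=16))
```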

Read More

TGPO: Temporal Grounded Policy Optimization for Signal Temporal Logic Tasks

Published at 2025-09-30

#ML

The authors present a new method called TGPO to solve complex, long-term tasks for robots using a language called Signal Temporal Logic. TGPO breaks down tasks into smaller, manageable goals and uses a two-level system to plan and execute them efficiently, outperforming existing methods in various tests....

Read More

VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

Published at 2025-09-30

#ML

The authors present a framework that uses a simulator trained on real interaction data to improve Vision-Language-Action models. This method reduces the need for samples, outperforms supervised learning, and is more robust to changes in conditions compared to traditional reinforcement learning methods....

Read More

VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs

Published at 2025-09-30

#ML

This study presents a new method called VLM-FO1 that improves the ability of Vision-Language Models to accurately locate and understand specific visual details, which they previously struggled with. The method works by converting the challenging task of generating precise coordinates into a more manageable one and can be easily added to existing models, leading to better performance in various perception tasks without affecting their overall visual understanding....

Read More

Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls

Published at 2025-09-30

#ML

This study investigates why transformer models struggle with multi-digit multiplication and finds that they fail to learn the required long-range dependencies. The researchers reverse-engineer a model that can perform multiplication and find that it uses attention to build a graph for storing and retrieving partial products, along with efficient representations such as Minkowski sums and a Fourier basis. They then introduce an auxiliary loss that helps standard models learn these dependencies, improving their ability to learn multiplication....
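
To see the long-range structure concretely, here is a small worked example of schoolbook multiplication over digit lists: each output digit sums partial products from input-digit pairs that sit far apart in the sequence, and carries can ripple across the whole number (the helper is illustrative):

```python
def multiply_digits(a, b):
    """Schoolbook multiplication over digit lists (least significant first),
    making the long-range structure explicit."""
    out = [0] * (len(a) + len(b))
    # Output position k sums ALL digit pairs (i, j) with i + j == k:
    # a long-range dependency on inputs far apart in the sequence.
    for i, da in enumerate(a):
        for j, db in enumerate(b):
            out[i + j] += da * db
    carry = 0
    for k in range(len(out)):      # carries can ripple end to end
        total = out[k] + carry
        out[k], carry = total % 10, total // 10
    return out

# 99 * 99 = 9801: the carry from the lowest position propagates upward.
print(multiply_digits([9, 9], [9, 9]))  # [1, 0, 8, 9] (LSB first)
```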

Read More

ACON: Optimizing Context Compression for Long-horizon LLM Agents

Published at 2025-10-01

#ML

The study presents ACON, a framework that compresses an AI agent's growing observation and interaction history into concise context for long-horizon tasks, improving memory usage and performance without sacrificing accuracy....

Read More

Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

Published at 2025-10-01

#ML

The study explores alternative probability-based objectives to the standard negative log likelihood (NLL) used in supervised fine-tuning of large language models. Through extensive experiments, they identify a 'model-capability continuum' that influences which objective performs best, offering a new way to tailor objectives to a model's strengths and weaknesses....

Read More

BroRL: Scaling Reinforcement Learning via Broadened Exploration

Published at 2025-10-01

#ML

This study proposes a new method called BroRL that improves reinforcement learning by increasing the number of simulations per example, which leads to better performance without the diminishing returns seen in previous methods. The method is based on a mathematical analysis that ensures correct answers are more likely as the number of simulations increases, and experimental results show that it outperforms existing methods on various benchmarks....

Read More

Code2Video: A Code-centric Paradigm for Educational Video Generation

Published at 2025-10-01

#ML

The researchers present a new method called Code2Video that uses Python code to create professional educational videos. This approach involves three agents: Planner, Coder, and Critic, which work together to structure content, generate code, and refine visuals, respectively. The resulting videos are of high quality, with a 40% improvement over direct code generation, and the code and datasets are available for public use....

Read More

CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs

Published at 2025-10-01

#ML

This study improves the training efficiency of large language models (LLMs) on reasoning tasks by focusing on prompt selection and rollout quantity allocation, resulting in a new method called CurES that outperforms existing methods and achieves faster convergence....

Read More

Eliciting Secret Knowledge from Language Models

Published at 2025-10-01

#ML

The study focuses on secret elicitation from AI, specifically large language models (LLMs), where the models are trained to have specific knowledge they do not directly reveal. The researchers develop and test various techniques, both black-box and white-box, to uncover this hidden knowledge, with prefill attacks and logit lens/sparse autoencoders being the most effective in different settings. They make their models and code publicly available for further research....

Read More

GEM: A Gym for Agentic LLMs

Published at 2025-10-01

#ML

The authors present GEM, a new open-source environment suite for agentic LLMs that learn from experience, analogous to OpenAI Gym for traditional reinforcement learning. GEM offers a standardized interface, high-throughput execution, and various tools and examples, and it also serves as a benchmarking toolkit for evaluating different algorithms in both single- and multi-turn settings....

Read More

GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness

Published at 2025-10-01

#ML

This study presents GUI-KV, a method to improve the efficiency of graphical user interface agents by reducing the memory usage of key-value caching. It introduces new techniques to better preserve important visual information and eliminate redundant data, resulting in faster decoding and higher accuracy for these agents....

Read More

In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning

Published at 2025-10-01

#ML

The study presents a new method called 'in-place feedback' where users edit the language model's previous response directly, which leads to more accurate and efficient improvements in multi-turn reasoning tasks compared to traditional feedback methods....
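
A minimal sketch of the idea using the generic chat-message format (the schema and helper are illustrative, not the paper's implementation): rather than appending a corrective user turn, the user's edit overwrites the assistant's previous message before generation resumes:

```python
def apply_in_place_feedback(history, edited_response):
    """Replace the model's last response with the user's edited version,
    instead of appending a corrective follow-up turn."""
    assert history and history[-1]["role"] == "assistant"
    history[-1] = {"role": "assistant", "content": edited_response}
    return history

history = [
    {"role": "user", "content": "Solve: 17 * 24"},
    {"role": "assistant", "content": "17 * 24 = 398"},   # wrong answer
]
# Traditional feedback would APPEND a turn like {"role": "user",
# "content": "398 is wrong, try again"}. In-place feedback instead
# EDITS the answer itself, and the model continues from the fixed state:
apply_in_place_feedback(history, "17 * 24 = 17 * 20 + 17 * 4 = 408")
print(history[-1]["content"])
```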

Read More

It Takes Two: Your GRPO Is Secretly DPO

Published at 2025-10-01

#ML

This research explores a new approach to Group Relative Policy Optimization (GRPO), a method used to train large language models, by connecting it to Direct Preference Optimization (DPO) and investigating a smaller group size. The findings show that this new method, 2-GRPO, performs just as well as the traditional GRPO while using less computational resources and cutting training time significantly....
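
The structural link is easy to verify: GRPO standardizes rewards within each group, and for a group of two with distinct rewards the normalized advantages always collapse to ±1, i.e. a winner/loser pair like the preference signal DPO trains on. A small NumPy check:

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize rewards within a group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# General group: graded advantages.
print(grpo_advantages([1.0, 0.0, 0.5, 0.2]))

# Group of two with distinct rewards: always +/-1 (up to eps) --
# exactly a winner/loser pair, the pairwise signal DPO trains on.
print(grpo_advantages([1.0, 0.0]))   # approx [ 1., -1.]
print(grpo_advantages([0.3, 0.7]))   # approx [-1.,  1.]
```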

Read More

JoyAgent-JDGenie: Technical Report on the GAIA

Published at 2025-10-01

#ML

The authors present a new AI agent architecture that combines planning, memory, and tools for complex tasks. This architecture, tested on various tasks, outperforms open-source systems and performs well against proprietary ones, showing the benefits of integrating different AI components for robust and adaptable AI assistants....

Read More

Making, not Taking, the Best of N

Published at 2025-10-01

#ML

This study introduces a new method called Fusion-of-N (FusioN) that combines the best elements from multiple language model generations, outperforming the traditional Best-of-N approach. The researchers demonstrate FusioN's effectiveness in two settings: test-time scaling and synthetic data generation, across various languages and tasks, suggesting a shift from evaluating language models based on a single generation to embracing their diverse strengths....
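
To make the contrast concrete, here is a toy sketch (the `reward` scorer and `llm_fuse` call are hypothetical stand-ins, not the paper's implementation): Best-of-N discards all but the top-scoring sample, while Fusion-of-N asks a fuser model to synthesize one answer from all of them:

```python
def best_of_n(candidates, reward):
    """Classic BoN: keep only the single highest-scoring sample."""
    return max(candidates, key=reward)

def fusion_of_n(candidates, llm_fuse):
    """FusioN-style aggregation: ask a fuser model to synthesize one
    answer from the complementary strengths of ALL candidates."""
    prompt = "Combine the best elements of these answers into one:\n"
    prompt += "\n---\n".join(candidates)
    return llm_fuse(prompt)

samples = ["short draft", "a much longer, more detailed draft", "medium-length draft"]
print(best_of_n(samples, reward=len))                               # toy scorer: longest wins
print(fusion_of_n(samples, llm_fuse=lambda p: p.splitlines()[-1]))  # stub fuser
```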

Read More

On Predictability of Reinforcement Learning Dynamics for Large Language Models

Published at 2025-10-01

#ML

This research discovers that the improvements in large language models during reinforcement learning training mainly come from a single direction of parameter updates, which changes in a predictable way. Using these insights, they developed a method to speed up training by up to 2.5 times without sacrificing performance, making it a useful tool for training large models efficiently....

Read More

Pay-Per-Search Models are Abstention Models

Published at 2025-10-01

#ML

The authors present MASH, a training framework that uses reinforcement learning to improve large language models' ability to recognize when they need external help, simulating human-like abstention behavior. Experiments show that MASH significantly enhances answer accuracy and demonstrates strong off-the-shelf abstention capabilities without requiring pre-determined knowledge boundaries....

Read More

QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL

Published at 2025-10-01

#ML

The QUASAR framework applies agentic reinforcement learning with tool augmentation to improve how LLMs generate and optimize quantum circuits, addressing challenges such as setting precise parameter values and the models' lack of quantum domain-specific knowledge; the resulting circuits achieve high validity and the model outperforms comparable baselines....

Read More

ReSWD: ReSTIR'd, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction

Published at 2025-10-01

#ML

The researchers present a new method called Reservoir SWD (ReSWD) that improves the Sliced Wasserstein Distance (SWD) by reducing its high variance, which leads to more stable gradients and faster convergence. They achieve this by incorporating Weighted Reservoir Sampling, and their method is shown to outperform standard SWD and other variance reduction techniques in various tasks, from synthetic benchmarks to real-world applications like color correction and diffusion guidance....
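
For orientation, here is a baseline Monte Carlo Sliced Wasserstein estimator together with the classic A-Res weighted reservoir sampler, the two ingredients ReSWD combines (this sketch does not reproduce the paper's estimator; all parameters are illustrative):

```python
import numpy as np

def sliced_wasserstein(X, Y, n_slices=128, rng=None):
    """Monte Carlo Sliced Wasserstein distance between point sets X, Y.

    Projects both sets onto random unit directions; in 1-D the
    Wasserstein distance is the mean gap between sorted samples."""
    rng = rng or np.random.default_rng(0)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_slices):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean(np.abs(px - py))
    return total / n_slices

def weighted_reservoir(items, weights, k, rng=None):
    """A-Res weighted reservoir sampling: keep k items with probability
    proportional to weight in one streaming pass -- the mechanism ReSWD
    uses to retain informative projection directions across steps."""
    rng = rng or np.random.default_rng(0)
    keys = rng.random(len(items)) ** (1.0 / np.asarray(weights, dtype=float))
    keep = np.argsort(keys)[-k:]
    return [items[i] for i in keep]

X = np.random.default_rng(1).normal(0.0, 1, (256, 3))
Y = np.random.default_rng(2).normal(0.5, 1, (256, 3))
print(f"SWD approx: {sliced_wasserstein(X, Y):.3f}")
print(weighted_reservoir([f"slice_{i}" for i in range(8)], np.arange(1, 9), k=3))
```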

Read More

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, which is derived from the open SOLAR-10.7B LLM.


(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media

Fb X In