🤗 Daily Paper Newsletter

This newsletter delivers a curated list of papers from 🤗 Daily Papers. Hope you find some gems!
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
Published at 2025-09-26
#ML
The researchers developed Critique-Coder, a model that improves upon traditional reasoning models by incorporating critique reinforcement learning. Critique-Coder outperforms other models in code generation and general reasoning tasks, demonstrating the benefits of critique-based learning.
Read More

Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time
Published at 2025-09-26
#ML
The study presents a new strategy called Dynamic Experts Search that improves reasoning in large language models by controlling the number of activated experts during inference, leading to better accuracy and stability without extra cost.
Read More
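The test-time knob here is the number of activated experts, k. A toy sketch of top-k mixture-of-experts routing with k exposed as an inference-time parameter (all names and values below are illustrative, not taken from the paper):

```python
import math

def moe_forward(x, expert_fns, router_scores, k):
    """Combine the outputs of the top-k experts, with gate weights
    softmax-normalized over the selected experts only. The idea in
    Dynamic Experts Search is to treat k as a per-query, test-time
    search knob rather than a value fixed at training time."""
    # pick the k experts with the highest routing score
    top = sorted(range(len(router_scores)), key=lambda i: router_scores[i])[-k:]
    mx = max(router_scores[i] for i in top)
    gates = [math.exp(router_scores[i] - mx) for i in top]
    z = sum(gates)
    return sum(g / z * expert_fns[i](x) for g, i in zip(gates, top))
```

Sweeping k at inference and keeping the best-scoring answer is then an ordinary hyperparameter search, with no retraining involved.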
Local Success Does Not Compose: Benchmarking Large Language Models for Compositional Formal Verification
Published at 2025-09-26
#ML
The study presents DafnyCOMP, a new benchmark for testing large language models on creating specifications for complex programs made up of multiple interacting functions. The results show that while these models perform well on simpler tasks, they struggle with the more difficult compositional tasks, highlighting areas for improvement in cross-functional reasoning.
Read More

MMPB: It's Time for Multi-Modal Personalization
Published at 2025-09-26
#ML
The study presents MMPB, the first extensive benchmark for evaluating Vision-Language Models (VLMs) in terms of personalization for user-facing AI systems. By using a three-stage protocol to assess 23 VLMs, the research reveals that most models struggle with maintaining consistency, handling preferences, and adapting to visual cues, indicating significant opportunities for improvement in personalized multi-modal AI.
Read More

MultiCrafter: High-Fidelity Multi-Subject Generation via Spatially Disentangled Attention and Identity-Aware Reinforcement Learning
Published at 2025-09-26
#ML
The authors present MultiCrafter, a framework that improves multi-subject image generation by addressing attribute leakage and aligning with human preferences. They achieve this through explicit positional supervision to separate attention regions for each subject, a Mixture-of-Experts architecture to handle diverse scenarios, and a novel online reinforcement learning framework to match human preferences.
Read More

Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective
Published at 2025-09-26
#ML
The authors propose a new method for large language model distillation by treating it as a constrained reinforcement learning problem, which optimizes task-specific rewards while limiting changes from the teacher model. Through experiments, they show that their approach achieves better constraint satisfaction and reasoning skills without sacrificing task performance, making it a practical and efficient solution for reward-aware distillation in resource-limited environments.
Read More

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
Published at 2025-09-26
#ML
The study presents a new speech tokenizer called StableToken that is more stable and less sensitive to noise compared to existing tokenizers. It achieves this stability through a multi-branch architecture and a bit-wise voting mechanism, which significantly reduces errors in token sequences under various noise conditions, ultimately improving the robustness of speech-based language models.
Read More
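The bit-wise voting idea can be illustrated with a toy majority vote per bit position across parallel branches (branch count, bit width, and the tie-breaking rule are made up here; the real tokenizer votes inside a neural architecture):

```python
def bitwise_vote(branch_bits):
    """Majority-vote each bit position across parallel tokenizer
    branches. branch_bits is a list of equal-length 0/1 lists, one
    per branch; a single noise-corrupted branch is outvoted, so the
    emitted token bits stay stable. Ties round up to 1."""
    n = len(branch_bits)
    return [int(2 * sum(bits) >= n) for bits in zip(*branch_bits)]
```

Because a token id is just its bit pattern, stabilizing individual bits directly stabilizes the discrete token sequence downstream models consume.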
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding
Published at 2025-09-26
#ML
This study analyzes how Large Vision-Language Models (LVLMs) use visual information compared to memorized text patterns, identifying a 'Visual Integration Point' where visual data significantly impacts model behavior. They introduce a tool, Total Visual Integration, to measure the strength of visual influence, finding consistent results across various models and datasets.
Read More

VideoScore2: Think before You Score in Generative Video Evaluation
Published at 2025-09-26
#ML
The authors present VideoScore2, a new framework for evaluating generated videos that considers multiple factors like visual quality, alignment with text, and physical consistency. This framework provides detailed explanations for its evaluations, which helps in understanding and improving the generation process, and has been shown to outperform existing methods in both in-domain and out-of-domain benchmarks.
Read More

When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance
Published at 2025-09-26
#ML
This study compares the performance of reasoning models with that of instruction fine-tuned models on various tasks, finding that reasoning models often outperform larger non-reasoning models, especially on complex tasks, despite higher inference costs.
Read More

Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
Published at 2025-09-27
#ML
The authors present a new method called MetaAPO that improves the alignment of large language models with human values by dynamically coupling data generation with model training, which outperforms existing methods and reduces online annotation costs by 42%.
Read More

Democratizing AI scientists using ToolUniverse
Published at 2025-09-27
#ML
ToolUniverse is a new platform that makes it easier for AI systems to access and use various tools and resources for research in areas like data analysis, knowledge retrieval, and experimental design. This platform was used to create an AI system that helped discover a new drug for hypercholesterolemia, and it is now available for others to use and build upon.
Read More
From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs
Published at 2025-09-27
#ML
The study investigates why reasoning language models sometimes perform worse with few-shot examples than with direct answers and finds that this is due to semantic misguidance and strategy transfer failure. The researchers then introduce Insight-to-Solve (I2S), a method that turns demonstrations into useful insights and improves performance on various benchmarks for both open- and closed-source models.
Read More

Multiplayer Nash Preference Optimization
Published at 2025-09-27
#ML
This research presents a new method called Multiplayer Nash Preference Optimization (MNPO) that improves upon existing techniques for aligning large language models with human preferences. MNPO extends the concept of aligning models with human feedback to a multiplayer setting, allowing for more complex and diverse preference structures, and outperforms previous methods in various evaluations.
Read More

Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning
Published at 2025-09-27
#ML
This study investigates how using external tools affects large language models' reasoning and proposes a new framework called Tool-Light. Tool-Light helps models use tools more efficiently and accurately by carefully constructing training datasets and fine-tuning the models in two stages, leading to improved performance in tool-integrated reasoning tasks.
Read More

WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning
Published at 2025-09-27
#ML
The researchers developed WirelessMathLM, a compact language model specifically trained for mathematical reasoning in wireless communications using reinforcement learning. This model achieves high accuracy on specialized mathematics problems in wireless communications and even improves performance on general mathematics benchmarks without training on them.
Read More

AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
Published at 2025-09-28
#ML
The authors present AceSearcher, a framework that enhances large language models for complex reasoning tasks by training them to alternate between breaking down queries and integrating retrieved information. AceSearcher outperforms other models, including one with 9 times more parameters, on reasoning tasks while using fewer resources.
Read More
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
Published at 2025-09-28
#ML
The study challenges the conventional view of a trade-off between exploration and exploitation in reinforcement learning, suggesting that these processes can be separated and improved together. The researchers introduce a new method called VERL, which enhances both exploration and exploitation simultaneously by using a stable predictive controller, leading to significant accuracy improvements in various language models and reasoning tasks.
Read More

EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
Published at 2025-09-28
#ML
The authors propose EditScore, a series of reward models for improving instruction-guided image editing, addressing the challenge of creating a high-fidelity, efficient reward signal in Reinforcement Learning (RL). EditScore surpasses large VLMs like GPT-5 in benchmarks and enables efficient RL training, resulting in a significant performance improvement in image editing models.
Read More

Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
Published at 2025-09-28
#ML
The authors present a new framework called DART that improves the efficiency of training vision-language-model-based GUI agents for complex desktop and mobile tasks by decoupling the training system into four asynchronous modules, resulting in higher GPU and environment utilization. Additionally, they introduce an adaptive data curation scheme that enhances learning from abundant samples, leading to a significant improvement in task success rate on the OSWorld benchmark compared to the base model.
Read More

HunyuanImage 3.0 Technical Report
Published at 2025-09-28
#ML
HunyuanImage 3.0 is a powerful, open-source image generative model that unifies multimodal understanding and generation. It's built with a native Chain-of-Thoughts schema, advanced architecture design, and an efficient infrastructure, enabling it to rival previous state-of-the-art models in text-image alignment and visual quality.
Read More

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
Published at 2025-09-28
#ML
Researchers developed SLA, a new attention method for diffusion transformers, which significantly speeds up video generation by reducing attention computation by 95% without sacrificing quality. They achieved this by categorizing attention weights and applying different attention mechanisms to each category, resulting in a 13.7x speedup in attention computation and a 2.2x overall speedup in video generation.
Read More
Sequential Diffusion Language Models
Published at 2025-09-28
#ML
The proposed Sequential Diffusion Language Model (SDLM) overcomes the limitations of fixed-length decoding and incompatibility with key-value caches in diffusion language models by enabling adaptive generation length at each step and preserving cache compatibility, resulting in improved efficiency and performance.
Read More

SparseD: Sparse Attention for Diffusion Language Models
Published at 2025-09-28
#ML
The authors present SparseD, a new method for reducing inference latency in diffusion language models (DLMs). SparseD addresses the unique sparsity behaviors in DLMs by reusing head-specific sparse patterns and using full attention in early steps, resulting in efficient and practical DLM deployment for long-context applications.
Read More

Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step
Published at 2025-09-28
#ML
This study explores and improves decoding strategies and reinforcement learning algorithms for masked diffusion language models (MDLMs), which offer benefits like parallel decoding and fewer inference steps compared to autoregressive language models. The research introduces EOS Early Rejection, an Ascending Step-Size decoding scheduler, and Consistency Trajectory Group Relative Policy Optimization to optimize MDLMs' performance, reduce optimization errors, and achieve competitive results with fewer decoding steps.
Read More

UniVid: The Open-Source Unified Video Model
Published at 2025-09-28
#ML
The study presents UniVid, a novel model that effectively combines MLLMs and diffusion decoders for both understanding and generating videos. UniVid addresses common challenges in unified video modeling through innovative techniques like Temperature Modality Alignment and Pyramid Reflection, resulting in improved performance on various benchmarks compared to existing models.
Read More

Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models
Published at 2025-09-29
#ML
This study analyzes a new approach called Advantage Weighted Matching (AWM) for aligning reinforcement learning with pretraining in diffusion models. AWM reduces variance and speeds up convergence by giving more weight to high-reward samples and less to low-reward ones, using the same score/flow-matching loss as pretraining, resulting in a significant improvement over existing methods.
Read More
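The reweighting can be sketched schematically: the per-sample matching error keeps the pretraining form, and only the sample weight changes with reward. The exponential weight and mean-reward baseline below are assumptions for illustration, not the paper's exact scheme:

```python
import math

def awm_loss(errors, rewards):
    """Advantage Weighted Matching sketch: `errors` are per-sample
    score/flow-matching losses (the same objective as pretraining)
    and `rewards` are per-sample rewards. Each error is weighted by
    a positive, monotone function of its advantage, so high-reward
    samples pull on the model harder than low-reward ones."""
    baseline = sum(rewards) / len(rewards)      # mean reward as baseline
    weights = [math.exp(r - baseline) for r in rewards]
    z = sum(weights)
    return sum(w * e for w, e in zip(weights, errors)) / z
```

With equal rewards this collapses to the plain pretraining average, which is what "aligning RL with pretraining" suggests: the RL signal enters only through the weights.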
BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
Published at 2025-09-29
#ML
Researchers have developed a new system called BRIDGE that generates over 20 million realistic images with accurate depth information using reinforcement learning. This system helps train a depth estimation model more effectively, leading to improved performance in various scenes compared to existing methods.
Read More

EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
Published at 2025-09-29
#ML
The authors present EasySteer, a new framework for efficiently controlling large language models during use. It's designed to be flexible, fast, and easy to use, outperforming existing systems by 5.5 to 11.4 times and demonstrating effectiveness in various real-world applications.
Read More

Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks
Published at 2025-09-29
#ML
The authors present a new method to improve spatial understanding in vision-language models by training them on a dataset of geometric problems. Their approach, which involves fine-tuning models on 30,000 geometry problems, leads to significant improvements in spatial reasoning benchmarks, with the best model outperforming the previous state-of-the-art model.
Read More

Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning
Published at 2025-09-29
#ML
This study demonstrates that Evolution Strategies can efficiently fine-tune large language models, outperforming traditional Reinforcement Learning methods in various aspects, such as sample efficiency and stability.
Read More
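A minimal sketch of one Evolution Strategies update, the gradient-free procedure the paper scales up (all hyperparameters below are illustrative; a real LLM run perturbs billions of parameters and scores each candidate with a task reward rather than a toy loss):

```python
import random

def es_step(params, loss_fn, sigma=0.1, lr=0.02, pop=50, rng=None):
    """One Evolution Strategies update: sample Gaussian perturbations
    of the parameters, evaluate each with forward passes only (no
    backpropagation), and step along the loss-weighted average
    perturbation, i.e. an estimated gradient of the loss."""
    rng = rng or random.Random(0)
    eps = [[rng.gauss(0, 1) for _ in params] for _ in range(pop)]
    losses = [loss_fn([p + sigma * e for p, e in zip(params, ep)]) for ep in eps]
    mean = sum(losses) / pop
    std = (sum((l - mean) ** 2 for l in losses) / pop) ** 0.5 + 1e-8
    adv = [(l - mean) / std for l in losses]    # normalized fitness
    grad = [sum(a * ep[j] for a, ep in zip(adv, eps)) / (pop * sigma)
            for j in range(len(params))]        # gradient estimate
    return [p - lr * g for p, g in zip(params, grad)]
```

Because only forward evaluations are needed, the population is trivially parallelizable, which is a large part of ES's appeal at scale.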
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones
Published at 2025-09-29
#ML
The study presents evidence that large language models can develop new skills through reinforcement learning by combining existing ones, similar to human cognitive skill acquisition. The research introduces a new framework to investigate this, demonstrating that RL enables models to learn unseen compositions of functions and even transfer this ability to different tasks, whereas next-token training does not produce these results.
Read More

GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
Published at 2025-09-29
#ML
The study presents GSM8K-V, a new benchmark that tests the visual mathematical reasoning abilities of vision language models. It shows that while these models perform well on text-based math problems, they struggle with visual ones, highlighting the need for improvement in this area.
Read More

Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
Published at 2025-09-29
#ML
The study presents SphereAR, a new method for improving continuous-token autoregressive image generation by addressing the issue of heterogeneous variance in VAE latents. By constraining all AR inputs and outputs to lie on a fixed-radius hypersphere, SphereAR stabilizes AR decoding and sets new state-of-the-art performance on ImageNet generation, outperforming larger baselines.
Read More
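The hypersphere constraint itself is a one-line projection; a sketch (the radius is a free hyperparameter here, and the real model applies this inside the AR pipeline rather than as a standalone helper):

```python
def to_hypersphere(z, radius=1.0, eps=1e-12):
    """Rescale a latent vector onto a hypersphere of fixed radius.
    Every projected vector has the same norm, which removes the
    heterogeneous per-component variance of raw VAE latents that
    the paper identifies as destabilizing AR decoding."""
    norm = sum(x * x for x in z) ** 0.5
    return [radius * x / (norm + eps) for x in z]
```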
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
Published at 2025-09-29
#ML
The study presents InfLLM-V2, a new attention mechanism that efficiently handles long sequences without the drawbacks of previous methods. InfLLM-V2 seamlessly adapts models for short and long sequences, maintains performance, and is 4 times faster than dense attention, making it a practical solution for long-context understanding and reasoning tasks.
Read More

LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning
Published at 2025-09-29
#ML
The authors present LOVE-R1, a model that improves long video understanding by adaptively zooming in on video clips. This model uses a multi-step reasoning process to first view densely sampled frames at a small resolution and then zoom in on frames of interest at a larger resolution, allowing it to better balance temporal understanding and spatial details. Experiments show that LOVE-R1 outperforms a baseline model by an average of 3.1 percentage points across four long video understanding benchmarks.
Read More

MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
Published at 2025-09-29
#ML
MGM-Omni is a new model that combines understanding and speech generation into one system, allowing for efficient and personalized long-term speech. It can handle different types of audio and produce natural, context-aware speech while maintaining a stable voice, and it requires less data to train compared to other models.
Read More

OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
Published at 2025-09-29
#ML
The researchers created a large-scale dataset called OpenGPT-4o-Image for improving image generation and editing by unified multimodal models. This dataset, constructed using a novel methodology, covers 11 major domains and 51 subtasks with 80k high-quality instruction-image pairs, resulting in significant performance improvements for leading models in various benchmarks.
Read More

PixelCraft: A Multi-Agent System for High-Fidelity Visual Reasoning on Structured Images
Published at 2025-09-29
#ML
The authors present a new system called PixelCraft that improves visual reasoning on complex images like charts and diagrams. This system uses a team of agents to process images in high detail, discuss and revise their reasoning, and adapt their approach based on previous steps, leading to better performance on challenging image tasks.
Read More

Pretraining Large Language Models with NVFP4
Published at 2025-09-29
#ML
The study presents a new method for efficiently training large language models using the NVFP4 format, which improves computational speed and resource utilization. The approach involves techniques like random Hadamard transforms, two-dimensional quantization, and selective high-precision layers, resulting in a model with comparable performance to an 8-bit floating point baseline, while using significantly less compute and energy.
Read More
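A simplified sketch of block-scaled 4-bit quantization, the core of such low-precision training (the E2M1 value set is the standard FP4 format; the shared-scale scheme below is a simplification, as the real NVFP4 recipe also quantizes along two dimensions and applies a random Hadamard transform first to spread out outliers):

```python
# E2M1 (4-bit float) magnitudes; the sign bit gives the other 8 codes
FP4_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4_block(xs):
    """Round a block of values to 4-bit floats sharing one scale.
    The scale maps the block's largest magnitude onto the largest
    representable FP4 value, then each entry is rounded to the
    nearest representable magnitude with its sign preserved."""
    scale = max(abs(x) for x in xs) / FP4_LEVELS[-1] or 1.0
    def nearest(m):
        return min(FP4_LEVELS, key=lambda lv: abs(lv - m))
    quant = [(1 if x >= 0 else -1) * nearest(abs(x) / scale) * scale for x in xs]
    return quant, scale
```

Small blocks keep the shared scale close to each value's magnitude, which is how block scaling limits the error of such an extremely coarse format.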
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
Published at 2025-09-29
#ML
The study presents a new method called ROVER, which simplifies the training of large language models for math reasoning by eliminating the need for complex policy optimization techniques. ROVER outperforms existing methods in both accuracy and diversity, while being more efficient and easier to train.
Read More

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark
Published at 2025-09-29
#ML
The study presents RealUnify, a benchmark to evaluate the synergy between understanding and generation in unified multimodal models, and finds that current models struggle to achieve effective synergy, suggesting the need for new training strategies and inductive biases.
Read More

Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
Published at 2025-09-29
#ML
The authors present a new method called Rolling Forcing to generate long videos in real time with minimal errors. This technique uses a joint denoising scheme, an attention sink mechanism, and an efficient training algorithm to reduce error accumulation, allowing for high-quality, low-latency video streaming.
Read More

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
Published at 2025-09-29
#ML
SANA-Video is a small, efficient diffusion model for generating high-resolution, high-quality videos up to 720x1280 resolution and minute-length duration. Its key features include a linear attention mechanism and a constant-memory KV cache, which allow for fast, low-cost video generation on RTX 5090 GPUs.
Read More

SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression
Published at 2025-09-29
#ML
The paper presents a new method called SIRI that improves the efficiency and accuracy of large reasoning models by alternating between compressing and expanding the reasoning budget during training. This approach reduces redundant thinking patterns and increases reasoning density, resulting in improved performance and reduced token usage on a benchmark task.
Read More
Scaling Generalist Data-Analytic Agents
Published at 2025-09-29
#ML
The authors present DataMind, a method for creating generalist data-analytic agents that can handle diverse formats and long-term reasoning. DataMind improves upon existing approaches by addressing challenges such as insufficient data, improper training, and code-based multi-turn rollout instability, resulting in state-of-the-art performance on data analysis benchmarks.
Read More

The Era of Real-World Human Interaction: RL from User Conversations
Published at 2025-09-29
#ML
The study presents a new approach called RLHI that learns from real user conversations to improve and align models better. It introduces two methods, RLHI with User-Guided Rewrites and RLHI with User-Based Rewards, which use user feedback and long-term interaction history to enhance model performance in personalization, instruction-following, and reasoning.
Read More

Towards Personalized Deep Research: Benchmarks and Evaluations
Published at 2025-09-29
#ML
The authors present the first benchmark, Personalized Deep Research Bench, for evaluating personalization in AI research assistants, which includes 250 user-task queries across 10 domains and 25 user profiles. They also propose a new evaluation framework, PQR, to measure personalization alignment, content quality, and factual reliability, and use it to assess various systems' capabilities and limitations in handling personalized deep research.
Read More

VGGT-X: When VGGT Meets Dense Novel View Synthesis
Published at 2025-09-29
#ML
This study explores using 3D Foundation Models for dense Novel View Synthesis, addressing challenges like high VRAM usage and imperfect outputs. The proposed VGGT-X solution improves efficiency, output quality, and training robustness, achieving state-of-the-art results in dense NVS and pose estimation, while also offering insights for future advancements in the field.
Read More

Visual Jigsaw Post-Training Improves MLLMs
Published at 2025-09-29
#ML
The study presents Visual Jigsaw, a new self-supervised framework to improve visual understanding in multimodal language models. This framework, which works by having the model reconstruct visual information from shuffled inputs, demonstrates significant enhancements in perception, reasoning, and spatial understanding across different visual modalities.
Read More
Tags are generated by Google's Gemini Pro API; summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full papers are translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit the developer's social media