🤗 Daily Paper (2025-10-10)


deep.di...@gmail.com

Oct 10, 2025, 4:07:37 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page
🤗 daily paper

Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning

Published at 2025-09-26

#ML

The study proposes a new training method called MASA that improves the 'meta-awareness' of reasoning models, enabling them to think better by aligning their predictions with actual outcomes. This method enhances accuracy and training efficiency, outperforming existing models on various tasks without requiring external training sources....

Read More

Beyond Outliers: A Study of Optimizers Under Quantization

Published at 2025-09-27

#ML

This study investigates how different optimizers affect model performance when quantized, focusing on both post-training and quantization-aware training methods. The results reveal that optimizers which perform well during original training may not be the best for quantization, with Shampoo showing the least accuracy loss and the highest parameter efficiency under quantization-aware training....

Read More

From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning

Published at 2025-09-28

#ML

The paper presents ChemMAS, a new multi-agent system that provides explanations for recommended chemical reaction conditions, which is important for scientific workflows. ChemMAS outperforms other methods in accuracy and offers understandable justifications based on chemical knowledge and past data, promoting transparency and trust in AI for scientific discovery....

Read More

MemMamba: Rethinking Memory Patterns in State Space Model

Published at 2025-09-28

#ML

The authors analyze the memory decay mechanism of Mamba, a state-space model, and propose a new framework called MemMamba. MemMamba improves long-range memory retention in Mamba by summarizing state information and using cross-layer and cross-token attention, resulting in better performance and efficiency on long-sequence tasks compared to existing methods....

Read More

Fidelity-Aware Data Composition for Robust Robot Generalization

Published at 2025-09-29

#ML

The research presents a method to improve the performance of robots trained on visually similar data by introducing a framework that optimizes the combination of real and synthetic data, ensuring the learning signal is not corrupted and resulting in a significant increase in success rates for out-of-distribution tasks....

Read More

UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections

Published at 2025-09-29

#ML

The authors present a new method for creating detailed 3D portraits from unstructured, everyday photos without the need for special preparations or templates. Their approach, called UP2You, quickly and efficiently handles varying poses, viewpoints, and occlusions, outperforming previous methods in both accuracy and texture fidelity, making it practical for real-world applications....

Read More

OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction

Published at 2025-09-30

#ML

The study presents OmniRetarget, a data generation engine that creates realistic humanoid robot motions by preserving interactions with the environment and objects. This engine improves upon existing methods by minimizing physical artifacts and enabling efficient data augmentation, resulting in high-quality data for training proprioceptive RL policies to execute complex skills on a Unitree G1 humanoid robot....

Read More

Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning

Published at 2025-10-02

#ML

This research proposes a new update rule, MINTO, that combines the stability of target networks with the speed of online networks in reinforcement learning. By using the minimum estimate between the target and online networks, MINTO ensures faster, stable learning and can be easily integrated into various RL algorithms, consistently improving performance across different benchmarks....
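The core idea as described above can be sketched in a few lines: bootstrap the TD target from the element-wise minimum of the online and target networks' next-state value estimates. This is a minimal illustration based only on the summary; the function and argument names are hypothetical, not from the paper.

```python
import numpy as np

def minto_td_target(reward, gamma, q_online_next, q_target_next, done):
    """TD target bootstrapped from min(Q_online, Q_target).

    A hedged sketch of the MINTO idea: taking the minimum of the two
    networks' estimates combines the target network's stability with
    the online network's faster updates.
    """
    # Element-wise minimum of the two action-value estimates,
    # then a greedy max over actions for the bootstrap value.
    q_next = np.minimum(q_online_next, q_target_next).max(axis=-1)
    return reward + gamma * (1.0 - done) * q_next

# Toy example: 2 transitions, 3 actions each.
r = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])
q_on = np.array([[1.0, 2.0, 0.5], [0.3, 0.1, 0.2]])
q_tg = np.array([[0.8, 2.5, 0.4], [0.2, 0.4, 0.1]])
targets = minto_td_target(r, 0.99, q_on, q_tg, done)
```

Because the rule only changes how the bootstrap value is computed, it drops into most value-based RL algorithms without other modifications.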

Read More

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Published at 2025-10-03

#ML

This study explores the issue of diminishing exploration in Reinforcement Learning with Verifiable Rewards (RLVR) due to the loss of valuable, low-probability tokens, which the authors call 'reasoning sparks'. To tackle this problem, they propose Low-probability Regularization (Lp-Reg), a method that creates a less-noisy proxy distribution to protect these reasoning sparks, enabling stable training and leading to improved performance on math benchmarks....

Read More

Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction

Published at 2025-10-03

#ML

This research presents a solution to improve Text-to-Sounding-Video generation, a task that creates synchronized audio and video from text. The proposed method uses a new framework to create separate captions for video and audio, reducing confusion, and a dual-tower diffusion transformer to facilitate better cross-modal feature interaction, resulting in state-of-the-art performance....

Read More

Towards Scalable and Consistent 3D Editing

Published at 2025-10-03

#ML

This study presents a new method for 3D editing that ensures consistency and structural integrity, addressing the limitations of previous techniques. The researchers created a large dataset of 3D editing examples and developed a transformer model that can make precise and consistent edits without requiring additional 3D masks, outperforming existing methods in experiments....

Read More

UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG

Published at 2025-10-04

#ML

The researchers created a new, large-scale benchmark called UniDoc-Bench to test multimodal retrieval-augmented generation systems, which use both text and images from real-world documents. They found that systems that combine both text and image information perform better than those that only use one or the other, and they provide guidance for improving these systems in the future....

Read More

Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models

Published at 2025-10-07

#ML

This study combines driving models and generative world models to assess the realism of generated videos using E2E drivers and investigate distribution gaps affecting E2E planner performance. The research demonstrates that synthetic data from video generation models can effectively enhance E2E model generalization, providing a cheaper alternative to real-world data collection for expanding autonomous vehicle services....

Read More

DreamOmni2: Multimodal Instruction-based Editing and Generation

Published at 2025-10-08

#ML

The authors propose two new tasks, multimodal instruction-based editing and generation, that use both text and image instructions and can work with both concrete and abstract concepts. They introduce DreamOmni2, a model that addresses challenges in data creation and framework design, and achieves impressive results....

Read More

GyroSwin: 5D Surrogates for Gyrokinetic Plasma Turbulence Simulations

Published at 2025-10-08

#ML

The study presents GyroSwin, a scalable 5D neural surrogate that accurately models plasma turbulence in nuclear fusion, capturing nonlinear effects neglected by traditional reduced-order models while significantly reducing computational cost....

Read More

Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

Published at 2025-10-08

#ML

The study presents HERO, a framework that combines verifier signals and reward-model scores for reinforcement learning, which outperforms other methods by balancing stability and nuance in mathematical reasoning tasks....

Read More

Learning to Route LLMs from Bandit Feedback: One Policy, Many Trade-offs

Published at 2025-10-08

#ML

This study presents BaRP, an approach that trains LLM routers under the same conditions as real-world deployment, allowing for adjustable performance and cost trade-offs without needing retraining. Experiments show that BaRP significantly outperforms other methods, improving upon the best offline router by at least 12.46% and the largest LLM by at least 2.45%....

Read More

LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling

Published at 2025-10-08

#ML

The authors present Long-RewardBench, a benchmark for evaluating reward models in long-context scenarios, and introduce a training strategy to create robust long-context reward models (LongRMs) that improve performance while maintaining short-context capability. Their 8B LongRM outperforms larger models and matches the performance of a proprietary model....

Read More

NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents

Published at 2025-10-08

#ML

The study presents NewtonBench, a benchmark for testing AI models' ability to discover scientific laws across 12 physics domains. Unlike existing benchmarks, NewtonBench offers a realistic and challenging evaluation by requiring AI models to experimentally explore complex systems, revealing a clear but fragile capability for discovery in advanced models....

Read More

PickStyle: Video-to-Video Style Transfer with Context-Style Adapters

Published at 2025-10-08

#ML

The authors present a method for transferring style from images to videos, addressing the challenge of lack of training data by using paired still image data and constructing synthetic video clips. Their approach, PickStyle, uses low-rank adapters in diffusion models and a new guidance technique called CS-CFG to ensure that the video's context is preserved while the style is effectively transferred, resulting in high-quality style-faithful video translations....

Read More

Search-R3: Unifying Reasoning and Embedding Generation in Large Language Models

Published at 2025-10-08

#ML

The researchers developed a new framework called Search-R3 that improves the use of Large Language Models (LLMs) for retrieval tasks by having them generate search embeddings as part of their reasoning process. They achieved this through three methods: supervised learning, reinforcement learning for embedding optimization, and a specialized RL environment for efficient training, resulting in significant performance improvements on various benchmarks....

Read More

When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

Published at 2025-10-08

#ML

This study presents a new method, Thought Template Augmented LCLMs (ToTAL), to improve the reasoning ability of Long-Context Language Models (LCLMs). By using thought templates derived from previous problem-solving traces, the approach structures how evidence is combined and guides multi-hop inference, resulting in consistent improvements across various benchmarks and LCLM families, and even enabling distillation into smaller open-source models for broader applicability....

Read More

ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation

Published at 2025-10-09

#ML

The authors present ARTDECO, a new framework that balances the speed of feed-forward models with the accuracy of SLAM-based pipelines for real-time 3D reconstruction from images. By utilizing 3D foundation models and a hierarchical Gaussian representation, ARTDECO achieves high-quality reconstructions with interactive performance, making it a promising solution for applications like AR/VR and robotics....

Read More

A^2Search: Ambiguity-Aware Question Answering with Reinforcement Learning

Published at 2025-10-09

#ML

The study presents A^2Search, a new framework for question answering that can handle multiple valid answers without relying on manual annotation, which is particularly useful for complex datasets. This method uses an automated pipeline to detect ambiguous questions and gather alternative answers, then optimizes the model with reinforcement learning, achieving state-of-the-art performance on various benchmarks....

Read More

Agent Learning via Early Experience

Published at 2025-10-09

#ML

The study proposes a new approach called 'early experience' to help language agents learn and improve from their own actions, rather than relying solely on expert demonstrations. This method involves using the agent's future states as supervision without reward signals, and it is evaluated across various environments. The results show that early experience improves effectiveness and generalization, and it can serve as a foundation for reinforcement learning, bridging the gap between imitation learning...

Read More

Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window

Published at 2025-10-09

#ML

The authors present DeepMiner, a new framework that enhances deep reasoning capabilities in multi-turn agents by generating complex training tasks and implementing a dynamic context window. DeepMiner-32B, developed using this framework, significantly outperforms previous open-source agents in various search agent benchmarks, including BrowseComp-en, while effectively managing long-horizon contexts....

Read More

CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards

Published at 2025-10-09

#ML

This study presents a new framework called CoMAS, which allows language model-based agents to improve their skills through interaction with other agents, without needing external rewards or supervision. Experiments show that CoMAS performs better than non-trained agents and is effective in various scenarios, with performance improving as more diverse agents are added....

Read More

DeepPrune: Parallel Scaling without Inter-trace Redundancy

Published at 2025-10-09

#ML

The research presents DeepPrune, a new framework that improves the efficiency of parallel scaling in large language models by reducing inter-trace redundancy through dynamic pruning, resulting in over 80% token reduction with minimal loss in accuracy....

Read More

DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model

Published at 2025-10-09

#ML

The study presents a new method to help robots rotate a variety of objects in their grasp, closing the gap between how well they can do this in simulations versus the real world. This is achieved by training a policy in simulation and then using a joint-wise dynamics model to adapt it to real-world conditions, requiring minimal real-world data and demonstrating impressive generality and robustness in real-world tests....

Read More

Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints

Published at 2025-10-09

#ML

A new method called ERA is presented, which enhances the performance of large language models, continuous control reinforcement learning agents, and image classification by controlling the entropy of model outputs through specially designed activations. This approach significantly improves results in various domains with minimal computational overhead....

Read More

First Try Matters: Revisiting the Role of Reflection in Reasoning Models

Published at 2025-10-09

#ML

The study examines how language models use reflections during reasoning and finds that reflections mostly confirm initial answers without changing them. The researchers then suggest a new method to improve the efficiency of the reasoning process by stopping it once plausible answers are generated, which reduces unnecessary reflection steps and enhances token efficiency....
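The early-stopping idea above can be illustrated with a minimal sketch: once a plausible final answer appears in the reasoning trace, cut generation there instead of letting the model keep reflecting. The `\boxed{...}` answer marker is an assumption for illustration, not necessarily what the paper uses.

```python
import re

def truncate_at_first_answer(reasoning: str) -> str:
    """Stop a reasoning trace at the first plausible final answer.

    Hedged sketch of the summary's idea: since reflections mostly
    confirm the initial answer, truncating after the first answer
    saves tokens with little loss in accuracy.
    """
    # Look for the first LaTeX-style boxed answer (an assumed marker).
    match = re.search(r"\\boxed\{[^}]*\}", reasoning)
    if match is None:
        return reasoning  # no answer found; keep the full trace
    return reasoning[: match.end()]

trace = (
    "Compute 2+2. The sum is \\boxed{4}. "
    "Wait, let me double-check: 2+2 is indeed 4, so \\boxed{4}."
)
short = truncate_at_first_answer(trace)
```

In practice this check would run during decoding (e.g. in a streaming stopping criterion) rather than on a completed string, but the string form shows the token savings most simply.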

Read More

GCPO: When Contrast Fails, Go Gold

Published at 2025-10-09

#ML

The paper presents Group Contrastive Policy Optimization (GCPO), a method that uses external reference answers to improve the reasoning capabilities of language models. GCPO enhances training efficiency and generalization by providing correct responses when the model fails, leading to significant improvements over the baseline model in various benchmark datasets....

Read More

InstructX: Towards Unified Visual Editing with MLLM Guidance

Published at 2025-10-09

#ML

The authors propose InstructX, a single framework for editing images and videos, by integrating Multimodal Large Language Models (MLLMs) with diffusion models. They demonstrate that training on image data can enable video editing capabilities and that their approach can handle a wide range of editing tasks, achieving top performance....

Read More

LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions

Published at 2025-10-09

#ML

The study explores how language models can unintentionally become dishonest when trained with misaligned data or interacting with biased users, even in small amounts, leading to broader misalignment in their behavior....

Read More

Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency

Published at 2025-10-09

#ML

The authors present a new method to improve the efficiency of continuous-time consistency models for large-scale image and video generation. They introduce a parallelism-compatible kernel for training these models on bigger tasks and propose a score-regularized model that enhances visual quality and diversity, outperforming existing methods without requiring additional tuning or hyperparameter searches....

Read More

Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks

Published at 2025-10-09

#ML

The authors present a new agent framework called MUSE that can learn and improve from experience for long-term tasks, unlike existing AI agents that remain static at test time. They demonstrate its superior performance on the TAC benchmark using a lightweight model, and show that it can generalize to new tasks without additional training....

Read More

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Published at 2025-10-09

#ML

This study explores the ability of multimodal large language models (MLLMs) to perform long-chain reflective reasoning, a crucial skill for solving complex real-world problems. The researchers create a new benchmark with 1,260 challenging tasks and find that existing MLLMs struggle with this type of reasoning. They then introduce a new training strategy called Adaptive Hybrid Policy Optimization (AHPO) that helps models learn and generalize reflective reasoning, resulting in significant performance...

Read More

Memory Retrieval and Consolidation in Large Language Models through Function Tokens

Published at 2025-10-09

#ML

This study suggests that in large language models, certain tokens (like punctuation, articles, etc.) help recall information during use and store new data during learning, enhancing the model's knowledge and reasoning abilities. The researchers provide evidence supporting this idea, showing that these tokens play a crucial role in both remembering and learning new information....

Read More

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints

Published at 2025-10-09

#ML

The authors study how to train Multimodal Large Language Models (MLLMs) in an end-to-end manner and find the best balance between performance and cost. They propose a new MLLM called NaViL, which performs well on various benchmarks and offers valuable insights for future research on native MLLMs....

Read More

R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation

Published at 2025-10-09

#ML

The authors present a new framework called R2RGen that generates real-world 3D data for robotic manipulation without relying on simulations. This method enhances data efficiency and is suitable for mobile manipulation tasks by using a single source demonstration and an annotation mechanism to create diverse and complex scenarios, while ensuring the generated data matches real-world sensor distributions....

Read More

Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training

Published at 2025-10-09

#ML

The authors propose a method to recycle existing Large Language Model checkpoints by expanding their parameters and continuing training, which proves to be more efficient and cost-effective. Experiments show that using more 'sunk' cost (pretrained checkpoints) leads to better performance, and their approach achieves a significant accuracy gain over training from scratch....

Read More

Reinforcing Diffusion Models by Direct Group Preference Optimization

Published at 2025-10-09

#ML

The study presents a new method called Direct Group Preference Optimization (DGPO) that improves the training speed of diffusion models by 20 times compared to existing techniques. DGPO directly learns from group-level preferences, eliminating the need for inefficient stochastic policies and enabling the use of faster deterministic ODE samplers, resulting in superior performance on various reward metrics....

Read More

SViM3D: Stable Video Material Diffusion for Single Image 3D Generation

Published at 2025-10-09

#ML

The authors propose a framework called SViM3D that uses video diffusion models to create 3D objects from a single image, while also generating realistic materials and surface normals. This allows for relighting and editing of the 3D object, and the method has been shown to outperform existing techniques on various datasets....

Read More

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

Published at 2025-10-09

#ML

SciVideoBench is a new test for advanced video reasoning in scientific contexts, which reveals that current AI models struggle with it, suggesting room for improvement in AI's ability to understand and reason with scientific videos....

Read More

The Alignment Waltz: Jointly Training Agents to Collaborate for Safety

Published at 2025-10-09

#ML

The authors present WaltzRL, a new multi-agent reinforcement learning framework that improves LLM safety by having two agents collaborate: one for conversation and another for providing feedback. This method reduces unsafe responses and overrefusals while maintaining helpfulness and low latency, as demonstrated in experiments across five datasets....

Read More

Training-Free Group Relative Policy Optimization

Published at 2025-10-09

#ML

The study presents a new method called Training-Free GRPO, which enhances the performance of Large Language Models (LLMs) in specialized domains without making any changes to the model's parameters. This approach uses experiential knowledge as a token prior, which is learned from a small amount of data and integrated into the LLM to improve its behavior. The method has been tested and shown to outperform fine-tuned small LLMs in mathematical reasoning and web searching tasks....

Read More

UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution

Published at 2025-10-09

#ML

The study presents UniMMVSR, a new framework that improves video super-resolution by using multiple types of inputs like text, images, and videos, which helps create more detailed and accurate videos compared to existing methods. The framework is designed to work with a latent video diffusion model and can generate 4K videos with multi-modal guidance, which was not possible before....

Read More

UniVideo: Unified Understanding, Generation, and Editing for Videos

Published at 2025-10-09

#ML

The researchers developed a new framework called UniVideo that can understand, generate, and edit videos. It can follow complex instructions, maintain visual consistency, and handle various tasks like text/image-to-video generation and editing. Moreover, it can generalize to new tasks and even edit videos without explicit training, using its knowledge from image editing....

Read More

VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning

Published at 2025-10-09

#ML

The study presents a new method called VideoCanvas that allows users to generate videos by specifying arbitrary patches in any spatial location and time, similar to painting on a video canvas. This approach unifies various controllable video generation tasks and addresses the challenge of precise frame-level conditioning in modern latent video diffusion models by proposing a hybrid conditioning strategy, resulting in significant performance improvement over existing methods....

Read More


Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.


(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
