🤗 Daily Paper(2025-08-14)


deep.di...@gmail.com

Aug 14, 2025, 4:07:22 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page
🤗 daily paper

Decentralized Aerial Manipulation of a Cable-Suspended Load using Multi-Agent Reinforcement Learning

Published at 2025-08-02

#ML

A multi-agent reinforcement learning method lets a team of small flying robots manipulate a cable-suspended load with full 6-DoF control in the real world, without exchanging information between robots or relying on a centralized controller. This keeps the approach scalable and cheap enough to run onboard, and it is validated in a range of real-world experiments....

Read More

Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models

Published at 2025-08-07

#ML

The authors present a new reinforcement learning framework called Cooper that improves large language models' reasoning abilities by jointly optimizing policy and reward models. This approach enhances robustness, reduces reward hacking, and improves overall performance, as demonstrated by their experiments....
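The joint optimization can be pictured as an alternating schedule. The sketch below is a hypothetical outline, not Cooper's actual algorithm: the function names and update rules are placeholders, and the point shown is that the reward model is refreshed before each policy step, so the policy is always scored by an up-to-date critic instead of exploiting a stale reward signal.

```python
def co_optimize(policy, reward_model, batches, update_policy, update_reward):
    """Hypothetical alternating schedule for jointly training a policy
    and a reward model (illustrative, not Cooper's actual algorithm).
    Refreshing the reward model first limits reward hacking: the policy
    cannot drift against a frozen, exploitable reward."""
    for batch in batches:
        # 1) Update the reward model on this batch's verifiable signals.
        reward_model = update_reward(reward_model, batch)
        # 2) Take a policy step against the freshly updated reward model.
        policy = update_policy(policy, reward_model, batch)
    return policy, reward_model
```

Any concrete update rules can be plugged in for the two callbacks; the scheduling, not the rules, is the idea being sketched.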

Read More

Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing

Published at 2025-08-08

#ML

This study presents a new method called discrete diffusion forcing that enhances the inference speed of diffusion Large Language Models, making them up to 50 times faster than existing models without compromising quality....

Read More

MathReal: We Keep It Real! A Real Scene Benchmark for Evaluating Math Reasoning in Multimodal Large Language Models

Published at 2025-08-08

#ML

The study presents MathReal, a new dataset of 2,000 math questions paired with real-world images captured on mobile phones, to evaluate the math reasoning abilities of multimodal large language models in authentic educational settings. Experimental results show that these models struggle with realistic scenarios, providing insights for future improvements....

Read More

AMFT: Aligning LLM Reasoners by Meta-Learning the Optimal Imitation-Exploration Balance

Published at 2025-08-09

#ML

This study proposes a new method called Adaptive Meta Fine-Tuning (AMFT) that improves the reasoning abilities of Large Language Models (LLMs) by balancing two types of rewards: one from supervised learning and another from reinforcement learning. AMFT dynamically adjusts this balance to optimize long-term performance, leading to better results on various challenging tasks and out-of-distribution scenarios compared to existing methods....
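A minimal sketch of the imitation-exploration balance, assuming a single mixing weight w between the supervised (SFT) and reinforcement learning (RL) objectives. In the paper this weight is meta-learned; the names and the update rule below are purely illustrative.

```python
def amft_loss(sft_loss, rl_loss, w):
    """Combined training loss: w * imitation + (1 - w) * exploration.
    A single adaptive weight trades off the two reward sources."""
    return w * sft_loss + (1.0 - w) * rl_loss

def meta_update_weight(w, grad_w, meta_lr=0.01):
    """One hypothetical meta-gradient step on the balance weight,
    clipped so w stays a valid mixing coefficient in [0, 1]."""
    return min(1.0, max(0.0, w - meta_lr * grad_w))
```

The meta-gradient `grad_w` would come from differentiating a held-out performance measure with respect to w, which is the part the paper's method actually learns.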

Read More

CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing

Published at 2025-08-09

#ML

The study presents CannyEdit, a new method for training-free image editing that maintains the balance between edited and unedited parts of an image. It uses Selective Canny Control to precisely edit specific regions while preserving the rest of the image, and Dual-Prompt Guidance to ensure the edited object fits seamlessly into the scene, outperforming other methods in both aspects....

Read More

ASM-UNet: Adaptive Scan Mamba Integrating Group Commonalities and Individual Variations for Fine-Grained Segmentation

Published at 2025-08-10

#ML

The authors present a new model, ASM-UNet, for accurately identifying small-scale anatomical structures in medical images, which is challenging due to individual variations. Unlike other models, ASM-UNet uses adaptive scan scores to adjust its scanning order based on both common and unique features, resulting in better performance in both large-scale and fine-grained segmentation tasks....

Read More

Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment

Published at 2025-08-11

#ML

The authors present GRAO, a unified framework that combines supervised fine-tuning and reinforcement learning to improve language model alignment. GRAO's three key innovations are comparative quality assessment, intra-group relative advantage weighting, and parameter updates guided by pairwise preference dynamics, yielding superior performance and efficiency compared to traditional methods....
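Intra-group relative advantage weighting can be illustrated with a GRPO-style normalization, where each sampled response is scored against the others in its group. This is a generic sketch of that family of techniques, not GRAO's exact formulation:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each response's reward within its sampling group, so a
    policy update weights responses by how far they sit above or below
    the group average (generic sketch, not GRAO's exact formulation)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Responses better than the group mean get positive weight and worse ones negative weight, which is what makes the signal "relative" rather than absolute.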

Read More

Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

Published at 2025-08-11

#ML

The authors present Mol-R1, a framework that enhances the reasoning ability of language models in molecule discovery by creating a high-quality reasoning dataset and a new training strategy called MoIA, which improves performance over existing models....

Read More

Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation

Published at 2025-08-11

#ML

The authors present a new, lightweight framework called Stand-In for creating high-quality videos with specific identities. It's easy to integrate with other AI tools and only requires training a small portion of additional parameters, yet it outperforms other methods in video quality and identity preservation....

Read More

IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding

Published at 2025-08-12

#ML

This study presents a new method called IAG to manipulate vision-language models in visual grounding tasks, making them focus on a specific target object regardless of the query. The attack is stealthy and effective, achieving over 65% accuracy on a large model and maintaining high performance on other models with minimal accuracy loss....

Read More

AWorld: Dynamic Multi-Agent System with Stable Maneuvering for Robust GAIA Problem Solving

Published at 2025-08-13

#ML

This study presents a new Multi-Agent System (MAS) architecture called AWorld that enhances agent stability when using multiple tools to solve complex problems. An Execution Agent collaborates with a Guard Agent through dynamic supervision and maneuvering mechanisms to reduce errors and improve problem-solving robustness. Experiments show that this dynamic MAS significantly outperforms single-agent and standard tool-augmented systems in effectiveness and stability....
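The Execution/Guard collaboration can be sketched as a propose-verify loop. All names below are placeholders, not the AWorld API:

```python
def run_with_guard(execute, verify, task, max_rounds=3):
    """Propose-verify loop between an Execution Agent and a Guard Agent
    (illustrative placeholders, not the AWorld API). The executor
    proposes an answer, the guard checks it and returns feedback, and
    the loop retries until the guard accepts or the budget is spent."""
    answer, feedback = None, None
    for _ in range(max_rounds):
        answer = execute(task, feedback)
        ok, feedback = verify(task, answer)
        if ok:
            return answer
    return answer  # best effort once the round budget is exhausted
```

The round budget is what makes the supervision "stable maneuvering" rather than an unbounded retry loop.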

Read More

Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study

Published at 2025-08-13

#ML

This study presents an automated framework that uses advanced language models to generate explanations for NLP tasks, which are then evaluated and compared to human-generated explanations. The results show that these automated explanations can significantly improve model performance, offering a cost-effective and scalable alternative to manual annotation....

Read More

Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation

Published at 2025-08-13

#ML

The study presents Echo-4o, an image generation model trained with a large dataset of synthetic images from GPT-4o. These synthetic images offer advantages over real-world data, such as complementing rare scenarios and providing clean supervision. The model demonstrates strong performance and can improve other foundation models when applied....

Read More

GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors

Published at 2025-08-13

#ML

The authors present GSFixer, a new method that enhances the quality of 3D scene reconstructions from sparse views using 3D Gaussian Splatting. They address the inconsistent content generation of previous approaches by integrating both 2D and 3D features from reference views, improving semantic coherence and 3D consistency. They also introduce DL3DV-Res, a new benchmark for evaluating 3D Gaussian Splatting artifact restoration, on which their method outperforms existing approaches....

Read More

Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models

Published at 2025-08-13

#ML

The authors improve diffusion-model outputs at inference time without a large compute overhead: a Noise Hypernetwork, trained once, replaces per-sample reward-guided test-time noise optimization, delivering substantial quality gains at a fraction of the computational cost....

Read More

Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

Published at 2025-08-13

#ML

The authors present a new framework for multimodal agents that can process visual and auditory inputs in real-time, build long-term memory, and perform multi-step reasoning. They also introduce a new benchmark for evaluating these agents and demonstrate that their approach outperforms existing methods....

Read More

Story2Board: A Training-Free Approach for Expressive Storyboard Generation

Published at 2025-08-13

#ML

The authors propose Story2Board, a new method for creating expressive storyboards from text without requiring training. This approach focuses on improving spatial composition, background evolution, and narrative pacing in storyboard generation by introducing two new mechanisms: Latent Panel Anchoring and Reciprocal Attention Value Mixing. These mechanisms help generate more coherent and visually diverse storyboards compared to existing methods....

Read More

VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models

Published at 2025-08-13

#ML

The authors present VisCodex, a unified framework that combines vision and coding language models to enhance multimodal code generation in large language models. They introduce a new dataset, MCD, and a benchmark, InfiBench-V, to evaluate models on visually-rich programming questions, demonstrating VisCodex's strong performance compared to other models....

Read More

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the open SOLAR-10.7B LLM.


(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media

Fb X In