🤗 Daily Paper Newsletter

Hope you found some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.

Efficient Agents: Building Effective Agents While Reducing Cost
Published on 2025-07-24
#ML

This study explores the trade-off between cost and performance in AI-driven agent systems, focusing on three questions: how much complexity tasks actually require, where additional modules yield diminishing returns, and how to design efficient frameworks. The research introduces Efficient Agents, a new framework that maintains high performance while cutting operational costs by 28.4% compared with a leading open-source agent framework....
|
|
Reasoning Language Models for Root Cause Analysis in 5G Wireless Networks
Published on 2025-07-29
#ML

The authors present a new method that uses large language models to improve root cause analysis in 5G wireless networks. They build a dataset of annotated troubleshooting problems and develop a two-stage training method that improves both the accuracy and the reasoning quality of the models, which outperform existing models....
|
|
|
FACTORY: A Challenging Human-Verified Prompt Set for Long-Form Factuality
Published on 2025-07-31
#ML

FACTORY is a new, large-scale, human-verified prompt set for evaluating the factual accuracy of language models when they generate long responses. It is more challenging than existing benchmarks, and evaluations on it reveal that many state-of-the-art models struggle to provide accurate, fact-based responses....
|
|
Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
Published on 2025-07-31
#ML

The authors propose a new method for generating high-quality animated 3D content (4D) from videos, addressing the challenges of data construction and high-dimensional representation. Their approach uses a Direct 4DMesh-to-GS Variation Field VAE to efficiently encode 3D shape, appearance, and motion, and then trains a Gaussian Variation Field diffusion model with a Diffusion Transformer. The model outperforms existing methods and generalizes well to real-world video inputs, enabling the generation of high-quality animated ...
|
|
|
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
Published on 2025-07-31
#ML

The study presents RL-PLUS, a hybrid-policy optimization method for enhancing the reasoning abilities of large language models trained with reinforcement learning. RL-PLUS combines the model's internal exploitation with external data, addressing the distributional mismatch between the external data and the model's own policy and guiding the model toward high-value, unexplored reasoning paths. Results show that RL-PLUS outperforms existing methods, achieving up to a 69.2% improvement and resolving the capability boundary collapse problem....
|
|
The Cow of Rembrandt - Analyzing Artistic Prompt Interpretation in Text-to-Image Models
Published on 2025-07-31
#ML

This study examines how text-to-image models separate content from style when generating art, using heatmaps to see which parts of an image are influenced by specific parts of the prompt. The results show that these models can distinguish content from style across different artistic prompts, offering insight into how they learn complex artistic concepts without explicit supervision....
|
|
|
DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior
Published on 2025-08-01
#ML

The study develops DPoser-X, a diffusion-based prior for 3D whole-body human poses that is more versatile and robust than existing methods. It introduces techniques such as truncated timestep scheduling and masked training, which improve performance on a range of pose-related tasks by effectively combining different datasets and capturing interdependencies between body parts....
|
|
Sel3DCraft: Interactive Visual Prompts for User-Friendly Text-to-3D Generation
Published on 2025-08-01
#ML

The study presents Sel3DCraft, a system that improves text-to-3D generation through a visual prompt engineering method. It guides users through a structured process: offering diverse candidate models, evaluating them with advanced metrics, and supporting intuitive refinement. This ultimately makes designers more creative and efficient in the text-to-3D generation process....
|
|
|
A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding
Published on 2025-08-02
#ML

The study presents a new benchmark for 3D occupancy grounding, which localizes objects described in natural language within outdoor 3D scenes using fine-grained occupancy rather than traditional bounding boxes. The authors also propose GroundingOcc, a model that combines visual, textual, and point cloud data to precisely locate and describe objects, outperforming existing methods....
|
|
C3D-AD: Toward Continual 3D Anomaly Detection via Kernel Attention with Learnable Advisor
Published on 2025-08-02
#ML

The study presents Continual 3D Anomaly Detection (C3D-AD), a framework that learns from various 3D object categories and adapts to new ones over time. It uses kernel attention with a learnable advisor, together with a dedicated loss function, to efficiently extract features, reconstruct data, and maintain consistency, improving performance on several public datasets....
|
|
|
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Published on 2025-08-02
#ML

This study examines the reasoning that large language models (LLMs) display under Chain-of-Thought (CoT) prompting, which makes models appear to reason step by step like humans. Analyzing CoT through the lens of the training data distribution, the researchers find that this reasoning is largely driven by patterns learned during training and can fail on new or unfamiliar inputs. This reveals that current LLMs fall short of genuine, generalizable reasoning....
|
|
DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion
Published on 2025-08-03
#ML

The authors present DiffSemanticFusion, an approach that combines the strengths of raster-based and graph-based representations for online HD map generation in autonomous driving. Their method makes online HD map representations more stable and expressive, improving trajectory prediction and end-to-end autonomous driving performance compared with state-of-the-art methods....
|
|
|
IAUNet: Instance-Aware U-Net
Published on 2025-08-03
#ML

The study presents IAUNet, a query-based U-Net architecture designed to improve efficiency and performance in biomedical image segmentation. The model combines a lightweight convolutional pixel decoder with a Transformer decoder that refines object-specific features. The authors also introduce the 2025 Revvity Full Cell Segmentation Dataset as a new benchmark, and IAUNet outperforms many current segmentation models....
|
|
OpenMed NER: Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets
Published on 2025-08-03
#ML

The study presents OpenMed NER, a collection of open-source, domain-adapted transformer models for biomedical named-entity recognition (NER). The models combine lightweight domain-adaptive pre-training with parameter-efficient Low-Rank Adaptation (LoRA), achieving state-of-the-art performance on 10 of 12 biomedical NER benchmarks while remaining computationally efficient....
|
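The Low-Rank Adaptation (LoRA) technique mentioned above can be sketched generically. A frozen weight matrix W (d × k) is augmented with a trainable low-rank update B·A, so only r·(d + k) parameters are trained instead of d·k. The minimal Python sketch below uses plain lists for matrices and the common α/r scaling. It illustrates standard LoRA, not OpenMed NER's training code, and the function names are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Standard LoRA forward pass: y = W x + (alpha / r) * B (A x).

    W (d x k) stays frozen; only A (r x k) and B (d x r) would be trained.
    """
    base = matvec(W, x)                # frozen pre-trained projection
    update = matvec(B, matvec(A, x))   # low-rank correction
    scale = alpha / r
    return [y0 + scale * dy for y0, dy in zip(base, update)]


def lora_param_counts(d, k, r):
    """Trainable parameters: full fine-tuning of W vs. a rank-r LoRA update."""
    return d * k, r * (d + k)


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    x = [1.0, 1.0]
    A = [[1.0, 0.0]]          # r = 1, k = 2
    B_init = [[0.0], [0.0]]   # B is commonly initialised to zero, so LoRA starts as a no-op
    print(lora_forward(W, A, B_init, x, alpha=2.0, r=1))  # [3.0, 7.0]
    full, lora = lora_param_counts(768, 768, 8)
    print(full, lora, f"{lora / full:.2%}")                # 589824 12288 2.08%
```

For a hypothetical 768 × 768 projection (the hidden size of BERT-base-style encoders) with rank 8, the adapter trains about 2% of the parameters that full fine-tuning of that matrix would.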
|
|
Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents
Published on 2025-08-03
#ML

The authors present a framework in which web agents acquire knowledge and perform cognitive reasoning, split into two stages: knowledge content learning and cognitive processes. They create a dataset to help agents acquire essential knowledge and introduce a new model, Web-CogReasoner, which outperforms existing models, especially on tasks requiring structured knowledge....
|
|
DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework
Published on 2025-08-04
#ML

The researchers present DreamVVT, a method that makes virtual try-on videos more realistic in uncontrolled, in-the-wild settings. It works in two stages. First, it generates high-quality try-on images for frames selected from the input video with the help of a vision-language model. Then it uses these images together with motion information to generate the full try-on video. This yields better detail preservation and smoother motion than existing methods....
|
|
|
LeanK: Learnable K Cache Channel Pruning for Efficient Decoding
Published on 2025-08-04
#ML

The authors present LeanK, a method that makes large language model decoding more efficient by pruning less important channels from the key (K) cache, reducing memory usage and accelerating decoding without compromising accuracy. Experiments show significant memory savings and faster decoding, and the analysis offers insights into the behavior of model channels and attention heads during long-context inference....
|
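To make K cache channel pruning concrete, here is a minimal Python sketch under stated assumptions. A per-channel importance score is given as input; in LeanK such a mask is learned, and the scores here are hypothetical. The top-scoring channels are kept for every cached key, and attention logits are then computed over the kept channels only. This illustrates static channel pruning in general, not LeanK's implementation.

```python
import math


def select_channels(channel_scores, keep):
    """Return the indices of the `keep` highest-scoring channels, in ascending order."""
    ranked = sorted(range(len(channel_scores)), key=lambda i: channel_scores[i], reverse=True)
    return sorted(ranked[:keep])


def prune_keys(keys, kept):
    """Store only the kept channels of each cached key vector."""
    return [[key[i] for i in kept] for key in keys]


def pruned_logits(query, pruned_keys, kept, head_dim):
    """Approximate q.k / sqrt(head_dim) using only the kept channels."""
    q = [query[i] for i in kept]
    scale = 1.0 / math.sqrt(head_dim)
    return [scale * sum(a * b for a, b in zip(q, k)) for k in pruned_keys]


if __name__ == "__main__":
    keys = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]   # two cached keys, head_dim = 4
    scores = [0.9, 0.1, 0.5, 0.05]                          # hypothetical channel importances
    kept = select_channels(scores, keep=2)                  # [0, 2]
    cache = prune_keys(keys, kept)                          # [[1.0, 3.0], [4.0, 2.0]]
    print(pruned_logits([1.0, 1.0, 1.0, 1.0], cache, kept, head_dim=4))  # [2.0, 3.0]
    print(f"K cache size: {len(kept) / len(scores):.0%} of the original")  # 50%
```

Because the mask is fixed ahead of time, the pruned channels never need to be written to the cache, which is where the memory saving comes from.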
|
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Published on 2025-08-05
#ML

Agent Lightning is a new framework for training AI agents with reinforcement learning (RL) while requiring almost no changes to existing agent code. It offers a unified data interface and a hierarchical RL algorithm, making it possible to handle complex agent interactions and improve agent performance on real-world tasks....
|
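One way to picture a unified data interface is to turn an agent's execution trace into RL transitions. The Python sketch below is purely hypothetical. The `LLMCall` and `Transition` types are assumptions, and so is the rule of copying the final episode reward onto every call, which was chosen for simplicity. None of this is Agent Lightning's actual API or its hierarchical credit-assignment algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LLMCall:
    """One model invocation recorded while the (unmodified) agent runs."""
    prompt: str
    response: str


@dataclass
class Transition:
    """A single-step RL sample: state = prompt, action = response."""
    state: str
    action: str
    reward: float


def trace_to_transitions(calls: List[LLMCall], episode_reward: float) -> List[Transition]:
    """Simplest possible credit assignment: every call shares the episode's final reward."""
    return [Transition(state=c.prompt, action=c.response, reward=episode_reward) for c in calls]


if __name__ == "__main__":
    trace = [
        LLMCall("Plan a SQL query for: total sales", "SELECT SUM(amount) FROM sales"),
        LLMCall("Check the query result: 42", "The answer is 42."),
    ]
    for t in trace_to_transitions(trace, episode_reward=1.0):
        print(t)
```

Each call becomes an independent sample, so a single-turn RL trainer could consume the data without knowing how the agent is structured. A real system would likely replace the uniform reward with finer-grained credit assignment.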
|
|
CoTox: Chain-of-Thought-Based Molecular Toxicity Reasoning and Prediction
Published on 2025-08-05
#ML

The authors present CoTox, a framework that uses a large language model with step-by-step (chain-of-thought) reasoning to predict multiple types of drug toxicity. By integrating chemical structure data, biological pathways, and gene ontology terms, CoTox produces interpretable predictions and outperforms traditional machine learning and deep learning models. Its accuracy improves further when molecules are given in chemical representations that are easier for the model to interpret and when biological context is incorporated....
|
|
Data and AI governance: Promoting equity, ethics, and fairness in large language models
Published on 2025-08-05
#ML

The authors discuss a practical approach to governing, assessing, and quantifying bias in machine learning models, with a focus on large language models (LLMs). The approach spans the entire AI development life cycle, from creation to production, to ensure equity, ethics, and fairness and to mitigate the risks of discrimination and reputational harm in real-world applications....
|
|
|
HPSv3: Towards Wide-Spectrum Human Preference Score
Published on 2025-08-05
#ML

The study presents HPSv3, a new human preference score for evaluating text-to-image generation models. It is built on a wide-spectrum human preference dataset of 1.08M text-image pairs and 1.17M annotated pairwise comparisons. The authors also introduce CoHP, a method that improves image generation quality without extra data, and show through extensive experiments that it is efficient and well aligned with human perception....
|
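Preference scores of this kind are typically trained on annotated comparisons with a pairwise ranking objective. As a generic illustration only (HPSv3's exact objective may differ), the Python sketch below computes the standard Bradley-Terry loss, -log σ(s_preferred - s_rejected), in a numerically stable form, together with pairwise accuracy. The scores are hypothetical model outputs.

```python
import math
from typing import List, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bradley_terry_loss(pairs: List[Tuple[float, float]]) -> float:
    """Mean of -log(sigmoid(s_preferred - s_rejected)) over annotated comparisons."""
    return sum(softplus(rejected - preferred) for preferred, rejected in pairs) / len(pairs)


def pairwise_accuracy(pairs: List[Tuple[float, float]]) -> float:
    """Fraction of comparisons in which the human-preferred image gets the higher score."""
    return sum(preferred > rejected for preferred, rejected in pairs) / len(pairs)


if __name__ == "__main__":
    # Each pair is (score of preferred image, score of rejected image); values are made up.
    pairs = [(2.0, 0.5), (1.0, 1.0), (0.2, 1.4)]
    print(f"loss={bradley_terry_loss(pairs):.3f}, accuracy={pairwise_accuracy(pairs):.2f}")
```

Minimizing this loss pushes each preferred image's score above its rejected counterpart. A tie costs log 2 ≈ 0.693, and the loss shrinks toward zero as the margin grows.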
|
HarmonyGuard: Toward Safety and Utility in Web Agents via Adaptive Policy Enhancement and Dual-Objective Optimization
Published on 2025-08-05
#ML

The study presents HarmonyGuard, a framework designed to improve both the safety and the utility of web agents. It automatically updates security policies and jointly optimizes safety and utility through real-time reasoning, yielding significant improvements over existing methods....
|
|
|
LaTCoder: Converting Webpage Design to Code with Layout-as-Thought
Published on 2025-08-05
#ML

The study presents LaTCoder, a method that converts webpage designs into code more accurately by following a human-like reasoning process, Layout-as-Thought. It divides the design into blocks, generates code for each block, and then assembles the results. This preserves the layout better and produces higher-quality webpages than existing methods....
|
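The block-then-assemble idea can be sketched as follows. This minimal Python example assumes that each block already has a pixel bounding box and a generated HTML fragment, and it assembles the fragments with absolute positioning. The `Block` type and the positioning strategy are illustrative assumptions, not LaTCoder's actual pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A region of the design mockup and the HTML generated for it."""
    x: int
    y: int
    width: int
    height: int
    html: str


def assemble_page(blocks: List[Block]) -> str:
    """Place each generated fragment at its block's position to preserve the layout."""
    parts = []
    for b in sorted(blocks, key=lambda blk: (blk.y, blk.x)):  # top-to-bottom, left-to-right
        style = f"position:absolute;left:{b.x}px;top:{b.y}px;width:{b.width}px;height:{b.height}px"
        parts.append(f'  <div style="{style}">{b.html}</div>')
    body = "\n".join(parts)
    return f'<body style="position:relative;margin:0">\n{body}\n</body>'


if __name__ == "__main__":
    page = assemble_page([
        Block(0, 80, 1200, 400, "<p>Hero text</p>"),
        Block(0, 0, 1200, 80, "<nav>Menu</nav>"),
    ])
    print(page)
```

Generating code per block keeps each generation step small. Absolute positioning is the simplest way to guarantee that the assembled page matches the original layout, at the cost of a less responsive result.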
|
Light-IF: Endowing LLMs with Generalizable Reasoning via Preview and Self-Checking for Complex Instruction Following
Published on 2025-08-05
#ML

The study identifies lazy reasoning as the main cause of LLMs failing to follow complex instructions and proposes Light-IF, a framework that uses previewing and self-checking to improve reasoning. This yields better instruction adherence and superior performance across various models....
|
|
|
MiDashengLM: Efficient Audio Understanding with General Audio Captions
Published on 2025-08-05
#ML

The study presents MiDashengLM, an open audio-language model that uses general audio captions for efficient and comprehensive audio understanding, without relying on closed data sources or proprietary models. It integrates Dasheng, an open-source audio encoder, focuses on holistic textual descriptions of complex audio scenes, and offers significant speed and throughput improvements over comparable models....
|
|
SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering
Published on 2025-08-05
#ML

The authors present SonicMaster, a unified model for music restoration and mastering that improves audio quality based on natural-language instructions. They also introduce the SonicMaster dataset for training it, which pairs degraded tracks with high-quality versions across various common degradation types....
|
|
|
Sotopia-RL: Reward Design for Social Intelligence
Published on 2025-08-05
#ML

The study presents Sotopia-RL, a framework that improves the social intelligence of large language models through reinforcement learning. To handle the partial observability and multi-dimensionality of social interactions, it assigns rewards at the utterance level and along multiple dimensions, and it outperforms existing methods on social goal completion....
|
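Utterance-level, multi-dimensional rewards can be illustrated with a small sketch. Each utterance receives a score on several dimensions, and these are combined into one scalar reward per utterance. The dimension names, weights, and scores below are hypothetical. A weighted sum is only one possible way to combine them and is not necessarily how Sotopia-RL aggregates its rewards.

```python
from typing import Dict, List

# Hypothetical reward dimensions and weights (assumptions, not from the paper).
WEIGHTS: Dict[str, float] = {"goal": 0.6, "relationship": 0.3, "knowledge": 0.1}


def utterance_reward(scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine one utterance's per-dimension scores into a single scalar reward."""
    return sum(weights[dim] * scores.get(dim, 0.0) for dim in weights)


def episode_rewards(utterance_scores: List[Dict[str, float]]) -> List[float]:
    """One reward per utterance, instead of a single reward for the whole episode."""
    return [utterance_reward(s) for s in utterance_scores]


if __name__ == "__main__":
    dialogue = [
        {"goal": 1.0, "relationship": 0.5, "knowledge": 0.0},  # makes progress on the goal
        {"goal": 0.0, "relationship": 1.0, "knowledge": 0.5},  # builds rapport
    ]
    print(episode_rewards(dialogue))
```

Per-utterance rewards let the learner tell which turns helped, which a single end-of-episode score cannot do under partial observability.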
|
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
Published on 2025-08-05
#ML

The study presents a method for applying reinforcement learning to software engineering tasks, which typically require long, multi-step interactions. Using a modified RL algorithm, the researchers train an agent on real-world software engineering tasks and substantially raise its success rate over the baseline model, without using any advanced teacher models....
|
|
|
VeriGUI: Verifiable Long-Chain GUI Dataset
Published on 2025-08-05
#ML

The study presents VeriGUI, a new dataset for evaluating agents that interact with complex graphical user interfaces over long task horizons. Its tasks can be broken down into smaller, verifiable subtasks, and the dataset is designed to support the development of generalist GUI agents for real-world applications....
|
|
EVOC2RUST: A Skeleton-guided Framework for Project-Level C-to-Rust Translation
Published on 2025-08-06
#ML

EvoC2Rust is a new framework that translates entire C projects into Rust while maintaining safety and idiomaticity. It uses a skeleton-guided translation strategy and outperforms existing methods in syntactic and semantic accuracy and in code safety, even on complex codebases....
|
|
|
Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success
Published on 2025-08-06
#ML

The study presents VL-DAC, a new reinforcement learning algorithm that improves vision-language models by training them in synthetic worlds. The method converges quickly and reliably, and the resulting models perform better on various real-world tasks without sacrificing image-understanding accuracy....
|
|
IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards
Published on 2025-08-06
#ML

The Instruction Following Decorator (IFDecorator) is a new framework that improves the efficiency and accuracy of Reinforcement Learning with Verifiable Rewards (RLVR) for instruction following in large language models (LLMs). It includes components that create challenging instruction-verification pairs, check that responses follow the user's intent, and detect shortcut exploitation (reward hacking), leading to improved performance on various benchmarks....
|
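In RLVR for instruction following, the reward comes from programmatic checks rather than a learned judge. The Python sketch below shows a hypothetical all-or-nothing verifiable reward built from two simple constraint checkers: a word limit and a required keyword. The checker names and the pass/fail rule are assumptions for illustration, not IFDecorator's components.

```python
from typing import Callable, List

Checker = Callable[[str], bool]


def max_words(limit: int) -> Checker:
    """Constraint: the response has at most `limit` whitespace-separated words."""
    return lambda response: len(response.split()) <= limit


def must_include(keyword: str) -> Checker:
    """Constraint: the response mentions `keyword` (case-insensitive)."""
    return lambda response: keyword.lower() in response.lower()


def verifiable_reward(response: str, checkers: List[Checker]) -> float:
    """Reward 1.0 only if every constraint verifies, else 0.0."""
    return 1.0 if all(check(response) for check in checkers) else 0.0


if __name__ == "__main__":
    constraints = [max_words(12), must_include("Rust")]
    print(verifiable_reward("Rust guarantees memory safety without a garbage collector.", constraints))  # 1.0
    print(verifiable_reward("Memory safety without a garbage collector.", constraints))  # 0.0
```

Binary checks like these are cheap and unambiguous, but they are also exactly what shortcut exploitation targets. For example, a policy could learn to satisfy the checker while ignoring the actual request.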
|
|
Position: The Current AI Conference Model is Unsustainable! Diagnosing the Crisis of Centralized AI Conference
Published on 2025-08-06
#ML

This position paper argues that the centralized AI conference model is in crisis. Rapid expansion has brought rising publication rates, a high carbon footprint, negative community sentiment, and logistical strain. The authors propose a Community-Federated Conference model to address these issues and promote sustainability, inclusivity, and resilience in AI research....
|
|
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Published on 2025-08-06
#ML

The researchers develop SEAgent, a framework that lets computer use agents learn and improve autonomously when interacting with unfamiliar software. SEAgent helps agents explore, learn from mistakes, and tackle increasingly difficult tasks, producing a stronger generalist agent that outperforms specialized agents across five novel software environments....
|
|
|
Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
Published on 2025-08-06
#ML

The research proposes Sculptor, a framework that helps large language models (LLMs) actively manage their context. It gives them tools to focus on relevant information and set aside distractions, improving their performance on long contexts without specific training....
|
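Active context management can be pictured as a small toolbox the model calls while reading. The Python sketch below is hypothetical. The `ContextWorkspace` class and its `hide`, `restore`, and `search` tools are illustrative assumptions, not Sculptor's actual tool set. The sketch shows how splitting a long context into fragments and hiding distracting ones shrinks the text placed in the prompt.

```python
from typing import Dict, List, Set


class ContextWorkspace:
    """Holds a long context as numbered fragments that the model can hide, restore, and search."""

    def __init__(self, fragments: List[str]):
        self.fragments: Dict[int, str] = dict(enumerate(fragments))
        self.hidden: Set[int] = set()

    def hide(self, fragment_id: int) -> None:
        """Drop a distracting fragment from the visible context (it is kept, not deleted)."""
        self.hidden.add(fragment_id)

    def restore(self, fragment_id: int) -> None:
        """Bring a previously hidden fragment back into view."""
        self.hidden.discard(fragment_id)

    def search(self, query: str) -> List[int]:
        """Return the ids of all fragments, hidden or not, that contain `query`."""
        return [i for i, text in self.fragments.items() if query.lower() in text.lower()]

    def visible_context(self) -> str:
        """The text that would actually be placed in the model's prompt."""
        return "\n".join(text for i, text in sorted(self.fragments.items()) if i not in self.hidden)


if __name__ == "__main__":
    ws = ContextWorkspace([
        "Weather report: sunny.",
        "Unrelated sports scores.",
        "The access code is 7421.",
    ])
    ws.hide(1)                  # set the distracting fragment aside
    print(ws.search("code"))    # locate the relevant fragment by id
    print(ws.visible_context())
```

Hiding instead of deleting is a deliberate choice here: a fragment judged irrelevant early on can be restored if it later turns out to matter.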
|
StepFun-Formalizer: Unlocking the Autoformalization Potential of LLMs through Knowledge-Reasoning Fusion
Published on 2025-08-06
#ML

The study presents ThinkingF, a training pipeline that strengthens two abilities essential for autoformalization: formal-language domain knowledge and natural-language reasoning. The authors create two datasets and use them for supervised fine-tuning and reinforcement learning. The resulting StepFun-Formalizer models convert natural-language mathematical statements into formal languages better than prior models, setting new state-of-the-art results....
|
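Autoformalization means translating a natural-language mathematical statement into a formal language such as Lean. As a hand-written illustration (not output from StepFun-Formalizer), the statement "the sum of two even natural numbers is even" could be formalized in Lean 4 with Mathlib as follows. The proof is left as `sorry` because autoformalization targets the statement, not its proof.

```lean
import Mathlib

-- Natural language: "The sum of two even natural numbers is even."
theorem sum_of_two_evens_is_even (a b : ℕ) (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  sorry
```

A faithful formalization must capture every hypothesis (here, that both numbers are even naturals) and the exact conclusion. This is where formal-language domain knowledge and natural-language reasoning both come into play.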
|
|
|
Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the open SOLAR-10.7B LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.