🤗 Daily Paper Newsletter |
 |
Hope you found some gems! |
This newsletter delivers a curated list of papers from 🤗 Daily Papers. |
|
Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge |
Published at 2025-09-07 |
#ML
|
The authors present a new framework that helps AI models understand both images and text better, which won first place in a challenging AI competition focused on math and physics. This method was also tested and worked well on other tasks, proving its usefulness in various situations.... |
|
Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation |
Published at 2025-09-12 |
#ML
|
This study introduces a new framework called Stable Part Diffusion 4D that generates both colored images and structural component videos from a single input. It uses a unique encoding method to simplify the model and allows for flexible part counts, while also ensuring consistency between the generated components. The framework is trained and tested on a large dataset of rigged objects and demonstrates strong generalization to various scenarios, making it useful for animation and motion-related ... |
|
Struct-Bench: A Benchmark for Differentially Private Structured Text Generation |
Published at 2025-09-12 |
#ML
|
Struct-Bench is a framework and benchmark for evaluating differentially private synthetic structured text data, which includes 5 real-world and 2 synthetically generated datasets annotated with CFGs. The benchmark helps researchers compare and improve privacy-preserving synthetic data generation methods and is publicly available at https://struct-bench.github.io.... |
|
Multiple Instance Learning Framework with Masked Hard Instance Mining for Gigapixel Histopathology Image Analysis |
Published at 2025-09-14 |
#ML
|
This study presents a new learning method that focuses on challenging instances within large pathology images to improve accuracy in computational pathology tasks like cancer diagnosis and survival analysis. The proposed framework uses a Siamese structure with a consistency constraint to explore hard instances, a momentum teacher to mask salient instances and mine hard ones, and a global recycle network to mitigate the risk of losing key features. Experimental results show that this method outpe... |
|
Optimal Brain Restoration for Joint Quantization and Sparsification of LLMs |
Published at 2025-09-14 |
#ML
|
This study presents a new method called Optimal Brain Restoration (OBR) that combines quantization and sparsity for compressing Large Language Models (LLMs). OBR addresses the conflicting requirements of these techniques by using a training-free framework that minimizes performance loss, resulting in faster and more memory-efficient LLMs.... |
|
RAPTOR: A Foundation Policy for Quadrotor Control |
Published at 2025-09-14 |
#ML
|
This study presents RAPTOR, a method for training a generalized neural-network policy to control various quadrotors. The policy, trained using a novel Meta-Imitation Learning algorithm, adapts quickly to new, unseen quadrotors through In-Context Learning, demonstrating strong performance in diverse conditions.... |
|
EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving |
Published at 2025-09-15 |
#ML
|
This study compares various test-time scaling strategies for automated theorem proving and proposes two new methods, a dynamic CoT switching mechanism and diverse parallel-scaled RL with trainable prefixes, to reduce computational cost while maintaining performance. The proposed EconProver model achieves comparable performance to baseline methods with only 12% of the computational cost.... |
|
Exact Coset Sampling for Quantum Lattice Algorithms |
Published at 2025-09-15 |
#ML
|
The authors propose a new method to fix a problem in a quantum lattice algorithm, specifically in Step 9, which has a mismatch issue. Their solution involves a pair-shift difference construction that cancels unknown offsets, creates an exact uniform state, and enforces the intended relation, all while preserving the algorithm's efficiency.... |
|
Phi: Preference Hijacking in Multi-modal Large Language Models at Inference Time |
Published at 2025-09-15 |
#ML
|
The research presents a new method called Preference Hijacking (Phi) that can manipulate the output preferences of Multimodal Large Language Models (MLLMs) using specially crafted images, without altering the models themselves. This technique, which works during inference and is hard to detect, can be used to influence MLLM responses towards any desired preference, as demonstrated by experimental results across various tasks.... |
|
zELO: ELO-inspired Training Method for Rerankers and Embedding Models |
Published at 2025-09-15 |
#ML
|
The study presents a new training method called zELO, which improves retrieval performance by treating ranking tasks as equivalent to a Thurstone model. Using this method, the authors developed two reranker models, zerank-1 and zerank-1-small, that outperform existing models in various domains and maintain their performance across different datasets.... |
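For readers unfamiliar with the Thurstone model mentioned in this summary, here is a minimal sketch of how it turns two latent scores into a pairwise win probability, next to the logistic curve used by classic Elo ratings. This is not the zELO training procedure. The function names, the noise scale `sigma`, and the Elo `scale` of 400 are assumptions chosen for illustration.

```python
import math


def thurstone_win_prob(score_a: float, score_b: float, sigma: float = 1.0) -> float:
    """Probability that item A is preferred over item B under Thurstone Case V.

    Assumption: each item's perceived quality is its latent score plus
    independent Gaussian noise with standard deviation `sigma`.
    """
    # The difference of two independent Gaussians has std sigma * sqrt(2).
    z = (score_a - score_b) / (sigma * math.sqrt(2.0))
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def elo_win_prob(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Classic Elo expected score, which uses a logistic curve instead."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Hypothetical scores for two candidate documents.
    print(thurstone_win_prob(1.2, 0.4))
    print(elo_win_prob(1600.0, 1500.0))
```

Both functions map a score gap to a probability. The practical difference is the shape of the curve: Gaussian for Thurstone, logistic for Elo.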
|
3D Aware Region Prompted Vision Language Model |
Published at 2025-09-16 |
#ML
|
The authors describe a new model that links 2D images and 3D data by using a shared visual token space, enabling flexible region prompting without exhaustive labeling. This model enhances 2D visual features with 3D positional embeddings, allowing for more accurate spatial reasoning across frames and achieving state-of-the-art performance on various benchmarks, even in real-world videos without 3D inputs or annotations.... |
|
Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation |
Published at 2025-09-16 |
#ML
|
Hunyuan3D Studio is a new AI-powered platform that simplifies and speeds up the process of creating high-quality 3D models for games. It uses advanced neural modules to automatically transform concept images or text into fully-realized, game-ready 3D models, reducing production time and making it easier for creators to bring their ideas to life.... |
|
ROOM: A Physics-Based Continuum Robot Simulator for Photorealistic Medical Datasets Generation |
Published at 2025-09-16 |
#ML
|
The authors present ROOM, a simulation framework for generating realistic training data for continuum robots used in bronchoscopy procedures, addressing the challenge of limited data availability due to ethical and safety concerns.... |
|
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization |
Published at 2025-09-16 |
#ML
|
The researchers present ReSum, a new approach that allows web agents to explore indefinitely by periodically summarizing context, thus overcoming the limitations of context window sizes in LLM-based web agents. They also propose ReSum-GRPO, which integrates with segmented trajectory training and advantage broadcasting, leading to significant performance improvements across various benchmarks and agent scales.... |
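The idea of periodically summarizing context can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the ReSum implementation: the character budget `max_chars`, the function names, and the stand-in `toy_summarize` (which a real agent would replace with an LLM call) are all hypothetical.

```python
from typing import Callable, Iterable, List


def run_with_periodic_summaries(
    observations: Iterable[str],
    max_chars: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Accumulate observations, compressing the context when it grows too large."""
    context: List[str] = []
    for obs in observations:
        context.append(obs)
        # Hypothetical budget check: total characters stand in for tokens.
        if sum(len(item) for item in context) > max_chars:
            # Replace the whole history with a single summary entry.
            context = [summarize(context)]
    return context


def toy_summarize(items: List[str]) -> str:
    """Placeholder summarizer; a real system would call a language model here."""
    return f"[summary of {len(items)} items]"


if __name__ == "__main__":
    steps = ["a" * 40, "b" * 40, "c" * 40, "d" * 40]
    print(run_with_periodic_summaries(steps, max_chars=100, summarize=toy_summarize))
```

Because the history is replaced by its summary whenever the budget is exceeded, the loop can keep running for an arbitrary number of steps, provided each summary plus the next observation fits within the budget.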
|
Scaling Agents via Continual Pre-training |
Published at 2025-09-16 |
#ML
|
The study presents a new method called Agentic Continual Pre-training to improve agentic systems, which can use tools and solve complex problems on their own. The method helps these systems learn various skills more effectively and is used to create a new model named AgentFounder, which outperforms existing models in various tasks, including browsing and problem-solving.... |
|
Single-stream Policy Optimization |
Published at 2025-09-16 |
#ML
|
The authors propose a new method called Single-stream Policy Optimization (SPO) to improve policy-gradient optimization for Large Language Models (LLMs). SPO addresses the flaws of existing group-based methods by using a persistent value tracker and global advantage normalization, leading to more stable learning and better performance on various math benchmarks.... |
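To make the two ingredients named in this summary concrete, here is a rough sketch of a per-prompt baseline that persists across batches, combined with advantage normalization over the whole batch rather than within per-prompt groups. It is not the authors' implementation: the exponential moving average tracker is a simple stand-in assumed for illustration, and the class name, `decay`, and `init` values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


class EmaValueTracker:
    """Keeps a running reward estimate per prompt (assumed EMA stand-in)."""

    def __init__(self, decay: float = 0.9, init: float = 0.5) -> None:
        self.decay = decay
        self.init = init
        self.values: Dict[str, float] = {}

    def get(self, prompt_id: str) -> float:
        # Unseen prompts fall back to the initial estimate.
        return self.values.get(prompt_id, self.init)

    def update(self, prompt_id: str, reward: float) -> None:
        old = self.get(prompt_id)
        self.values[prompt_id] = self.decay * old + (1.0 - self.decay) * reward


def globally_normalized_advantages(
    batch: List[Tuple[str, float]], tracker: EmaValueTracker, eps: float = 1e-8
) -> List[float]:
    """One sample per prompt: subtract the tracked baseline, then normalize batch-wide."""
    raw = [reward - tracker.get(prompt_id) for prompt_id, reward in batch]
    mu = mean(raw)
    sigma = pstdev(raw)
    return [(a - mu) / (sigma + eps) for a in raw]


if __name__ == "__main__":
    tracker = EmaValueTracker()
    batch = [("p1", 1.0), ("p2", 0.0), ("p3", 1.0)]
    print(globally_normalized_advantages(batch, tracker))
    # Update the baselines only after the advantages have been computed.
    for prompt_id, reward in batch:
        tracker.update(prompt_id, reward)
```

The design point illustrated here is that each prompt needs only one sampled response per step, since the baseline comes from the tracker instead of from a group of sibling samples.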
|
Towards General Agentic Intelligence via Environment Scaling |
Published at 2025-09-16 |
#ML
|
The study presents a scalable framework for creating diverse, simulated environments to enhance general agentic intelligence, specifically function-calling capabilities, in Large Language Models. By fine-tuning agents in two phases, the proposed model, AgentScaler, significantly improves performance on various agentic benchmarks.... |
|
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents |
Published at 2025-09-16 |
#ML
|
The authors present a new framework called WebResearcher that uses an iterative deep-research process to help AI agents learn and adapt by consolidating information and maintaining focused workspaces. This approach significantly improves tool-use capabilities and enables parallel thinking for comprehensive conclusions, as demonstrated by state-of-the-art performance in extensive experiments across six challenging benchmarks.... |
|
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning |
Published at 2025-09-16 |
#ML
|
The study presents WebSailor, a methodology that enhances open-source models' ability to navigate complex information landscapes by reducing extreme uncertainty, thereby improving their performance on information-seeking tasks to match proprietary agents like DeepResearch.... |
|
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research |
Published at 2025-09-16 |
#ML
|
The authors present a new method called WebWeaver which uses two AI agents to improve the process of gathering and organizing large amounts of information from the web. This approach mimics how humans conduct research and helps to avoid common issues like losing track of information or creating inaccurate content. The new method outperforms existing methods in various tests, showing that its human-like approach is effective in producing high-quality, accurate, and well-structured reports.... |
|
Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun. |