🤗 Daily Paper Newsletter
Hope you found some gems!
This newsletter brings you a curated list of papers from 🤗 Daily Papers.

DreamScene: 3D Gaussian-based End-to-end Text-to-3D Scene Generation
Published at 2025-07-18
#ML
DreamScene is a framework that automatically creates high-quality, editable 3D scenes from text. It uses AI to plan the scene, arrange objects, and generate realistic geometry while keeping the scene consistent and supporting detailed editing. The authors present it as a practical tool for open-domain 3D content creation that outperforms existing methods in quality, consistency, and flexibility.

Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding
Published at 2025-07-25
#ML
The authors present Step-3, a cost-efficient language model with a hardware-aware design. It uses novel techniques to reduce computation and memory usage while achieving high performance on long-context reasoning tasks.

Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision
Published at 2025-07-28
#ML
This study uses generative AI to create high-quality aerial images and labels, which helps train vehicle detectors more effectively by narrowing the performance gap between geographic regions. The proposed multi-stage, multi-modal knowledge transfer framework improves detection accuracy over four categories of competing methods by 4-23%, 6-10%, 7-40%, and over 50%, respectively. The study also introduces new annotated aerial datasets from New Zealand and Utah.

BANG: Dividing 3D Assets via Generative Exploded Dynamics
Published at 2025-07-29
#ML
This study presents BANG, a method that makes 3D object creation more intuitive and less labor-intensive. Using a specialized model and attention system, BANG lets users decompose and reassemble 3D objects with precise control, bringing the design process closer to how people naturally think and work.

MetaCLIP 2: A Worldwide Scaling Recipe
Published at 2025-07-29
#ML
The authors propose MetaCLIP 2, a recipe for training CLIP on worldwide web data that addresses challenges such as non-English data and the 'curse of multilinguality'. MetaCLIP 2 outperforms its English-only counterpart and sets new state-of-the-art results on multilingual benchmarks without relying on translation or architecture changes.

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Published at 2025-07-29
#ML
The study presents MixGRPO, a framework that combines stochastic differential equations (SDEs) and ordinary differential equations (ODEs) to make GRPO more efficient at aligning image generation with human preferences. MixGRPO introduces a sliding window mechanism to reduce optimization overhead and supports higher-order solvers for faster training, yielding significant improvements in training time and performance over existing methods.

Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning
Published at 2025-07-30
#ML
This study presents RLDP, a framework that uses deep reinforcement learning to improve the privacy and utility of large language models during fine-tuning. RLDP dynamically adjusts privacy parameters, reducing training time by up to 71% while maintaining privacy and improving model performance.

Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
Published at 2025-07-30
#ML
Falcon-H1 is a series of efficient, high-performing large language models that combine Transformer-based attention with State Space Models. The models outperform larger ones while using fewer parameters and less data, and they are available in multiple configurations and languages under an open-source license.

Repair-R1: Better Test Before Repair
Published at 2025-07-30
#ML
The study presents Repair-R1, an automated program repair method that uses test cases both during training and before attempting a repair. Implemented with reinforcement learning, this approach improves defect localization, understanding, and repair, and it outperforms existing methods in repair success rate, test generation success rate, and test coverage.

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
Published at 2025-07-30
#ML
The study presents a framework that generates front-end code from UI designs in three steps: detecting UI components with a vision-language model, organizing them according to front-end engineering principles, and generating HTML/CSS code. This method is more reliable and interpretable than existing black-box methods, and it can be scaled up to produce large numbers of image-code pairs for training and improving open-source vision-language models.

Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
Published at 2025-07-30
#ML
The authors present OmniAVS, a dataset for referring audio-visual segmentation that includes a variety of multimodal expressions and emphasizes understanding audio content and complex reasoning. They also introduce OISA, a model that uses a multimodal large language model (MLLM) to comprehend complex cues and perform reasoning-based segmentation. OISA outperforms existing methods on the new dataset and on other related tasks.

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
Published at 2025-07-30
#ML
The researchers built VL-Cogito, a model trained with Progressive Curriculum Reinforcement Learning (PCuRL) to improve its reasoning on complex multimodal tasks. The training method has the model practice progressively harder tasks and adjusts the difficulty level as needed, making it better at understanding and solving problems in fields such as math, science, and logic.

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, which is derived from the open SOLAR-10.7B LLM.
(Experimental) The full papers are translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.