🤗 Daily Paper Newsletter

Hope you find some gems! This newsletter delivers a curated list of papers from 🤗 Daily Papers.
Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation
Published on 2025-08-30
#ML
The authors present a new framework called Face-MoGLE that improves controllable face generation by using a mixture of global and local experts in a diffusion transformer model. This approach allows for better manipulation of facial attributes and generates more realistic faces, with potential applications in generative modeling and security...
Read More
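For readers who want a concrete picture of the global/local-expert mixing named in the title, here is a minimal PyTorch sketch: one global expert sees every face token, each local expert only contributes inside its facial region, and a learned gate mixes them per token. The module and all names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GlobalLocalMoE(nn.Module):
    """Toy mixture of one global expert and several region-local experts."""

    def __init__(self, dim: int, num_local: int = 4):
        super().__init__()
        self.global_expert = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.local_experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_local)
        )
        # The gate scores the global expert plus each local expert for every token.
        self.gate = nn.Linear(dim, 1 + num_local)

    def forward(self, tokens: torch.Tensor, region_mask: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) face tokens; region_mask: (B, N, num_local) soft region assignments.
        outputs = [self.global_expert(tokens)]                      # global path sees everything
        for k, expert in enumerate(self.local_experts):
            # Each local expert only contributes inside its facial region.
            outputs.append(expert(tokens) * region_mask[..., k:k + 1])
        stacked = torch.stack(outputs, dim=-1)                      # (B, N, D, 1 + num_local)
        weights = self.gate(tokens).softmax(dim=-1).unsqueeze(2)    # (B, N, 1, 1 + num_local)
        return (stacked * weights).sum(dim=-1)                      # gated mixture per token

x = torch.randn(2, 16, 64)          # batch of 2, 16 tokens, width 64
mask = torch.rand(2, 16, 4)         # soft masks for 4 hypothetical facial regions
print(GlobalLocalMoE(64)(x, mask).shape)  # torch.Size([2, 16, 64])
```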
Open Data Synthesis For Deep Research
Published on 2025-08-30
#ML
The study presents InfoSeek, a framework that generates complex Deep Research tasks by constructing Research Trees from webpages and transforming them into natural-language questions. InfoSeek enables the creation of over 50K training examples and helps 3B-parameter LLMs outperform larger models and commercial APIs on a challenging benchmark...
Read More
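A rough sketch of the Research Tree idea described above: nodes hold atomic facts harvested from webpages, and a question is composed by chaining facts along a root-to-leaf path so that each hop further constrains the answer. The data structure and question template are assumptions for illustration, not InfoSeek's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ResearchNode:
    """One node of a toy research tree: an atomic claim tied to its source page."""
    entity: str
    fact: str
    source_url: str
    children: list["ResearchNode"] = field(default_factory=list)

def compose_question(root: ResearchNode) -> str:
    """Chain facts along one root-to-leaf path into a multi-constraint question."""
    facts, node = [root.fact], root
    while node.children:
        node = node.children[0]
        facts.append(node.fact)
    # Each deeper fact adds another constraint the answer entity must satisfy.
    return "Which entity satisfies all of the following: " + "; ".join(facts) + "?"

leaf = ResearchNode("X", "this person discovered polonium", "https://example.org/a")
root = ResearchNode("Y", "this person won two Nobel Prizes", "https://example.org/b",
                    children=[leaf])
print(compose_question(root))
```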
Robix: A Unified Model for Robot Interaction, Reasoning and Planning
Published on 2025-08-31
#ML
Robix is a unified model that combines robot reasoning, task planning, and natural language interaction. It allows robots to understand complex instructions, plan long-horizon tasks, and communicate naturally with humans, while also handling interruptions and applying common sense during task execution...
Read More
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
Published on 2025-08-31
#ML
The authors present SATQuest, a tool that evaluates and improves logical reasoning in large language models by generating diverse problems from Conjunctive Normal Form (CNF) instances. SATQuest helps identify limitations in current models and shows that reinforcement fine-tuning with SATQuest can significantly improve performance and generalization on logical reasoning tasks...
Read More
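To make the CNF setup concrete, here is a small Python sketch that generates a random 3-CNF formula and checks a candidate truth assignment against it, which is the kind of objective verification the summary describes; the generator and its parameters are illustrative, not SATQuest's.

```python
import random

def random_3cnf(num_vars: int, num_clauses: int, seed: int = 0) -> list[list[int]]:
    """Clauses are lists of signed variable indices, e.g. [1, -3, 4] means x1 or not-x3 or x4."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses

def check_assignment(clauses: list[list[int]], assignment: dict[int, bool]) -> bool:
    """True iff every clause contains at least one satisfied literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

formula = random_3cnf(num_vars=5, num_clauses=8)
rng = random.Random(1)
candidate = {v: bool(rng.getrandbits(1)) for v in range(1, 6)}  # stand-in for an LLM's answer
print(formula)
print("satisfied:", check_assignment(formula, candidate))
```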
MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
Published on 2025-09-02
#ML
This study presents MOSAIC, a framework that improves image generation from multiple references by precisely aligning regions of each reference image with corresponding regions of the generated image, and by disentangling the features of different references to avoid interference. The method outperforms existing techniques, especially with more than three references, enabling more complex multi-subject image synthesis...
Read More
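As a rough illustration of the two ideas in this summary, the sketch below computes a toy alignment score between each reference's features and the generated-image tokens that best match them, plus an overlap penalty that discourages different references from claiming the same tokens. Both loss definitions are assumptions for illustration, not MOSAIC's actual objectives.

```python
import torch
import torch.nn.functional as F

def alignment_and_disentanglement(gen_feats: torch.Tensor, ref_feats: torch.Tensor):
    # gen_feats: (N, D) generated-image token features
    # ref_feats: (R, M, D) token features for R reference subjects
    num_refs = ref_feats.shape[0]
    regions, align_loss = [], gen_feats.new_zeros(())
    for r in range(num_refs):
        # Soft correspondence: cosine similarity between generated tokens and reference r.
        sim = F.normalize(gen_feats, dim=-1) @ F.normalize(ref_feats[r], dim=-1).T  # (N, M)
        best = sim.max(dim=1).values                 # best-matching reference token per gen token
        weights = best.softmax(dim=0)                # soft "region" claimed by subject r
        regions.append(weights)
        # Alignment: tokens inside the region should match their reference features closely.
        align_loss = align_loss - (weights * best).sum()
    regions = torch.stack(regions)                   # (R, N)
    # Disentanglement: different subjects should not claim the same generated tokens.
    overlap = regions @ regions.T                    # (R, R)
    off_diag = overlap - torch.diag(torch.diagonal(overlap))
    disent_loss = off_diag.sum() / (num_refs * (num_refs - 1))
    return align_loss / num_refs, disent_loss

gen = torch.randn(32, 64)        # 32 generated tokens, width 64
refs = torch.randn(3, 10, 64)    # 3 reference subjects, 10 tokens each
print([round(v.item(), 4) for v in alignment_and_disentanglement(gen, refs)])
```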
Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots
Published on 2025-09-02
#ML
Researchers developed Camera Depth Models (CDMs) to enhance depth perception for robots equipped with depth cameras, overcoming challenges such as limited accuracy and sensor noise. The resulting CDMs provide near simulation-level accuracy in depth prediction, enabling robots to generalize from simulation to real-world tasks with minimal performance degradation...
Read More
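For intuition, here is a toy sketch of the role the summary attributes to a Camera Depth Model: a small network that takes an RGB image plus the sensor's noisy raw depth and predicts a cleaned depth map as a residual correction. The architecture and names are assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class ToyCameraDepthModel(nn.Module):
    """Illustrative depth cleaner: RGB + noisy raw depth in, refined depth out."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),   # 3 RGB channels + 1 raw-depth channel
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, rgb: torch.Tensor, raw_depth: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, raw_depth], dim=1)
        # Predict a residual correction on top of the sensor's depth estimate.
        return raw_depth + self.net(x)

rgb = torch.rand(1, 3, 64, 64)
noisy_depth = torch.rand(1, 1, 64, 64)
clean = ToyCameraDepthModel()(rgb, noisy_depth)
print(clean.shape)  # torch.Size([1, 1, 64, 64])
```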
Planning with Reasoning using Vision Language World Model
Published on 2025-09-02
#ML
This research presents the Vision Language World Model (VLWM), which uses language and visual data to understand and plan actions in the world. VLWM can infer goals, predict actions, and learn from past experience to improve its planning, outperforming other models across benchmarks and human evaluations...
Read More
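To illustrate planning with a world model in its simplest form, the sketch below rolls candidate action sequences through a transition predictor and keeps the sequence whose predicted final state is closest to the goal. Only the control loop is shown, with a hand-coded stand-in for the world model; the actual VLWM learns its dynamics from language and visual data.

```python
from itertools import product

ACTIONS = ["pick", "move", "place", "wait"]

def predict_next(state: tuple[int, int], action: str) -> tuple[int, int]:
    """Hand-coded stand-in for a learned world model; state = (progress, holding)."""
    progress, holding = state
    if action == "pick":
        return progress, 1
    if action == "move" and holding:
        return progress + 1, holding
    if action == "place" and holding:
        return progress + 1, 0
    return progress, holding

def plan(start: tuple[int, int], goal_progress: int, horizon: int = 3) -> list[str]:
    """Exhaustive short-horizon search: score each imagined rollout by distance to the goal."""
    best_plan, best_score = [], float("-inf")
    for sequence in product(ACTIONS, repeat=horizon):
        state = start
        for action in sequence:
            state = predict_next(state, action)     # imagine the outcome, don't act yet
        score = -abs(goal_progress - state[0])      # closer predicted progress is better
        if score > best_score:
            best_plan, best_score = list(sequence), score
    return best_plan

print(plan(start=(0, 0), goal_progress=2))  # a 3-step plan whose predicted progress reaches 2
```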
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
Published on 2025-09-03
#ML
The study presents LMEnt, a suite for analyzing knowledge acquisition in language models during pretraining. It includes a knowledge-rich pretraining corpus, an improved entity-based retrieval method, and pretrained models for studying how entity mentions in pretraining data relate to downstream performance...
Read More
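A minimal sketch of the entity-based retrieval capability highlighted above: index which pretraining documents mention which entities, then look up every passage for an entity of interest. The toy corpus and the exact-string matching rule are stand-in assumptions; LMEnt's actual retrieval method is more sophisticated than raw string matching.

```python
from collections import defaultdict

corpus = {
    "doc1": "Marie Curie won the Nobel Prize in Physics with Pierre Curie.",
    "doc2": "The Nobel Prize is awarded annually in Stockholm.",
    "doc3": "Marie Curie later received a second Nobel Prize, in Chemistry.",
}
entities = ["Marie Curie", "Pierre Curie", "Nobel Prize", "Stockholm"]

def build_entity_index(corpus: dict[str, str], entities: list[str]) -> dict[str, set[str]]:
    """Map each entity to the set of documents that mention it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for entity in entities:
            if entity in text:                 # stand-in for real mention annotation
                index[entity].add(doc_id)
    return index

index = build_entity_index(corpus, entities)
# Which pretraining documents could have taught the model about Marie Curie?
print(sorted(index["Marie Curie"]))   # ['doc1', 'doc3']
```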
Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full papers are translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit Developer's Social Media