🤗 Daily Paper Newsletter

Hope you find some gems! This newsletter delivers a curated list of papers from 🤗 Daily Papers.
Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation
Published on 2025-08-30
#ML
The authors present a new framework called Face-MoGLE that improves controllable face generation by using a mixture of global and local experts in a diffusion transformer model. This approach allows for better manipulation of facial attributes and generates more realistic faces, with potential applications in generative modeling and security...
Read More
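For readers who want a concrete picture of the global/local-expert mixing named in the title, here is a minimal PyTorch sketch: one global expert sees every face token, each local expert only contributes inside its facial region, and a learned gate mixes them per token. The module and all names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GlobalLocalMoE(nn.Module):
    """Toy mixture of one global expert and several region-local experts."""

    def __init__(self, dim: int, num_local: int = 4):
        super().__init__()
        self.global_expert = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.local_experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_local)
        )
        # The gate scores the global expert plus each local expert for every token.
        self.gate = nn.Linear(dim, 1 + num_local)

    def forward(self, tokens: torch.Tensor, region_mask: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) face tokens; region_mask: (B, N, num_local) soft region assignments.
        outputs = [self.global_expert(tokens)]                      # global path sees everything
        for k, expert in enumerate(self.local_experts):
            # Each local expert only contributes inside its facial region.
            outputs.append(expert(tokens) * region_mask[..., k:k + 1])
        stacked = torch.stack(outputs, dim=-1)                      # (B, N, D, 1 + num_local)
        weights = self.gate(tokens).softmax(dim=-1).unsqueeze(2)    # (B, N, 1, 1 + num_local)
        return (stacked * weights).sum(dim=-1)                      # gated mixture per token

x = torch.randn(2, 16, 64)          # batch of 2, 16 tokens, width 64
mask = torch.rand(2, 16, 4)         # soft masks for 4 hypothetical facial regions
print(GlobalLocalMoE(64)(x, mask).shape)  # torch.Size([2, 16, 64])
```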
Open Data Synthesis For Deep Research
Published on 2025-08-30
#ML
The study presents InfoSeek, a framework that generates complex Deep Research tasks by constructing Research Trees from webpages and transforming them into natural-language questions. InfoSeek enables the creation of over 50K training examples and helps 3B-parameter LLMs outperform larger models and commercial APIs on a challenging benchmark...
Read More
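A rough sketch of the Research Tree idea described above: nodes hold atomic facts harvested from webpages, and a question is composed by chaining facts along a root-to-leaf path so that each hop further constrains the answer. The data structure and question template are assumptions for illustration, not InfoSeek's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ResearchNode:
    """One node of a toy research tree: an atomic claim tied to its source page."""
    entity: str
    fact: str
    source_url: str
    children: list["ResearchNode"] = field(default_factory=list)

def compose_question(root: ResearchNode) -> str:
    """Chain facts along one root-to-leaf path into a multi-constraint question."""
    facts, node = [root.fact], root
    while node.children:
        node = node.children[0]
        facts.append(node.fact)
    # Each deeper fact adds another constraint the answer entity must satisfy.
    return "Which entity satisfies all of the following: " + "; ".join(facts) + "?"

leaf = ResearchNode("X", "this person discovered polonium", "https://example.org/a")
root = ResearchNode("Y", "this person won two Nobel Prizes", "https://example.org/b",
                    children=[leaf])
print(compose_question(root))
```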
Robix: A Unified Model for Robot Interaction, Reasoning and Planning
Published on 2025-08-31
#ML
Robix is a unified model that combines robot reasoning, task planning, and natural language interaction. It allows robots to understand complex instructions, plan long-horizon tasks, and communicate naturally with humans, while also handling interruptions and applying common sense during task execution...
Read More
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
Published on 2025-08-31
#ML
The authors present SATQuest, a tool that evaluates and improves logical reasoning in large language models by generating diverse problems from Conjunctive Normal Form (CNF) instances. SATQuest helps identify limitations in current models and shows that reinforcement fine-tuning with SATQuest can significantly improve performance and generalization on logical reasoning tasks...
Read More
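To make the CNF setup concrete, here is a small Python sketch that generates a random 3-CNF formula and checks a candidate truth assignment against it, which is the kind of objective verification the summary describes; the generator and its parameters are illustrative, not SATQuest's.

```python
import random

def random_3cnf(num_vars: int, num_clauses: int, seed: int = 0) -> list[list[int]]:
    """Clauses are lists of signed variable indices, e.g. [1, -3, 4] means x1 or not-x3 or x4."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses

def check_assignment(clauses: list[list[int]], assignment: dict[int, bool]) -> bool:
    """True iff every clause contains at least one satisfied literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

formula = random_3cnf(num_vars=5, num_clauses=8)
rng = random.Random(1)
candidate = {v: bool(rng.getrandbits(1)) for v in range(1, 6)}  # stand-in for an LLM's answer
print(formula)
print("satisfied:", check_assignment(formula, candidate))
```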
MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
Published on 2025-09-02
#ML
This study presents MOSAIC, a framework that improves image generation from multiple references by precisely aligning regions of each reference image with corresponding regions of the generated image, and by disentangling the features of different references to avoid interference. The method outperforms existing techniques, especially with more than three references, enabling more complex multi-subject image synthesis...
Read More
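As a rough illustration of the two ideas in this summary, the sketch below computes a toy alignment score between each reference's features and the generated-image tokens that best match them, plus an overlap penalty that discourages different references from claiming the same tokens. Both loss definitions are assumptions for illustration, not MOSAIC's actual objectives.

```python
import torch
import torch.nn.functional as F

def alignment_and_disentanglement(gen_feats: torch.Tensor, ref_feats: torch.Tensor):
    # gen_feats: (N, D) generated-image token features
    # ref_feats: (R, M, D) token features for R reference subjects
    num_refs = ref_feats.shape[0]
    regions, align_loss = [], gen_feats.new_zeros(())
    for r in range(num_refs):
        # Soft correspondence: cosine similarity between generated tokens and reference r.
        sim = F.normalize(gen_feats, dim=-1) @ F.normalize(ref_feats[r], dim=-1).T  # (N, M)
        best = sim.max(dim=1).values                 # best-matching reference token per gen token
        weights = best.softmax(dim=0)                # soft "region" claimed by subject r
        regions.append(weights)
        # Alignment: tokens inside the region should match their reference features closely.
        align_loss = align_loss - (weights * best).sum()
    regions = torch.stack(regions)                   # (R, N)
    # Disentanglement: different subjects should not claim the same generated tokens.
    overlap = regions @ regions.T                    # (R, R)
    off_diag = overlap - torch.diag(torch.diagonal(overlap))
    disent_loss = off_diag.sum() / (num_refs * (num_refs - 1))
    return align_loss / num_refs, disent_loss

gen = torch.randn(32, 64)        # 32 generated tokens, width 64
refs = torch.randn(3, 10, 64)    # 3 reference subjects, 10 tokens each
print([round(v.item(), 4) for v in alignment_and_disentanglement(gen, refs)])
```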
Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots
Published on 2025-09-02
#ML
Researchers developed Camera Depth Models (CDMs) to enhance depth perception for robots equipped with depth cameras, overcoming challenges such as limited accuracy and sensor noise. The resulting CDMs provide near simulation-level accuracy in depth prediction, enabling robots to generalize from simulation to real-world tasks with minimal performance degradation...
Read More
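For intuition, here is a toy sketch of the role the summary attributes to a Camera Depth Model: a small network that takes an RGB image plus the sensor's noisy raw depth and predicts a cleaned depth map as a residual correction. The architecture and names are assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class ToyCameraDepthModel(nn.Module):
    """Illustrative depth cleaner: RGB + noisy raw depth in, refined depth out."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),   # 3 RGB channels + 1 raw-depth channel
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, rgb: torch.Tensor, raw_depth: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, raw_depth], dim=1)
        # Predict a residual correction on top of the sensor's depth estimate.
        return raw_depth + self.net(x)

rgb = torch.rand(1, 3, 64, 64)
noisy_depth = torch.rand(1, 1, 64, 64)
clean = ToyCameraDepthModel()(rgb, noisy_depth)
print(clean.shape)  # torch.Size([1, 1, 64, 64])
```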
Planning with Reasoning using Vision Language World Model
Published on 2025-09-02
#ML
This research presents the Vision Language World Model (VLWM), which uses language and visual data to understand and plan actions in the world. VLWM can infer goals, predict actions, and learn from past experience to improve its planning, outperforming other models across benchmarks and human evaluations...
Read More
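To illustrate planning with a world model in its simplest form, the sketch below rolls candidate action sequences through a transition predictor and keeps the sequence whose predicted final state is closest to the goal. Only the control loop is shown, with a hand-coded stand-in for the world model; the actual VLWM learns its dynamics from language and visual data.

```python
from itertools import product

ACTIONS = ["pick", "move", "place", "wait"]

def predict_next(state: tuple[int, int], action: str) -> tuple[int, int]:
    """Hand-coded stand-in for a learned world model; state = (progress, holding)."""
    progress, holding = state
    if action == "pick":
        return progress, 1
    if action == "move" and holding:
        return progress + 1, holding
    if action == "place" and holding:
        return progress + 1, 0
    return progress, holding

def plan(start: tuple[int, int], goal_progress: int, horizon: int = 3) -> list[str]:
    """Exhaustive short-horizon search: score each imagined rollout by distance to the goal."""
    best_plan, best_score = [], float("-inf")
    for sequence in product(ACTIONS, repeat=horizon):
        state = start
        for action in sequence:
            state = predict_next(state, action)     # imagine the outcome, don't act yet
        score = -abs(goal_progress - state[0])      # closer predicted progress is better
        if score > best_score:
            best_plan, best_score = list(sequence), score
    return best_plan

print(plan(start=(0, 0), goal_progress=2))  # a 3-step plan whose predicted progress reaches 2
```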
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
Published on 2025-09-03
#ML
The study presents LMEnt, a suite for analyzing knowledge acquisition in language models during pretraining. It includes a knowledge-rich pretraining corpus, an improved entity-based retrieval method, and pretrained models for studying how entity mentions in pretraining data relate to downstream performance...
Read More
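A minimal sketch of the entity-based retrieval capability highlighted above: index which pretraining documents mention which entities, then look up every passage for an entity of interest. The toy corpus and the exact-string matching rule are stand-in assumptions; LMEnt's actual retrieval method is more sophisticated than raw string matching.

```python
from collections import defaultdict

corpus = {
    "doc1": "Marie Curie won the Nobel Prize in Physics with Pierre Curie.",
    "doc2": "The Nobel Prize is awarded annually in Stockholm.",
    "doc3": "Marie Curie later received a second Nobel Prize, in Chemistry.",
}
entities = ["Marie Curie", "Pierre Curie", "Nobel Prize", "Stockholm"]

def build_entity_index(corpus: dict[str, str], entities: list[str]) -> dict[str, set[str]]:
    """Map each entity to the set of documents that mention it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for entity in entities:
            if entity in text:                 # stand-in for real mention annotation
                index[entity].add(doc_id)
    return index

index = build_entity_index(corpus, entities)
# Which pretraining documents could have taught the model about Marie Curie?
print(sorted(index["Marie Curie"]))   # ['doc1', 'doc3']
```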
Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full papers are translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit Developer's Social Media