🤗 Daily Paper(2025-08-11)

deep.di...@gmail.com
Aug 11, 2025, 4:07:04 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page
🤗 daily paper

UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

Published at 2025-07-29

#ML

The research presents UI-AGILE, a comprehensive framework that enhances GUI agents by improving both training and inference. It introduces a continuous reward function, a 'Simple Thinking' reward, and a resampling strategy for training, along with a novel inference-time method that decomposes images into smaller parts to improve grounding accuracy, achieving state-of-the-art performance on the ScreenSpot-Pro and ScreenSpot-v2 benchmarks....
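As a rough illustration of the decomposed-inference idea, the sketch below splits a screenshot into overlapping crops, grounds the instruction in each crop, and keeps the most confident hit mapped back to full-image coordinates. The grid layout, the `ground` callback, and its confidence score are hypothetical stand-ins, not the paper's actual pipeline.

```python
# Hypothetical sketch of decomposed inference-time grounding:
# split the image into an overlapping n x n grid of crops, run a
# grounding model on each crop, and return the highest-scoring
# click point in full-image coordinates.

def make_crops(width, height, n=2, overlap=0.2):
    """Yield (x0, y0, x1, y1) boxes covering the image in an n x n grid,
    each padded by `overlap` of a cell so targets on seams are not cut."""
    step_x, step_y = width / n, height / n
    pad_x, pad_y = step_x * overlap, step_y * overlap
    for i in range(n):
        for j in range(n):
            x0 = max(0, i * step_x - pad_x)
            y0 = max(0, j * step_y - pad_y)
            x1 = min(width, (i + 1) * step_x + pad_x)
            y1 = min(height, (j + 1) * step_y + pad_y)
            yield (x0, y0, x1, y1)

def ground_decomposed(image_size, instruction, ground):
    """Run `ground(box, instruction) -> (x, y, score)` on each crop and
    return the best click point translated to full-image coordinates."""
    best = None
    for box in make_crops(*image_size):
        x, y, score = ground(box, instruction)
        point = (box[0] + x, box[1] + y, score)
        if best is None or score > best[2]:
            best = point
    return best[:2]
```

The intuition is that a small target occupies a larger fraction of a crop than of the full screenshot, which tends to make localization easier for the grounding model.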

Read More

MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Published at 2025-08-02

#ML

The study introduces a new method called MeshLLM that uses large language models to better understand and generate 3D meshes. It creates a much larger dataset for training and improves the models' ability to capture the structure of 3D meshes, resulting in better performance compared to existing methods....

Read More

GENIE: Gaussian Encoding for Neural Radiance Fields Interactive Editing

Published at 2025-08-04

#ML

This study presents GENIE, a hybrid model that merges the high-quality rendering of Neural Radiance Fields (NeRF) with the editable structure of Gaussian Splatting (GS). GENIE attaches trainable feature embeddings to Gaussians, enabling real-time, locality-aware editing and dynamic interaction, and making it compatible with physical simulation....

Read More

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

Published at 2025-08-06

#ML

This survey explores AI assistants, called OS Agents, that use advanced language models to automate tasks on computers and mobile phones. The paper covers their key components, construction methods, evaluation, and future research directions, aiming to guide both academic and industrial development....

Read More

Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off

Published at 2025-08-06

#ML

This study presents Voost, a method that improves the accuracy and realism of virtual try-on by using a single model for both putting garments on and taking them off (try-on and try-off). It better captures how clothes fit different body types and poses, outperforming other models in tests....

Read More

Adapting Vision-Language Models Without Labels: A Comprehensive Survey

Published at 2025-08-07

#ML

This survey provides a comprehensive overview of adapting Vision-Language Models (VLMs) without using labeled data, categorizing existing methods into four paradigms based on the availability and nature of unlabeled visual data, and discussing core methodologies, adaptation strategies, and benchmarks for each....

Read More

InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Published at 2025-08-07

#ML

This study introduces a new framework called Adaptive Exploration Policy Optimization (AEPO) to improve the performance of autonomous agents operating on graphical user interfaces (GUIs) using Multimodal Large Language Models (MLLMs). AEPO addresses the challenge of semantic alignment by encouraging broader exploration and guiding it with a theoretically grounded Adaptive Exploration Reward function, resulting in significant improvements on various GUI grounding benchmarks....

Read More

MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs

Published at 2025-08-07

#ML

This study presents MELLA, a multimodal, multilingual dataset designed to enhance the linguistic capabilities and cultural groundedness of multimodal large language models in low-resource languages. By fine-tuning MLLMs on MELLA, the models can produce more detailed and culturally aware descriptions, improving their performance in various low-resource language settings....

Read More

Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal

Published at 2025-08-07

#ML

The authors present ASAP, a method for compressing Chain-of-Thought reasoning in Large Reasoning Models that improves efficiency and reduces cost in code generation tasks. ASAP first applies anchor-guided pruning to preserve the core structure of the reasoning, then removes unnecessary steps based on a new first-token-surprisal metric, yielding faster and cheaper training and inference without sacrificing accuracy....
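A minimal sketch of surprisal-based step pruning, assuming the metric works roughly as the title suggests: score each reasoning step by the negative log-probability of its first token and drop the least surprising steps. The `first_token_prob` callback is a stand-in for a real language model, and the anchor-guided pruning stage is not reproduced here.

```python
import math

def surprisal(p):
    """Surprisal of an event with probability p, in nats."""
    return -math.log(p)

def prune_steps(steps, first_token_prob, keep_ratio=0.5):
    """Keep the `keep_ratio` most surprising steps, preserving order.

    steps: list of reasoning-step strings
    first_token_prob: step -> model probability of the step's first token
    """
    scored = [(surprisal(first_token_prob(s)), i, s) for i, s in enumerate(steps)]
    k = max(1, int(len(steps) * keep_ratio))
    # Take the k highest-surprisal steps, then restore original order.
    kept = sorted(sorted(scored, reverse=True)[:k], key=lambda t: t[1])
    return [s for _, _, s in kept]
```

The idea is that steps whose opening token the model already predicts with high confidence carry little new information, so removing them shortens the chain with minimal loss.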

Read More

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Published at 2025-08-08

#ML

The researchers have developed GLM-4.5, a powerful and efficient open-source language model built for agentic, reasoning, and coding (ARC) tasks. Trained extensively with a unique method, it performs strongly across a range of tasks while using fewer parameters than many comparable models. Two versions, GLM-4.5 and a smaller one, are released to help advance AI systems that can reason and act....

Read More

LightSwitch: Multi-view Relighting with Material-guided Diffusion

Published at 2025-08-08

#ML

The authors present a new method called LightSwitch that improves 3D relighting by using multi-view images and material properties, resulting in better and faster relighting of objects compared to previous techniques....

Read More

Memp: Exploring Agent Procedural Memory

Published at 2025-08-08

#ML

This study presents a method to improve agents' procedural memory, allowing them to learn, update, and retain information over time. The proposed Memp system converts past agent experiences into detailed instructions and general scripts, which are then dynamically updated. Experiments show that as this memory repository improves, agents perform better and more efficiently on similar tasks, and even transferring memory from a stronger model to a weaker one boosts performance....
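A toy sketch in the spirit of the summary: successful task experiences are stored as scripts keyed by task keywords, retrieved by word overlap, and replaced when a higher-scoring trajectory arrives. The distillation and matching here are illustrative placeholders, not the paper's actual Memp pipeline.

```python
class ProceduralMemory:
    """Hypothetical procedural-memory store: task keywords -> best script."""

    def __init__(self):
        self.entries = []  # list of (keyword_set, script, score)

    def add(self, task, script, score):
        """Store a script; replace an existing entry for the same task
        only if the new trajectory scored higher (dynamic update)."""
        keywords = set(task.lower().split())
        for i, (kw, _, old_score) in enumerate(self.entries):
            if kw == keywords:
                if score > old_score:
                    self.entries[i] = (keywords, script, score)
                return
        self.entries.append((keywords, script, score))

    def retrieve(self, task):
        """Return the stored script whose keywords best overlap the task,
        or None if the memory is empty."""
        words = set(task.lower().split())
        best = max(self.entries, key=lambda e: len(e[0] & words), default=None)
        return best[1] if best else None
```

Transferring such a repository from a stronger agent to a weaker one, as the summary describes, would amount to handing over the `entries` list so the weaker agent retrieves the stronger agent's scripts.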

Read More

Tags are generated by Google's Gemini Pro API; the summaries and translations are generated by Upstage's SOLAR mini chat model, which is derived from the open-source SOLAR-10.7B LLM.


(Experimental) The full papers are translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
