🤗 Daily Paper Newsletter

This newsletter delivers a curated list of papers from 🤗 Daily Papers.
Hope you find some gems!
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts
Published on 2025-08-13
#ML

PRELUDE tests a model's ability to comprehend and reason over long contexts by asking whether a character's prequel story is consistent with the original book's narrative. Experiments show that state-of-the-art language models struggle with the task, often giving correct answers with flawed reasoning, and trail human performance by a wide margin....
Read More
A Survey on Diffusion Language Models
Published on 2025-08-14
#ML

This survey gives a comprehensive overview of Diffusion Language Models (DLMs), their evolution, and their advantages over traditional autoregressive models, such as reduced latency and bidirectional context capture. It also covers current techniques, inference strategies, and optimizations, surveys applications and limitations in natural language processing tasks, and outlines future research directions for DLMs....
Read More
From Black Box to Transparency: Enhancing Automated Interpreting Assessment with Explainable AI in College Classrooms
Published on 2025-08-14
#ML

The authors propose a transparent, explainable approach to automated interpreting assessment. By selecting only relevant features and attributing scores with Shapley value analysis, their approach outperforms traditional methods and gives learners detailed feedback that supports self-regulated learning....
Read More
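The Shapley value attribution mentioned above can be illustrated with a tiny, self-contained sketch. The scoring function and feature names below are hypothetical stand-ins, not the paper's actual model: each feature's contribution is its average marginal effect over all coalitions of the remaining features.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for f at point x, relative to a baseline.

    Features absent from a coalition are replaced by their baseline value.
    Exponential in the number of features, so only viable for small feature sets.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        z = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return f(z)

    phi = {}
    for i in names:
        others = [j for j in names if j != i]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                # Classic Shapley weight: |S|! (n - |S| - 1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (value(set(S) | {i}) - value(set(S)))
        phi[i] = total
    return phi

# Hypothetical interpreting-quality score from three hand-picked features.
def score(z):
    return 2.0 * z["fluency"] + 1.0 * z["accuracy"] + 0.5 * z["fluency"] * z["pauses"]

x = {"fluency": 1.0, "accuracy": 0.8, "pauses": -0.5}
baseline = {"fluency": 0.0, "accuracy": 0.0, "pauses": 0.0}
phi = shapley_values(score, x, baseline)

# Efficiency property: contributions sum to f(x) - f(baseline).
assert abs(sum(phi.values()) - (score(x) - score(baseline))) < 1e-9
```

The efficiency check at the end is what makes Shapley-based feedback interpretable: the per-feature contributions account exactly for the gap between a learner's score and the baseline.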
HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs
Published on 2025-08-14
#ML

The authors present HumanSense, a benchmark for evaluating how well multimodal large language models understand complex human intentions and produce empathetic, context-aware responses. They find that performance improves when audio and text information are incorporated, and when reasoning abilities are strengthened through a multi-stage reinforcement learning approach....
Read More
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
Published on 2025-08-14
#ML

NextStep-1 is an autoregressive text-to-image model that pairs a large backbone network with a smaller companion head to predict the next continuous image token from a text description. The approach is efficient, surpasses comparable models at generating detailed images, and supports straightforward image editing; the authors release their work publicly to encourage further research....
Read More
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
Published on 2025-08-14
#ML

The study examines how using Pass@k as the reinforcement-learning reward improves the exploration ability of large reasoning models. The authors find that exploration and exploitation are not inherently at odds and can reinforce each other, and they propose a new approach to designing advantage functions in reinforcement learning....
Read More
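For context, Pass@k is the probability that at least one of k sampled solutions is correct. Below is a sketch of the standard unbiased estimator, plus one illustrative way (not necessarily the paper's exact formulation) to turn Pass@k into a group-level reward that credits exploration:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased estimator of pass@k: the probability that at least one of
    k samples drawn without replacement from n generations, of which c are
    correct, solves the task."""
    if n - c < k:
        return 1.0  # not enough incorrect samples to fill all k slots
    return 1.0 - comb(n - c, k) / comb(n, k)

def group_pass_reward(correct_flags):
    """Illustrative Pass@k-style group reward: every rollout in a group
    receives reward 1 if any rollout in the group is correct, so the group
    is rewarded for exploring at least one successful path."""
    hit = 1.0 if any(correct_flags) else 0.0
    return [hit] * len(correct_flags)
```

For example, with 10 generations of which 3 are correct, `pass_at_k(10, 3, 2)` gives 1 − C(7,2)/C(10,2) = 24/45 ≈ 0.533. The group reward is the kind of shared signal that, per the summary above, lets exploration and exploitation reinforce each other.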
Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
Published on 2025-08-14
#ML

This study examines how visual encoders such as CLIP respond to subtle image-acquisition and processing changes that prior research has largely ignored. The researchers find that these seemingly minor adjustments can significantly affect semantic predictions, and that their impact depends on how strongly the changes correlate with the semantic labels....
Read More
Puppeteer: Rig and Animate Your 3D Models
Published on 2025-08-14
#ML

Puppeteer is a framework that automatically rigs 3D models, generating the skeletons and skinning weights needed for animation. It is more accurate than current methods, handles diverse 3D content including AI-generated assets, and produces smoother animations without jittering....
Read More
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
Published on 2025-08-14
#ML

STream3R reformulates 3D reconstruction as sequential pointmap prediction with a causal Transformer, which is more efficient than existing methods that rely on expensive computation or handle long sequences poorly. It generalizes to diverse scenarios, including dynamic scenes, and can be trained and fine-tuned on large-scale datasets, making it well suited to real-time 3D perception in streaming environments....
Read More
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing
Published on 2025-08-14
#ML

ToonComposer unifies inbetweening and colorization into a single generative step for cartoon production, reducing manual work and increasing flexibility. Using a sparse sketch injection mechanism and a cartoon adaptation method, it outperforms existing approaches in visual quality, motion consistency, and production efficiency....
Read More
UI-Venus Technical Report: Building High-performance UI Agents with RFT
Published on 2025-08-14
#ML

UI-Venus is a UI agent that operates from screenshots with a large language model to perform tasks such as UI grounding and navigation, achieving state-of-the-art results through a new training method and reward design. The authors also build a self-evolving framework that improves navigation performance, and they open-source their code and data-cleaning protocols to promote further research in the field....
Read More
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Published on 2025-08-14
#ML

We-Math 2.0 combines a structured math knowledge system, data modeling, and reinforcement learning to improve the visual mathematical reasoning of large language models. It comprises four components: a comprehensive math knowledge system, a flexible and challenging dataset, a two-stage training framework, and a thorough evaluation benchmark....
Read More
When Explainability Meets Privacy: An Investigation at the Intersection of Post-hoc Explainability and Differential Privacy in the Context of Natural Language Processing
Published on 2025-08-14
#ML

This study investigates the relationship between privacy and explainability in Natural Language Processing (NLP), focusing on the trade-offs between Differential Privacy and post-hoc explainability. The researchers show that the two can co-exist and offer practical recommendations for balancing them in NLP systems....
Read More
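As background on the privacy side, here is a minimal sketch of the Laplace mechanism, a basic building block of Differential Privacy. This is a generic illustration of DP on a counting query, not the paper's specific NLP setup, and the document corpus is invented:

```python
import math
import random

def dp_count(values, predicate, epsilon, rng=random):
    """Release a counting query under epsilon-differential privacy via the
    Laplace mechanism. A count has sensitivity 1 (adding or removing one
    record changes the result by at most 1), so Laplace(0, 1/epsilon)
    noise suffices."""
    true_count = sum(1 for v in values if predicate(v))
    # Inverse-CDF sampling of a Laplace(0, 1/epsilon) variate.
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

# Example: privately count documents mentioning a sensitive term.
docs = ["diagnosis: flu", "all clear", "diagnosis: flu", "follow-up"]
noisy = dp_count(docs, lambda d: "diagnosis" in d, epsilon=1.0)
```

Smaller `epsilon` means stronger privacy and noisier answers; the tension the paper studies is that post-hoc explanations may leak exactly the information this noise is meant to hide.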
Tags are generated by Google's Gemini Pro API; summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the open SOLAR-10.7B LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit Developer's Social Media