🤗 Daily Paper(2025-09-26)


deep.di...@gmail.com
Sep 26, 2025, 4:07:21 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

Project page · 🤗 Daily Papers

Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory

Published at 2025-09-18

#ML

This study applies a human cognitive framework to analyze the reasoning of Large Reasoning Models (LRMs) in solving math problems. The researchers created a publicly available benchmark and annotated corpus, revealing patterns in LRM reasoning and providing a methodology for interpreting LRM cognition....

Read More

Blueprints of Trust: AI System Cards for End to End Transparency and Governance

Published at 2025-09-23

#ML

A new framework called Hazard-Aware System Card (HASC) is proposed to improve transparency and accountability in AI systems by integrating security and safety records, using standardized identifiers, and enabling informed decision-making throughout the AI system's lifecycle. The HASC is also compared to the ISO/IEC 42001:2023 standard, demonstrating how they can work together for better AI system governance....

Read More

Residual Off-Policy RL for Finetuning Behavior Cloning Policies

Published at 2025-09-23

#ML

The authors propose a method that combines behavior cloning and reinforcement learning to improve visuomotor control policies for high-degree-of-freedom systems. By using behavior cloning policies as a base and adding lightweight corrections through off-policy RL, the approach requires less data and can work with sparse rewards, leading to state-of-the-art performance in various tasks, including the first successful real-world RL training on a humanoid robot with dexterous hands....

Read More

Thinking While Listening: Simple Test Time Scaling For Audio Classification

Published at 2025-09-23

#ML

The authors present a new method to improve audio classification by allowing neural models to 'think' while listening, similar to how large language models reason. They propose two approaches: incorporating thinking into existing pipelines and designing a new architecture from scratch, both of which enhance classification accuracy. They test this with two open-source reasoning models and find that a simpler method, retraining only the embedding matrix of a smaller model, can outperform larger te...

Read More

CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning

Published at 2025-09-24

#ML

The study presents CE-GPPO, a new algorithm that manages policy entropy in reinforcement learning by reintroducing valuable gradients from clipped tokens in PPO, leading to better exploration-exploitation balance and improved performance on mathematical reasoning benchmarks....
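The clipping idea can be sketched in miniature. In standard PPO, tokens whose importance ratio falls outside the clip range (in the direction the advantage pushes) contribute zero gradient; a CE-GPPO-style variant reintroduces a small, bounded gradient for those tokens. This is a toy illustration of that intuition, not the paper's exact formulation; the `beta` scale is a hypothetical parameter.

```python
def ppo_clip_grad(ratio, advantage, eps=0.2):
    """Per-token gradient w.r.t. the importance ratio under PPO clipping.

    When the ratio is clipped in the direction the advantage pushes,
    standard PPO drops the gradient entirely.
    """
    if advantage > 0 and ratio > 1 + eps:
        return 0.0
    if advantage < 0 and ratio < 1 - eps:
        return 0.0
    return advantage  # d/d(ratio) of the unclipped surrogate ratio * advantage


def gradient_preserving_clip_grad(ratio, advantage, eps=0.2, beta=0.1):
    """Sketch of a gradient-preserving variant: clipped tokens keep a small,
    bounded gradient beta * advantage instead of exactly zero, so rare
    exploratory tokens still influence the update and entropy is preserved."""
    g = ppo_clip_grad(ratio, advantage, eps)
    if g == 0.0 and advantage != 0.0:
        return beta * advantage
    return g
```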

Read More

Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving

Published at 2025-09-24

#ML

This study presents ReflectDrive, a new framework for autonomous driving that uses a reflection mechanism and discrete diffusion to generate safe trajectories. By discretizing the driving space and utilizing pre-trained models, ReflectDrive can plan and correct paths without expensive gradient calculations, improving safety and efficiency in complex real-world environments....

Read More

MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model

Published at 2025-09-24

#ML

The authors present a method called MI-Fuse that improves speech emotion recognition in new environments without access to source-domain data. It combines predictions from a large, closed-source audio-language model and a smaller, trained model, outperforming either model on its own....
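One simple way to realize such a fusion (a sketch only; MI-Fuse's actual weighting may differ) is to combine each model's class distribution weighted by its confidence, here measured via entropy:

```python
import math


def entropy(p):
    # Shannon entropy of a probability distribution; lower = more confident
    return -sum(x * math.log(x) for x in p if x > 0)


def fuse(p_teacher, p_student):
    # Hypothetical confidence-weighted label fusion: the lower-entropy
    # (more confident) distribution gets more weight. Returns the fused
    # argmax class index.
    w_t = 1.0 / (1e-6 + entropy(p_teacher))
    w_s = 1.0 / (1e-6 + entropy(p_student))
    fused = [(w_t * a + w_s * b) / (w_t + w_s)
             for a, b in zip(p_teacher, p_student)]
    return fused.index(max(fused))
```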

Read More

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

Published at 2025-09-24

#ML

SceneWeaver is a new framework for creating realistic 3D environments that can adapt to complex user instructions and improve over time through a process of planning, action, and reflection. It uses a variety of tools, guided by physical plausibility, visual realism, and semantic alignment with user input, to create detailed and consistent scenes, outperforming prior methods and demonstrating generalization to diverse instructions....

Read More

Seedream 4.0: Toward Next-generation Multimodal Image Generation

Published at 2025-09-24

#ML

Seedream 4.0 is a powerful and efficient system for creating and editing images in high resolution, using a unified approach that combines text and images. It's capable of generating detailed and realistic images quickly, and can be used for creative and professional applications, extending the capabilities of traditional image generation systems....

Read More

Thinking Augmented Pre-training

Published at 2025-09-24

#ML

The authors present a method called Thinking augmented Pre-Training (TPT) that enhances the learning of large language models by adding automatically generated reasoning steps to existing text data, improving data efficiency by a factor of three and boosting model performance, especially on challenging reasoning tasks....

Read More

V-GameGym: Visual Game Generation for Code Large Language Models

Published at 2025-09-24

#ML

This study presents V-GameGym, a comprehensive benchmark for evaluating code large language models in visual game development, addressing the gap between current LLM capabilities and practical game development requirements by focusing on playability, visual aesthetics, and user engagement....

Read More

VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models

Published at 2025-09-24

#ML

The study presents VCRL, a new framework for training large language models on mathematical reasoning tasks. VCRL improves upon existing methods by dynamically adjusting the difficulty of training samples based on their reward variance, making the learning process more aligned with human cognitive abilities....
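The variance-based selection can be sketched as a tiny curriculum step (a toy version assuming each sample gets several rollouts with 0/1 rewards; VCRL's exact criterion may differ). Samples whose rollout rewards have high variance are neither always solved nor always failed, i.e. at the right difficulty:

```python
from statistics import pvariance


def select_batch(samples, rollout_rewards, k):
    # rollout_rewards[i]: list of 0/1 rewards from several rollouts of
    # sample i. Higher reward variance ~ the sample is neither too easy
    # (all 1s) nor too hard (all 0s), so it carries the most signal.
    scored = sorted(range(len(samples)),
                    key=lambda i: pvariance(rollout_rewards[i]),
                    reverse=True)
    return [samples[i] for i in scored[:k]]
```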

Read More

When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity

Published at 2025-09-24

#ML

The study examines flaws in benchmarks that use large language models as judges, finding that these benchmarks often produce misleading results due to design failures. The researchers propose two methods to identify such issues and apply them to a popular benchmark, revealing significant problems and offering guidelines for building more reliable benchmarks....

Read More

AutoIntent: AutoML for Text Classification

Published at 2025-09-25

#ML

AutoIntent is a user-friendly tool that automates text classification tasks, making it easier for users by handling embedding model selection, classifier optimization, and decision threshold tuning. It outperforms other AutoML tools in intent classification and allows users to manage the balance between effectiveness and resource usage....
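Decision threshold tuning, one of the steps AutoIntent automates, can be illustrated with a tiny grid search over a held-out set (a toy stand-in, not AutoIntent's API):

```python
def tune_threshold(scores, labels, grid=None):
    # Pick the confidence threshold that maximizes accuracy on a dev set.
    # scores: model confidences; labels: True if the intent is in-scope.
    grid = grid or [i / 100 for i in range(1, 100)]

    def acc(t):
        preds = [s >= t for s in scores]
        return sum(p == l for p, l in zip(preds, labels)) / len(labels)

    return max(grid, key=acc)  # first threshold attaining the best accuracy
```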

Read More

BESPOKE: Benchmark for Search-Augmented Large Language Model Personalization via Diagnostic Feedback

Published at 2025-09-25

#ML

The researchers have created a new benchmark called BESPOKE to evaluate and improve personalization in search-augmented large language models. This benchmark uses real user data and detailed feedback to assess how well these models can tailor information to individual user needs and preferences, helping to identify key areas for improvement in personalized information-seeking tasks....

Read More

Behind RoPE: How Does Causal Mask Encode Positional Information?

Published at 2025-09-25

#ML

The study reveals that the causal mask in Transformer decoders, in addition to RoPE, impacts attention scores and introduces positional information. The research shows that the causal mask favors nearby query-key pairs, which is similar to common positional encodings, and this effect is also present in modern large language models....
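The core observation can be seen in a toy setting: even with no positional encoding and identical attention scores everywhere, the causal mask alone makes attention weights depend on position, because each query normalizes over a prefix of growing length. A minimal illustration (not the paper's analysis):

```python
def causal_uniform_attention(seq_len):
    # All query-key scores are equal (no content signal, no RoPE); the
    # causal mask restricts query q to keys 0..q, so softmax gives each
    # visible key uniform weight 1/(q+1). The weight a fixed key receives
    # thus varies with query position: the mask injects positional info.
    rows = []
    for q in range(seq_len):
        n = q + 1
        rows.append([1.0 / n] * n + [0.0] * (seq_len - n))
    return rows
```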

Read More

CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling

Published at 2025-09-25

#ML

The authors propose CHARM, a new method for creating anime hairstyles that uses a compact, invertible control-point-based parameterization, making it efficient for both design and learning. CHARM's autoregressive generative framework interprets anime hairstyles as a sequential 'hair language', resulting in high-fidelity anime hairstyle creation, which is supported by extensive experiments and a large-scale dataset of 37K high-quality anime hairstyles....

Read More

Does FLUX Already Know How to Perform Physically Plausible Image Composition?

Published at 2025-09-25

#ML

This study presents SHINE, a new framework for creating high-quality, realistic images by inserting objects into scenes without the need for complex training. SHINE uses pre-existing models and techniques to ensure the inserted objects maintain their accuracy and blend seamlessly with the background, while also addressing issues like poor lighting and reflections. The researchers also introduce a new benchmark, ComplexCompo, to evaluate and compare the performance of different image composition methods....

Read More

Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets

Published at 2025-09-25

#ML

Hunyuan3D-Omni is a new framework that allows for precise control over 3D asset creation, taking in various conditioning signals like point clouds, voxels, and skeletal pose priors, and unifying them in a single architecture for improved accuracy and robustness in production workflows....

Read More

Interactive Recommendation Agent with Active User Commands

Published at 2025-09-25

#ML

This study presents a new approach, Interactive Recommendation Feed (IRF), that allows users to provide feedback through natural language commands, enabling more precise and nuanced control over recommendation policies compared to traditional systems. The proposed RecBot system, powered by IRF, significantly enhances user satisfaction and business outcomes through extensive experiments....

Read More

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Published at 2025-09-25

#ML

This research introduces a new data selection strategy called Variance-Aware Sampling to improve multimodal reasoning models, addressing the issue of unstable reinforcement learning algorithms. The study also provides a large, high-quality dataset for training these models and releases open-source versions of the models themselves, demonstrating their effectiveness through experiments and analysis....

Read More

MOSS-ChatV: Reinforcement Learning with Process Reasoning Reward for Video Temporal Reasoning

Published at 2025-09-25

#ML

The authors present a new framework called MOSS-ChatV that uses reinforcement learning with a dynamic reward system to improve the understanding of temporal dynamics in videos by multimodal large language models. This framework enhances the interpretability and robustness of these models by aligning their reasoning process with the video content, and it has shown significant improvements in various benchmarks and different architectures....

Read More

Quantized Visual Geometry Grounded Transformer

Published at 2025-09-25

#ML

The authors address the challenges of compressing large-scale Visual Geometry Grounded Transformers (VGGTs) using Post-Training Quantization (PTQ), which faces issues like heavy-tailed activation distributions and unstable calibration sample selection. They introduce QuantVGGT, a quantization framework with two main contributions: Dual-Smoothed Fine-Grained Quantization to mitigate heavy-tailed distributions and inter-channel variance, and Noise-Filtered Diverse Sampling to ensure stable quantization....

Read More

Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution

Published at 2025-09-25

#ML

The authors present Recon-Act, a self-improving system for web browsing that uses two teams of agents: one for analyzing and generating tools, and another for executing tasks. This system learns from its mistakes and improves its performance on long-term tasks and unfamiliar websites, setting a new standard for web browsing agents....

Read More

SD3.5-Flash: Distribution-Guided Distillation of Generative Flows

Published at 2025-09-25

#ML

The authors have developed an efficient framework called SD3.5-Flash that allows high-quality image generation on consumer devices by simplifying complex models. They achieved this through two main innovations, timestep sharing and split-timestep fine-tuning, along with other optimizations, enabling fast and memory-efficient generation on various devices, from mobile phones to desktop computers....

Read More

ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning

Published at 2025-09-25

#ML

The authors present ScaleDiff, a method to generate complex math problems efficiently by using an adaptive thinking model and a specialized generator, which significantly improves problem-solving accuracy for large reasoning models, even with a cost-efficient teacher model....

Read More

SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines

Published at 2025-09-25

#ML

The authors have developed a model that connects natural language with various scientific data types, which can perform tasks such as translation, extraction, prediction, and generation across different scientific workflows. This model, which is open-source, has been trained to improve cross-domain generalization and reliability, making it more versatile and accurate than specialist systems....

Read More

StyleBench: Evaluating thinking styles in Large Language Models

Published at 2025-09-25

#ML

The study presents StyleBench, a tool that evaluates different reasoning styles in large language models across various tasks and models. The research finds that the best reasoning style depends on the task and model size, with larger models better suited for complex tasks and smaller models more efficient for simpler tasks....

Read More

The Unanticipated Asymmetry Between Perceptual Optimization and Assessment

Published at 2025-09-25

#ML

This study explores the relationship between perceptual optimization and image quality assessment, revealing a surprising imbalance. It finds that the best metrics for assessing image quality may not be the most effective for optimizing perceptual quality, and that the design of discriminators can significantly impact optimization results....

Read More

Tree Search for LLM Agent Reinforcement Learning

Published at 2025-09-25

#ML

The study presents Tree-GRPO, a new method for training language models in complex tasks. This method uses a tree-like structure to increase efficiency and create detailed training signals, which outperforms traditional methods in various experiments....

Read More

TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them

Published at 2025-09-25

#ML

The study identifies and addresses issues in using Large Language Models as automated evaluators, proposing TrustJudge, a probabilistic framework that reduces inconsistencies and improves evaluation accuracy without additional training or human annotations....
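One source of such inconsistency is collapsing a judge's score distribution to a single discrete rating. A sketch of a distribution-aware alternative (my illustration, not necessarily TrustJudge's exact estimator) takes the probability-weighted expectation over rating tokens instead of the argmax:

```python
def argmax_rating(token_probs):
    # Conventional scoring: keep only the most likely rating token,
    # discarding how the rest of the probability mass is distributed.
    return int(max(token_probs, key=token_probs.get))


def expected_rating(token_probs):
    # Distribution-aware scoring: probability-weighted average over rating
    # tokens, which distinguishes an uncertain {4: 0.5, 5: 0.5} judge from
    # a confidently-4 judge, reducing tie-induced inconsistencies.
    total = sum(token_probs.values())
    return sum(int(t) * p for t, p in token_probs.items()) / total
```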

Read More


Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.

(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media

Fb · X · In