🤗 Daily Paper (2025-10-06)

deep.di...@gmail.com

Oct 6, 2025, 4:07:38 PM

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.


OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features

Published at 2025-09-26

#ML

The study presents a new method called Orthogonal Sparse Autoencoders (OrtSAE) that addresses issues in sparse autoencoders, such as feature absorption and composition, by promoting orthogonality between learned features. This approach leads to more distinct features, improved performance on spurious correlation removal, and performance on other tasks on par with traditional sparse autoencoders....

Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization

Published at 2025-09-27

#ML

This study examines the performance of the MXFP4 and NVFP4 formats for large language model inference on NVIDIA and AMD GPUs and identifies challenges in accuracy. It proposes a new quantization algorithm, Micro-Rotated-GPTQ, that achieves significant speedups while matching or surpassing state-of-the-art accuracy....

Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces

Published at 2025-09-27

#ML

This study presents a new method called Policy Reasoning Traces (PRT) to help language models better assess policy compliance, which is crucial for ensuring adherence to human-defined rules. The researchers found that using PRTs significantly improves the performance of various language models in identifying violations of policies like HIPAA and GDPR, and also enhances their ability to accurately cite policy clauses....

DiffTester: Accelerating Unit Test Generation for Diffusion LLMs via Repetitive Pattern

Published at 2025-09-29

#ML

The study presents DiffTester, a framework that enhances the efficiency of unit test generation for diffusion LLMs by identifying and utilizing repetitive patterns in unit tests, thereby increasing the number of tokens produced per step without sacrificing quality. Extensive experiments demonstrate its effectiveness across different models and programming languages, as well as its practicality in software development....

Pretraining with hierarchical memories: separating long-tail and common knowledge

Published at 2025-09-29

#ML

The authors propose a new method for language modeling that uses a small model combined with a large, hierarchical memory bank. This approach allows the model to access vast amounts of world knowledge without requiring a large number of parameters, making it more efficient and practical for use on devices with limited resources. Experiments show that this method can achieve performance comparable to larger models, and the authors provide insights into the optimal design of these memory banks for...

Triangle Splatting+: Differentiable Rendering with Opaque Triangles

Published at 2025-09-29

#ML

This study presents an improved method for reconstructing 3D scenes and generating new views, called Triangle Splatting+. It directly optimizes triangles, which are basic building blocks of computer graphics, within a differentiable framework, leading to high-quality visuals and faster training compared to previous methods, and enabling further applications like physics simulations and interactive explorations....

Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs

Published at 2025-09-30

#ML

This study presents a new framework called Text Preference Optimization (TPO) that aligns text-to-image diffusion models accurately without the need for expensive, high-quality human annotations. TPO outperforms existing methods in both quantitative and qualitative evaluations, offering a more scalable and cost-effective solution for text-to-image alignment....

Game-Time: Evaluating Temporal Dynamics in Spoken Language Models

Published at 2025-09-30

#ML

The authors present a new framework called the Game-Time Benchmark to evaluate the temporal skills of spoken language models in conversations, such as timing, pace, and simultaneous speaking. They find that while advanced models can handle simple tasks, most struggle with following instructions and interacting in real-time, highlighting the need for more research in this area....

NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving

Published at 2025-09-30

#ML

This study presents NuRisk, a new dataset for assessing risks in self-driving cars by incorporating real-world data and safety-critical scenarios. The dataset enables better understanding of how risks change over time, and the authors improve upon existing models by fine-tuning a 7B VLM agent, which shows significant advancements in spatio-temporal reasoning for autonomous driving....

Personalized Reasoning: Just-In-Time Personalization and Why LLMs Fail At It

Published at 2025-09-30

#ML

The study presents PREFDISCO, a method to evaluate large language models' ability to adapt to individual users by transforming static benchmarks into interactive personalization tasks using personas with limited preferences. Results show that most models struggle with personalized reasoning, indicating a need for dedicated development in this area....

Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents

Published at 2025-09-30

#ML

The study explores the risks of self-evolving AI agents, which can inadvertently become harmful due to unintended changes in their models, memory, tools, and workflows. The research finds that even advanced AI models can experience these risks, and suggests new safety measures are needed to prevent this 'misevolution'....

A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning

Published at 2025-10-01

#ML

The study focuses on improving the training of large language models as agents through multi-turn reinforcement learning. By analyzing the impacts of task complexity, reward sparsity, and policy gradient methods, the authors provide a practical guide for training LLM agents in various textual domains, including TextWorld, ALFWorld, and SWE-Gym....

Align Your Tangent: Training Better Consistency Models via Manifold-Aligned Tangents

Published at 2025-10-01

#ML

This research proposes a new loss function, MFD, to reduce oscillations in Consistency Models' training, leading to faster training times and better sample quality, even with small batch sizes. The method, named Align Your Tangent (AYT), aligns the model's updates with the data manifold, outperforming LPIPS and enabling efficient training....

Apriel-1.5-15b-Thinker

Published at 2025-10-01

#ML

The authors propose a 15-billion-parameter model called Apriel-1.5-15B-Thinker, which reaches top-tier performance through careful training design rather than scale alone. The model achieves strong results on various tasks and benchmarks, even with fewer resources, by using a three-step training process and high-quality data, making advanced multimodal reasoning accessible to more organizations....

Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition

Published at 2025-10-01

#ML

This study presents a new method called General Policy Composition (GPC) that enhances robotic policies without additional training by combining multiple pre-trained policies. The method improves performance and adaptability in various tasks, as shown in experiments on different benchmarks and real-world robotic evaluations....

Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling

Published at 2025-10-01

#ML

The study presents a new framework called Continuously Augmented Discrete Diffusion (CADD) that improves upon traditional discrete diffusion models by introducing a continuous latent space. This enhancement allows for more informative and less corrupted masked tokens, leading to better generative quality in text, image, and code modeling....

Efficient Multi-modal Large Language Models via Progressive Consistency Distillation

Published at 2025-10-01

#ML

This study presents a new method to create efficient multi-modal large language models by addressing the challenge of visual tokens consuming too much computational power. The proposed framework, EPIC, reduces training difficulty by introducing token and layer consistency distillation, which helps the model adapt to perturbations in the feature space caused by token compression, resulting in superior effectiveness, robustness, and generalization capabilities....

LSPO: Length-aware Dynamic Sampling for Policy Optimization in LLM Reasoning

Published at 2025-10-01

#ML

This research presents LSPO, a new method for optimizing policy training in large language models (LLMs) by dynamically selecting training data based on response length. The study shows that LSPO improves learning effectiveness and provides insights for future research by conducting an ablation study on incorporating length signals into dynamic sampling....

Large Reasoning Models Learn Better Alignment from Flawed Thinking

Published at 2025-10-01

#ML

The study presents a method called RECAP to enhance the safety and reasoning abilities of large reasoning models. This technique trains models to identify and correct flawed reasoning, making them more reliable and resistant to biases, without requiring additional resources or altering the existing reinforcement learning setup....

WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents

Published at 2025-10-01

#ML

The study presents the first comprehensive benchmark on detecting prompt injection attacks targeting web agents. It categorizes these attacks, constructs datasets of both malicious and benign samples, systematizes text-based and image-based detection methods, and evaluates their performance. Key findings reveal that while some detectors achieve moderate to high accuracy against explicit textual instructions or visible image perturbations, they struggle with attacks that omit explicit instruction...

How Confident are Video Models? Empowering Video Models to Express their Uncertainty

Published at 2025-10-02

#ML

The study presents a framework for measuring the uncertainty of video models, which is crucial for preventing the generation of incorrect or misleading videos. The framework includes a metric for evaluating the reliability of video models, a method for quantifying their uncertainty, and a dataset for testing their performance....

Less LLM, More Documents: Searching for Improved RAG

Published at 2025-10-02

#ML

This study shows that expanding the collection of documents used in Retrieval-Augmented Generation (RAG) can make the process more effective and cost-efficient, often providing similar results to using larger language models. The key benefit comes from being able to find more relevant information, although the efficiency of using that information remains the same....

REPAIR: Robust Editing via Progressive Adaptive Intervention and Reintegration

Published at 2025-10-02

#ML

The researchers present a new method called REPAIR that allows for precise and cost-effective updates to large language models without losing important information or causing unintended side effects. REPAIR improves editing accuracy and reduces knowledge forgetting, making it a robust framework for creating reliable and scalable language models....

Self-Improvement in Multimodal Large Language Models: A Survey

Published at 2025-10-02

#ML

This survey explores how large language models can improve themselves without much extra cost, focusing on models that use multiple types of data like text and images. The review covers methods for collecting and organizing data, optimizing models, and evaluating their performance, while also highlighting future research areas....

SoundReactor: Frame-level Online Video-to-Audio Generation

Published at 2025-10-02

#ML

The study presents SoundReactor, a novel framework that generates audio from video in real-time, without waiting for the entire video sequence. This tool is particularly useful for live content creation and generative world models, offering low latency and high-quality audio-visual synchronization....

TalkPlay-Tools: Conversational Music Recommendation with LLM Tool Calling

Published at 2025-10-02

#ML

The authors present a unified music recommendation system powered by large language models that integrates various retrieval methods, such as boolean filters, sparse and dense retrieval, and generative retrieval, to provide personalized recommendations based on user intent. This system improves upon existing generative recommenders by drawing on underutilized components like metadata or attribute filtering, resulting in competitive performance across different recommendation scenarios....

CoDA: Agentic Systems for Collaborative Data Visualization

Published at 2025-10-03

#ML

The authors present a new approach to automating data visualization using a collaborative multi-agent system called CoDA, which includes specialized LLM agents for various tasks such as metadata analysis and code generation. The proposed system outperforms existing methods by a significant margin, indicating the potential of using integrated, collaborative agentic workflows for data visualization automation....

Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models

Published at 2025-10-03

#ML

This study presents MaskGRPO, a new method that effectively uses rewards to optimize multimodal reinforcement learning in discrete diffusion models. MaskGRPO improves the process by enabling effective importance sampling and adapting to different modalities, resulting in more stable and efficient updates, stronger reasoning performance, and better generation quality....

Dale meets Langevin: A Multiplicative Denoising Diffusion Model

Published at 2025-10-03

#ML

This study explores a biologically inspired learning technique based on Dale's law, which leads to log-normally distributed synaptic weights. The researchers propose a new multiplicative denoising score-matching formalism, resulting in a novel update scheme for generating images from a log-normal density, and demonstrate its generative capabilities on various datasets....

FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents

Published at 2025-10-03

#ML

The study presents FocusAgent, a method that uses a lightweight LLM to extract relevant information from web pages, reducing their size by over 50% while maintaining performance and improving security against injection attacks....

Improving GUI Grounding with Explicit Position-to-Coordinate Mapping

Published at 2025-10-03

#ML

This study proposes two new methods to enhance the accuracy of mapping natural-language instructions to pixel coordinates on GUIs, particularly for high-resolution displays. The first method, RULER tokens, acts as explicit coordinate markers, while the second, Interleaved MRoPE, ensures equal representation of width and height dimensions in spatial encoding, resulting in improved GUI automation across various resolutions and platforms....

LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models

Published at 2025-10-03

#ML

The paper presents a new method called LEAML that helps multimodal large language models handle visual tasks in specialized areas like medical imaging, where there is little labeled data. LEAML uses both limited labeled and abundant unlabeled images, generating pseudo question-answer pairs for the unlabeled data and updating only the relevant parts of the model, which improves performance in tasks like gastrointestinal endoscopy and sports VQA with minimal supervision....

SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus

Published at 2025-10-03

#ML

The SpineMed ecosystem, co-developed with spine surgeons, presents a large-scale dataset and evaluation framework for improving AI-assisted diagnosis of spine disorders. The dataset, SpineMed-450k, is designed for vertebral-level reasoning across X-ray, CT, and MRI, while SpineBench assesses models on clinically important factors. Advanced vision-language models show weaknesses in this fine-grained reasoning, but a model trained on SpineMed-450k significantly outperforms them, according to clini...

SurveyBench: How Well Can LLM(-Agents) Write Academic Surveys?

Published at 2025-10-03

#ML

The study presents SurveyBench, a detailed and quiz-driven evaluation framework designed to assess the quality of automated academic surveys generated by LLM-based methods. SurveyBench uses real academic topics, high-quality surveys, and a variety of metrics to evaluate the generated surveys, revealing a significant gap between the generated surveys and human-written ones....

Tags are generated by Google's Gemini Pro API, and the summary and translation are generated by Upstage's SOLAR mini chat model derived from SOLAR-10.7B open LLM.

(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media
