🤗 Daily Paper(2025-10-24)

2 views

Skip to first unread message

deep.di...@gmail.com

unread,

Oct 24, 2025, 4:07:32 PMOct 24

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers you the curated list of papers by 🤗 Daily Papers.

project page

🤗 daily paper

CiteGuard: Faithful Citation Attribution for LLMs via Retrieval-Augmented Validation

Published at 2025-10-14

#ML

This study presents a new method called CiteGuard to improve the accuracy of citations generated by large language models (LLMs) in scientific writing, making them more reliable and comparable to human-generated citations. CiteGuard enhances the previous standard by 12.3% and reaches up to 65.4% accuracy, which is nearly as good as human performance....

Diff-XYZ: A Benchmark for Evaluating Diff Understanding

Published at 2025-10-14

#ML

The study presents Diff-XYZ, a benchmark for evaluating code-diff understanding, which includes tasks like applying, anti-applying, and generating diffs. The research compares different diff formats and their suitability for various use cases and model sizes, providing a foundation for future improvements in diff handling and code editing models....

Emergence of Linear Truth Encodings in Language Models

Published at 2025-10-17

#ML

This study explains how large language models can distinguish true from false statements using linear subspaces, and introduces a simple toy model that demonstrates this process. The researchers found that models learn to separate truth from falsehood by associating factual statements with each other, and this ability improves over time as the model lowers its language-modeling loss....

Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations

Published at 2025-10-19

#ML

This study examines how speaker emotions affect the safety of large audio-language models, revealing significant inconsistencies and calling for improved alignment strategies to ensure these models perform safely under emotional variations....

Long-Context Attention Benchmark: From Kernel Efficiency to Distributed Context Parallelism

Published at 2025-10-19

#ML

This study presents a unified benchmark to evaluate and compare attention mechanisms for long-context training in large language models. The benchmark assesses methods based on attention mask patterns and sequence length, providing insights into efficiency, scalability, and performance for practical guidance in designing and deploying attention mechanisms....

SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models

Published at 2025-10-19

#ML

The study presents SAKE, a new benchmark for editing auditory attributes in Large Audio-Language Models, focusing on challenges like preserving related knowledge, generalizing edits, and maintaining updates in various scenarios....

Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs

Published at 2025-10-20

#ML

This study explores how factors like hidden size, mlp-to-attention ratio, and grouped-query attention affect the efficiency and accuracy of large language models (LLMs). They developed a conditional scaling law to predict the best architectural choices, resulting in models that are more accurate and efficient compared to existing open-source baselines....

Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference

Published at 2025-10-21

#ML

The authors present a new method called Adamas to improve the efficiency of long-context inference in large language models. Adamas uses a Hadamard transform and other techniques to create compact representations and efficiently select important information, resulting in faster processing and comparable or better accuracy compared to traditional methods....

Search Self-play: Pushing the Frontier of Agent Capability without Supervision

Published at 2025-10-21

#ML

The study presents a new method called Search Self-play (SSP) that improves the capability of learning LLM agents in reinforcement learning with verifiable rewards (RLVR) without requiring massive human efforts. SSP allows agents to act as both task proposers and problem solvers, enabling them to generate search queries, find accurate answers, and improve their performance through competition and cooperation, as demonstrated by experimental results on various benchmarks....

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

Published at 2025-10-22

#ML

AdaSPEC is a new method that improves the performance of Speculative Decoding, a technique that speeds up large language model inference. It does this by selectively filtering out difficult-to-fit tokens during the Knowledge Distillation process, resulting in a draft model that better aligns with the target model and improves the overall token acceptance rate without sacrificing quality....

Communication to Completion: Modeling Collaborative Workflows with Intelligent Multi-Agent Communication

Published at 2025-10-22

#ML

This study presents a new framework called Communication to Completion (C2C) that enhances teamwork in complex tasks by improving communication strategies in multi-agent systems. The framework uses a novel metric called Alignment Factor to measure agent task alignment and a Sequential Action Framework to make cost-aware communication choices, resulting in a 40% reduction in task completion time without sacrificing effectiveness....

Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1

Published at 2025-10-22

#ML

This study presents AutoPage, a multi-agent system that simplifies the process of creating project webpages from research papers. By breaking down the task into smaller steps and incorporating human feedback, AutoPage generates high-quality, visually appealing webpages efficiently and affordably....

Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall

Published at 2025-10-22

#ML

The authors propose a new method called Loopholing that helps preserve information in discrete diffusion models, improving their performance and reducing generative perplexity by up to 61% compared to previous models. This leads to better text coherence and improved performance on reasoning tasks like arithmetic benchmarks, without relying on autoregressive models....

MSC-Bench: A Rigorous Benchmark for Multi-Server Tool Orchestration

Published at 2025-10-22

#ML

The authors present MSC-Bench, a comprehensive benchmark for testing multi-server tool orchestration by AI agents, which evaluates various challenges like functional overlap and cross-server orchestration, and offers a five-level curriculum to test agent capabilities. Experiments using MSC-Bench reveal limitations in existing agents' robustness and performance, providing a framework for improving tool-using agents....

Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets

Published at 2025-10-22

#ML

The authors present Seed3D 1.0, a model that generates high-quality, physics-ready 3D assets from images, addressing the challenge of creating diverse and realistic environments for AI training. This model enables the generation of individual objects and entire scenes, which can be directly used in physics engines for robotic manipulation and simulation training....

The Massive Legal Embedding Benchmark (MLEB)

Published at 2025-10-22

#ML

The authors created the Massive Legal Embedding Benchmark (MLEB), a large and diverse open-source benchmark for legal information retrieval. MLEB includes ten expert-annotated datasets from various jurisdictions and document types, seven of which were newly constructed, to help improve legal information retrieval systems....

ARGenSeg: Image Segmentation with Autoregressive Image Generation Model

Published at 2025-10-23

#ML

The study presents a new method called ARGenSeg that improves image segmentation by using a unified framework for multimodal understanding and pixel-level perception. This approach generates detailed masks for objects and relies on a language model's ability to understand images, resulting in faster and more accurate segmentation compared to existing methods....

AlphaFlow: Understanding and Improving MeanFlow Models

Published at 2025-10-23

#ML

The study examines the MeanFlow model, a framework for few-step generative modeling, and identifies two conflicting objectives within it. They propose alpha-Flow, a new objective function that resolves these conflicts and improves convergence, achieving state-of-the-art results on ImageNet-1K 256x256 with vanilla DiT backbones....

ComProScanner: A multi-agent based framework for composition-property structured data extraction from scientific literature

Published at 2025-10-23

#ML

ComProScanner is a user-friendly, automated platform that helps extract, validate, and visualize structured data from scientific literature, specifically focusing on chemical compositions and properties. It was tested on 100 journal articles and found that the DeepSeek-V3-0324 language model had the best performance in extracting complex data related to ceramic piezoelectric materials....

Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence

Published at 2025-10-23

#ML

The authors propose Conan, a framework that uses visual evidence to reason through multi-step video data, outperforming a baseline model by over 10% in accuracy on various benchmarks, and demonstrating strong scalability and robustness in long-video understanding tasks....

DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

Published at 2025-10-23

#ML

Researchers developed a method called DyPE that allows image generation models to create ultra-high-resolution images without extra training or cost. This method improves image quality and performance on various benchmarks, especially at higher resolutions, by dynamically adjusting the model's positional encoding during the image generation process....

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

Published at 2025-10-23

#ML

The authors present a new method called RLEV that incorporates human-defined value signals into the reward function of LLMs, improving their performance on tasks with different levels of importance. This method outperforms traditional correctness-only baselines and learns to prioritize high-value prompts while being robust to noisy value signals....

From Masks to Worlds: A Hitchhiker's Guide to World Models

Published at 2025-10-23

#ML

This guide focuses on the development of world models, starting from masked models that unify representation learning to memory-augmented systems that maintain consistent worlds over time, emphasizing the generative heart, interactive loop, and memory system....

HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

Published at 2025-10-23

#ML

The research presents HoloCine, a model that generates coherent, multi-shot video narratives by ensuring global consistency across scenes, providing precise directorial control and efficient minute-scale generation. HoloCine demonstrates new abilities like character memory and understanding of cinematic techniques, marking a shift towards automated filmmaking....

ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases

Published at 2025-10-23

#ML

ImpossibleBench is a framework that measures large language models' tendency to exploit test cases, by creating 'impossible' tasks with conflicts between natural language instructions and unit tests. It helps study model behaviors, context engineering, and develop monitoring tools, aiming to build more robust LLM systems....

LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas

Published at 2025-10-23

#ML

The study presents LayerComposer, a new interactive framework for personalized, multi-subject text-to-image generation, which offers better spatial control and identity preservation than existing methods. It uses a layered canvas for each subject and a locking mechanism to enable flexible adapting to the surrounding context while preserving selected layers with high fidelity....

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Published at 2025-10-23

#ML

The study presents a new framework, Open-o3 Video, which highlights key timestamps, objects, and locations in videos to improve reasoning accuracy. They created two new datasets with detailed temporal and spatial annotations and used a specialized training method to achieve state-of-the-art performance on various video understanding benchmarks....

Thought Communication in Multiagent Collaboration

Published at 2025-10-23

#ML

The authors propose a new approach called 'thought communication' that allows agents to interact directly with each other, similar to telepathy, without relying on natural language. They develop a framework to extract and share hidden thoughts among agents, which can be applied to all types of data, and demonstrate its effectiveness through experiments....

Tags are generated by Google's Gemini Pro API, and the summary and translation are generated by Upstage's SOLAR mini chat model derived from SOLAR-10.7B open LLM.

(Experimental) The full paper is translated in korean with enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media

Reply all

Reply to author

Forward

0 new messages