🤗 Daily Paper(2025-09-10)

deep.di...@gmail.com

Sep 10, 2025, 4:07:18 PM

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page

🤗 daily paper

Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling

Published at 2025-09-01

#ML

The authors present a new method called Q-Sched that reduces the size of few-step diffusion models by 4 times while maintaining their quality. This is achieved by adjusting the sampling trajectory of the models and introducing a new loss function called JAQ, which optimizes the models without requiring full-precision inference. The result is a more efficient and effective way to generate high-quality images using fewer computational resources....

Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning

Published at 2025-09-03

#ML

The study finds that reinforcement learning improves complex reasoning in language models by developing a hierarchy of skills, first focusing on low-level procedures and then on high-level strategies. The researchers propose a new algorithm, HICRA, which focuses on high-impact planning tokens, outperforming existing algorithms and providing a better measure of strategic exploration....

Benchmarking Information Retrieval Models on Complex Retrieval Tasks

Published at 2025-09-08

#ML

The study builds a diverse and realistic set of complex retrieval tasks to assess state-of-the-art retrieval models, and finds that even the best models struggle to return high-quality results. The research also examines the effect of using large language models (LLMs) for query expansion and rewriting, finding that while LLMs can help weaker models, the strongest model sees reduced performance with all rewriting techniques....

Causal Attention with Lookahead Keys

Published at 2025-09-08

#ML

The authors propose a new attention mechanism called CASTLE that updates token keys as the context unfolds, allowing them to incorporate later context while maintaining the autoregressive property. This method improves language modeling performance and enables efficient parallel training without explicit materialization of lookahead keys....

Curia: A Multi-Modal Foundation Model for Radiology

Published at 2025-09-08

#ML

Researchers developed a comprehensive AI model, Curia, trained on a large dataset of hospital imaging exams, which outperforms existing models in various radiological tasks and demonstrates promising capabilities in low-data settings and across different imaging modalities....

Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference

Published at 2025-09-08

#ML

The study presents Direct-Align, a method that predefines a noise prior to recover original images from any time steps via interpolation, avoiding over-optimization in late timesteps. Additionally, Semantic Relative Preference Optimization (SRPO) is introduced, enabling online adjustment of rewards in response to positive and negative prompt augmentation, which reduces the need for offline reward fine-tuning....

F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions

Published at 2025-09-08

#ML

The authors present F1, a pretrained Vision-Language-Action framework that incorporates visual foresight generation into its decision-making process. F1 utilizes a Mixture-of-Transformer architecture and a next-scale prediction mechanism to forecast future visual states, allowing it to generate actions that implicitly achieve visual goals. The model is trained on a large dataset of diverse tasks, resulting in improved performance and generalization ability compared to existing approaches....

From Noise to Narrative: Tracing the Origins of Hallucinations in Transformers

Published at 2025-09-08

#ML

This study explores why and when AI models like transformers make up information (hallucinate), using sparse autoencoders to examine how the models represent concepts. The authors found that as input information becomes more disorganized, transformers are more likely to create coherent but irrelevant concepts, leading to hallucinations. The research has implications for AI safety, aligning models with human values, and understanding potential adversarial attacks....

Reconstruction Alignment Improves Unified Multimodal Models

Published at 2025-09-08

#ML

The study presents a new method called Reconstruction Alignment that enhances unified multimodal models by using visual understanding encoder embeddings as 'text prompts' for image reconstruction. This simple and resource-efficient technique improves image generation and editing fidelity across various model architectures, outperforming larger open-source models with only 27 GPU-hours of training....

Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding

Published at 2025-09-08

#ML

The authors present SEELE, a new framework for enhancing the reasoning capabilities of large language models. SEELE dynamically adjusts problem difficulty by adding hints to training samples, improving exploration efficiency and outperforming previous methods on various math reasoning benchmarks....

UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward

Published at 2025-09-08

#ML

The authors present a new framework called UMO that improves the consistency and scalability of preserving multiple identities in image customization, addressing a significant challenge in current methods. UMO uses a 'multi-to-multi matching' approach, reformulating identity generation as an optimization problem and reducing identity confusion through reinforcement learning on diffusion models, outperforming existing open-source methods....

Language Self-Play For Data-Free Training

Published at 2025-09-09

#ML

This research presents a new reinforcement learning method that allows language models to improve without needing more data, using a self-play game-theoretic framework called Language Self-Play. The method was tested on Llama-3.2-3B-Instruct and was found to enhance model performance on challenging tasks more effectively than traditional data-driven approaches....

Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search

Published at 2025-09-09

#ML

This study presents Mini-o3, a system that significantly improves visual search performance by scaling up tool-based interactions and executing deep, multi-turn reasoning. The system's success is attributed to a new dataset of complex visual search problems, an iterative data collection pipeline, and an over-turn masking strategy that enables efficient training and scalable inference, resulting in rich reasoning patterns and improved accuracy....

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Published at 2025-09-09

#ML

The authors present a reinforcement learning framework, Parallel-R1, which enables parallel thinking for large language models in complex reasoning tasks. This framework uses a progressive curriculum that starts with supervised fine-tuning on easier tasks to develop parallel thinking ability, then switches to reinforcement learning for exploring and generalizing this skill on more difficult problems. Experiments show that this approach leads to improved performance and a shift in the model's thi...

SimpleQA Verified: A Reliable Factuality Benchmark to Measure Parametric Knowledge

Published at 2025-09-09

#ML

Researchers created a more reliable and challenging benchmark called SimpleQA Verified for evaluating Large Language Models' factuality, addressing issues in the original SimpleQA benchmark. Gemini 2.5 Pro achieved the best results on this new benchmark, outperforming other advanced models, and the dataset and evaluation code are available for the research community....

Visual Representation Alignment for Multimodal Large Language Models

Published at 2025-09-09

#ML

The authors propose a method called VIRAL to improve multimodal large language models on vision-centric tasks by aligning their internal visual representations with those of pre-trained vision models. This helps the models retain and use visual details, leading to consistent performance improvements across various multimodal benchmarks....

ΔL Normalization: Rethink Loss Aggregation in RLVR

Published at 2025-09-09

#ML

The study presents Delta L Normalization, a new method to handle varying response lengths in Reinforcement Learning with Verifiable Rewards (RLVR), which helps stabilize the training process and improve performance of large language models. The method reduces gradient variance and provides unbiased estimates of the true policy loss, outperforming previous techniques in various experiments....
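To make the response-length issue concrete, here is a minimal sketch of two common ways to aggregate per-token losses over a batch of responses with different lengths. It is not the paper's method: the function names and the toy numbers are hypothetical, chosen only to show that the aggregation rule changes how much each response contributes.

```python
# Illustrative only: two generic loss-aggregation rules, NOT Delta L Normalization.
# Function names and the toy batch below are assumptions made for this sketch.

def sequence_mean_loss(token_losses):
    """Average within each response first, then across responses.

    Every response gets equal weight regardless of its length.
    """
    per_response = [sum(r) / len(r) for r in token_losses]
    return sum(per_response) / len(per_response)


def token_mean_loss(token_losses):
    """Average over all tokens in the batch.

    Longer responses contribute more tokens and therefore more weight.
    """
    total = sum(sum(r) for r in token_losses)
    count = sum(len(r) for r in token_losses)
    return total / count


# Toy batch: a short response with high loss and a long response with zero loss.
batch = [[1.0, 1.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]

print(sequence_mean_loss(batch))  # 0.5
print(token_mean_loss(batch))     # 0.25
```

The same batch yields different aggregate losses under the two rules, which is the kind of length-dependent effect the summary refers to when it mentions handling varying response lengths.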

Tags are generated by Google's Gemini Pro API, and the summary and translation are generated by Upstage's SOLAR mini chat model derived from SOLAR-10.7B open LLM.

(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media
