🤗 Daily Paper Newsletter

Hope you found some gems!
This newsletter brings you a curated list of papers from 🤗 Daily Papers.
|
Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling
Published at 2025-09-01
#ML

The authors present Q-Sched, a method that achieves a 4x reduction in the size of few-step diffusion models while maintaining their quality. It works by adjusting the models' sampling trajectory and introducing a new loss function, JAQ, which optimizes the models without requiring full-precision inference. The result is high-quality image generation with fewer computational resources.
|
|
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Published at 2025-09-03
#ML

The study finds that reinforcement learning improves complex reasoning in language models by developing a hierarchy of skills, first consolidating low-level procedures and then high-level strategies. The researchers propose a new algorithm, HICRA, which concentrates optimization on high-impact planning tokens and outperforms existing algorithms. The study also offers a better measure of strategic exploration.
|
|
|
Benchmarking Information Retrieval Models on Complex Retrieval Tasks
Published at 2025-09-08
#ML

The study builds a diverse and realistic set of complex retrieval tasks and uses it to benchmark state-of-the-art retrieval models; even the best models struggle to produce high-quality results. It also examines the effect of using large language models (LLMs) for query expansion and rewriting, finding that while LLMs can help weaker models, the strongest model performs worse with every rewriting technique.
|
|
Causal Attention with Lookahead Keys
Published at 2025-09-08
#ML

The authors propose a new attention mechanism called CASTLE that updates each token's keys as the context unfolds, so the keys can incorporate later context while the model remains autoregressive. This method improves language modeling performance and enables efficient parallel training without explicitly materializing the lookahead keys.
|
|
|
Curia: A Multi-Modal Foundation Model for Radiology
Published at 2025-09-08
#ML

Researchers developed Curia, a foundation model for radiology trained on a large dataset of hospital imaging exams. It outperforms existing models on various radiological tasks and shows promising capabilities in low-data settings and across different imaging modalities.
|
|
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
Published at 2025-09-08
#ML

The study presents Direct-Align, a method that predefines a noise prior to recover original images from any timestep via interpolation, avoiding over-optimization in late timesteps. It also introduces Semantic Relative Preference Optimization (SRPO), which adjusts rewards online in response to positive and negative prompt augmentation, reducing the need for offline reward fine-tuning.
|
|
|
F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
Published at 2025-09-08
#ML

The authors present F1, a pretrained Vision-Language-Action framework that incorporates visual foresight generation into its decision-making process. F1 utilizes a Mixture-of-Transformer architecture and a next-scale prediction mechanism to forecast future visual states, allowing it to generate actions that implicitly achieve visual goals. The model is trained on a large dataset of diverse tasks, resulting in improved performance and generalization ability compared to existing approaches.
|
|
From Noise to Narrative: Tracing the Origins of Hallucinations in Transformers
Published at 2025-09-08
#ML

This study explores why and when transformer models make up information (hallucinate), using sparse autoencoders to examine how the models represent concepts. The authors found that as input information becomes more disorganized, transformers are more likely to create coherent but irrelevant concepts, leading to hallucinations. The research has implications for AI safety, aligning models with human values, and understanding potential adversarial attacks.
|
|
|
Reconstruction Alignment Improves Unified Multimodal Models
Published at 2025-09-08
#ML

The study presents a new method called Reconstruction Alignment that enhances unified multimodal models by using visual understanding encoder embeddings as 'text prompts' for image reconstruction. This simple and resource-efficient technique improves image generation and editing fidelity across various model architectures, outperforming larger open-source models with only 27 GPU-hours of training.
|
|
Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding
Published at 2025-09-08
#ML

The authors present SEELE, a new framework for enhancing the reasoning capabilities of large language models. SEELE dynamically adjusts problem difficulty by adding hints to training samples, improving exploration efficiency and outperforming previous methods on various math reasoning benchmarks.
|
|
|
UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward
Published at 2025-09-08
#ML

The authors present a new framework called UMO that improves the consistency and scalability of preserving multiple identities in image customization, addressing a significant challenge in current methods. UMO uses a 'multi-to-multi matching' approach, reformulating identity generation as an optimization problem and reducing identity confusion through reinforcement learning on diffusion models, outperforming existing open-source methods.
|
|
Language Self-Play For Data-Free Training
Published at 2025-09-09
#ML

This research presents Language Self-Play, a reinforcement learning method built on a game-theoretic self-play framework that lets language models improve without additional data. The method was tested on Llama-3.2-3B-Instruct and was found to enhance model performance on challenging tasks more effectively than traditional data-driven approaches.
|
|
|
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search
Published at 2025-09-09
#ML

This study presents Mini-o3, a system that significantly improves visual search performance by scaling up tool-based interactions and executing deep, multi-turn reasoning. The system's success is attributed to a new dataset of complex visual search problems, an iterative data collection pipeline, and an over-turn masking strategy that enables efficient training and scalable inference, resulting in rich reasoning patterns and improved accuracy.
|
|
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Published at 2025-09-09
#ML

The authors present a reinforcement learning framework, Parallel-R1, which enables parallel thinking for large language models in complex reasoning tasks. This framework uses a progressive curriculum that starts with supervised fine-tuning on easier tasks to develop parallel thinking ability, then switches to reinforcement learning for exploring and generalizing this skill on more difficult problems. Experiments show that this approach leads to improved performance and a shift in the model's thi...
|
|
|
SimpleQA Verified: A Reliable Factuality Benchmark to Measure Parametric Knowledge
Published at 2025-09-09
#ML

Researchers created SimpleQA Verified, a more reliable and challenging benchmark for evaluating the factuality of large language models, addressing issues in the original SimpleQA benchmark. Gemini 2.5 Pro achieved the best results on the new benchmark, outperforming other advanced models. The dataset and evaluation code are available to the research community.
|
|
Visual Representation Alignment for Multimodal Large Language Models
Published at 2025-09-09
#ML

The authors propose VIRAL, a method that improves multimodal large language models on vision-centric tasks by aligning their internal visual representations with those of pre-trained vision models. This helps the models retain and use visual details, leading to consistent performance improvements across various multimodal benchmarks.
|
|
|
ΔL Normalization: Rethink Loss Aggregation in RLVR
Published at 2025-09-09
#ML

The study presents ΔL Normalization, a new method for handling varying response lengths in Reinforcement Learning with Verifiable Rewards (RLVR), which helps stabilize training and improve the performance of large language models. The method reduces gradient variance and provides unbiased estimates of the true policy loss, outperforming previous techniques in various experiments.
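To make the response-length issue concrete, here is a minimal sketch of two common ways to aggregate per-token losses over a batch of responses with different lengths. It is not the ΔL Normalization method itself, whose formula the summary does not give. The function names and the toy batch are assumptions made purely for illustration.

```python
from typing import List


def sequence_mean_loss(token_losses: List[List[float]]) -> float:
    """Average each response over its own tokens, then average across responses.

    Every response counts equally, regardless of its length.
    """
    per_response = [sum(resp) / len(resp) for resp in token_losses]
    return sum(per_response) / len(per_response)


def token_mean_loss(token_losses: List[List[float]]) -> float:
    """Average over every token in the batch.

    Longer responses contribute more tokens and therefore more weight.
    """
    total = sum(sum(resp) for resp in token_losses)
    count = sum(len(resp) for resp in token_losses)
    return total / count


# Hypothetical toy batch: a short response with high per-token loss
# and a long response with zero loss.
batch = [[1.0, 1.0], [0.0] * 8]
seq_mean = sequence_mean_loss(batch)
tok_mean = token_mean_loss(batch)
print(seq_mean, tok_mean)
```

The two schemes disagree on the same batch because they weight short and long responses differently, which is the kind of length sensitivity that loss aggregation in RLVR has to manage.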
|
|
|
|
Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.