🤗 Daily Paper Newsletter

Hope you find some gems! This newsletter delivers a curated list of papers from 🤗 Daily Papers.

FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
Published at 2025-09-16
#ML

FinSearchComp is a new benchmark for testing the financial search and reasoning skills of artificial intelligence agents. It simulates real-world tasks performed by financial analysts and evaluates the performance of 21 different AI models, with Grok 4 (web) and DouBao (web) being the top performers for global and Greater China markets, respectively....
Read More

AToken: A Unified Tokenizer for Vision
Published at 2025-09-17
#ML

The authors have developed a unified visual tokenizer, AToken, that can handle various visual inputs like images, videos, and 3D assets, providing both high-quality reconstruction and semantic understanding. This tokenizer uses a transformer architecture with unique position embeddings, allowing it to process diverse visual data in a shared latent space, and outperforms existing tokenizers in generating and understanding various visual content....
Read More
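
Below is a minimal, hypothetical PyTorch sketch of the shared-latent-space idea from the AToken entry above: any visual input is reduced to patch tokens plus explicit (t, x, y, z) coordinates and encoded by one shared transformer. The module names, sizes, and the simple additive coordinate embedding are assumptions for illustration, not AToken's actual design.

```python
import torch
import torch.nn as nn


class UnifiedVisualTokenizer(nn.Module):
    """One encoder for images, video clips, or voxelized 3D assets."""

    def __init__(self, patch_dim: int, latent_dim: int = 256, depth: int = 4):
        super().__init__()
        self.patch_proj = nn.Linear(patch_dim, latent_dim)  # raw patches -> tokens
        self.coord_proj = nn.Linear(4, latent_dim)          # (t, x, y, z) position signal
        layer = nn.TransformerEncoderLayer(d_model=latent_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, patches: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # patches: (B, N, patch_dim), coords: (B, N, 4) -> latents: (B, N, latent_dim)
        tokens = self.patch_proj(patches) + self.coord_proj(coords)
        return self.encoder(tokens)


# The same module handles an image (time fixed at 0) and a short video clip,
# because both are reduced to "patches + coordinates" before encoding.
tok = UnifiedVisualTokenizer(patch_dim=3 * 16 * 16)
img = torch.randn(1, 196, 3 * 16 * 16)        # 14x14 grid of 16x16 RGB patches
vid = torch.randn(1, 8 * 196, 3 * 16 * 16)    # 8 frames of the same grid
print(tok(img, torch.zeros(1, 196, 4)).shape)       # torch.Size([1, 196, 256])
print(tok(vid, torch.zeros(1, 8 * 196, 4)).shape)   # torch.Size([1, 1568, 256])
```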

Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
Published at 2025-09-18
#ML

The authors propose a new method called EVOL-RL that improves language models without labels, focusing on maintaining exploration and diversity in model generations. This method prevents 'entropy collapse' and enhances model performance in various tasks, outperforming existing label-free methods and even improving results in the RLVR setting....
Read More
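
The "majority drives selection, novelty promotes variation" idea can be made concrete with a toy, label-free scoring rule: each sampled answer is rewarded for agreeing with the majority answer, plus a bonus for how different its reasoning trace is from the other samples. The 0.5 weight and the string-similarity novelty measure below are illustrative choices, not EVOL-RL's exact objective.

```python
from collections import Counter
from difflib import SequenceMatcher


def novelty(text: str, others: list[str]) -> float:
    """1 - average string similarity to the other reasoning traces."""
    if not others:
        return 0.0
    sims = [SequenceMatcher(None, text, o).ratio() for o in others]
    return 1.0 - sum(sims) / len(sims)


def score_samples(answers: list[str], traces: list[str], w: float = 0.5) -> list[float]:
    # Selection signal: agreement with the majority-voted answer (no labels needed).
    majority, _ = Counter(answers).most_common(1)[0]
    scores = []
    for i, (ans, trace) in enumerate(zip(answers, traces)):
        selection = 1.0 if ans == majority else 0.0
        # Variation signal: reward traces that differ from the rest of the batch.
        variation = novelty(trace, traces[:i] + traces[i + 1:])
        scores.append(selection + w * variation)
    return scores


# Example: four sampled generations for one prompt.
answers = ["42", "42", "41", "42"]
traces = ["add then multiply", "multiply first", "subtract twice", "add then multiply"]
print(score_samples(answers, traces))
```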

FlowRL: Matching Reward Distributions for LLM Reasoning
Published at 2025-09-18
#ML

The study presents a new approach called FlowRL for reinforcement learning in large language models, which focuses on matching the full reward distribution through flow balancing rather than just maximizing rewards. This method promotes diverse exploration and generalizable reasoning trajectories, outperforming existing methods on math and code reasoning tasks by a significant margin....
Read More
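
A rough sketch of the distribution-matching idea: rather than maximizing reward, the policy is trained so that log Z + log π(y|x) tracks β·r(x, y), which keeps sampling probability roughly proportional to exp(β·reward) instead of collapsing onto a single high-reward mode. The learnable scalar log_Z and the plain squared-error form follow the general flow/trajectory-balance recipe; FlowRL's full objective has additional terms not shown here.

```python
import torch


def flow_matching_loss(logprobs: torch.Tensor,
                       rewards: torch.Tensor,
                       log_z: torch.Tensor,
                       beta: float = 1.0) -> torch.Tensor:
    # logprobs: (B,) summed log pi(y|x) per sampled response
    # rewards:  (B,) scalar reward per response
    # log_z:    learnable scalar estimating the log partition function
    residual = log_z + logprobs - beta * rewards
    return (residual ** 2).mean()


# Toy usage: the residual is driven toward zero for every sample, so responses
# keep probability mass in proportion to exp(beta * reward).
logprobs = torch.tensor([-12.3, -15.1, -9.8], requires_grad=True)
rewards = torch.tensor([0.7, 0.2, 0.9])
log_z = torch.tensor(10.0, requires_grad=True)
loss = flow_matching_loss(logprobs, rewards, log_z)
loss.backward()
print(float(loss))
```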

Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
Published at 2025-09-18
#ML

The study presents Align3, a new method that uses Test-Time Deliberation to help large language models follow customized user or organizational specifications more effectively. The researchers also introduce SpecBench, a benchmark to measure this alignment, and find that test-time deliberation improves alignment, advances the safety-helpfulness trade-off, and reveals alignment gaps....
Read More
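
A hypothetical test-time deliberation loop of the kind described above: draft a response, critique it against the written specification, then revise. The three-step prompts and the `llm` callable are placeholders for illustration, not Align3's actual procedure.

```python
from typing import Callable


def deliberate(llm: Callable[[str], str], spec: str, query: str) -> str:
    # Step 1: draft an answer with the specification in context.
    draft = llm(f"Specification:\n{spec}\n\nUser request:\n{query}\n\nDraft a response.")
    # Step 2: check the draft against the specification.
    critique = llm(
        f"Specification:\n{spec}\n\nDraft:\n{draft}\n\n"
        "List any ways the draft violates the specification."
    )
    # Step 3: revise so the final answer satisfies the specification.
    final = llm(
        f"Specification:\n{spec}\n\nDraft:\n{draft}\n\nIssues found:\n{critique}\n\n"
        "Rewrite the draft so it satisfies the specification."
    )
    return final


# Stub "model" so the sketch runs end to end; swap in a real chat model call.
def fake_llm(prompt: str) -> str:
    return f"[model output for {len(prompt)} prompt chars]"


print(deliberate(fake_llm, spec="Never give financial advice.", query="Should I buy X?"))
```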

RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
Published at 2025-09-18
#ML

The researchers developed a new model named RynnVLA-001 that learns from human demonstration videos to improve robot manipulation. It uses a two-stage training process to predict future actions and compresses action sequences into compact latent representations, resulting in better performance than existing models on robotics tasks....
Read More
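
A minimal sketch of "compressing a chunk of future actions into a compact embedding": a short sequence of low-level robot actions is encoded into one latent vector that a policy could predict, then decoded back into the action chunk. The layer sizes and the plain (non-variational) autoencoder form are simplifications, not the paper's exact action-compression module.

```python
import torch
import torch.nn as nn


class ActionChunkAutoencoder(nn.Module):
    def __init__(self, action_dim: int = 7, chunk_len: int = 16, latent_dim: int = 32):
        super().__init__()
        flat = action_dim * chunk_len
        self.encoder = nn.Sequential(nn.Linear(flat, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, flat))
        self.chunk_len, self.action_dim = chunk_len, action_dim

    def forward(self, actions: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # actions: (B, chunk_len, action_dim) -> latent: (B, latent_dim), recon: same shape as input
        z = self.encoder(actions.flatten(1))
        recon = self.decoder(z).view(-1, self.chunk_len, self.action_dim)
        return z, recon


model = ActionChunkAutoencoder()
chunk = torch.randn(4, 16, 7)                 # 4 samples of 16 future 7-DoF actions
latent, recon = model(chunk)
loss = nn.functional.mse_loss(recon, chunk)   # reconstruction objective
print(latent.shape, recon.shape, float(loss))
```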

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
Published at 2025-09-18
#ML

The authors present ScaleCUA, a large-scale, open-source dataset for training computer use agents to operate GUIs across multiple platforms and tasks. They demonstrate significant improvements over baseline models on various benchmarks, highlighting the importance of data-driven scaling for general-purpose computer use agents....
Read More
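
Cross-platform GUI data only pools cleanly if every trajectory uses one action vocabulary; the sketch below shows a hypothetical unified action schema of that kind. The field names and JSON serialization are illustrative, not ScaleCUA's actual format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class GUIAction:
    kind: str                       # "click" | "type" | "scroll" | "key"
    x: Optional[float] = None       # normalized screen coordinates in [0, 1]
    y: Optional[float] = None
    text: Optional[str] = None      # payload for "type" / "key"
    platform: str = "web"           # "web" | "windows" | "macos" | "android" | ...

    def to_json(self) -> str:
        return json.dumps(asdict(self))


# The same trace format covers different platforms, which is what makes
# cross-platform data pooling (and scaling) straightforward.
trace = [
    GUIAction("click", x=0.42, y=0.17, platform="android"),
    GUIAction("type", text="weather tomorrow", platform="android"),
    GUIAction("key", text="Enter", platform="windows"),
]
print("\n".join(a.to_json() for a in trace))
```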

Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
Published at 2025-09-18
#ML

This study investigates the challenges of applying next-token prediction to image generation with autoregressive models and introduces ST-AR, a new training framework that addresses these challenges with self-supervised objectives. ST-AR significantly improves image understanding and generation quality, yielding substantial FID improvements for LlamaGen models....
Read More
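
A schematic of the "add self-supervised objectives on top of next-token prediction" recipe: the usual autoregressive cross-entropy is combined with an auxiliary term that aligns intermediate features across two views of the same image. The cosine-similarity auxiliary loss and its 0.5 weight are stand-ins; ST-AR's actual objectives differ in detail.

```python
import torch
import torch.nn.functional as F


def training_loss(logits: torch.Tensor,
                  targets: torch.Tensor,
                  features: torch.Tensor,
                  features_other_view: torch.Tensor,
                  aux_weight: float = 0.5) -> torch.Tensor:
    # logits: (B, T, V) token predictions, targets: (B, T) ground-truth token ids
    ntp = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
    # features: (B, D) pooled intermediate representations of two augmented views
    aux = 1.0 - F.cosine_similarity(features, features_other_view, dim=-1).mean()
    return ntp + aux_weight * aux


logits = torch.randn(2, 64, 1024, requires_grad=True)   # toy vocab of 1024 visual tokens
targets = torch.randint(0, 1024, (2, 64))
feat_a = torch.randn(2, 256, requires_grad=True)
feat_b = torch.randn(2, 256)
print(float(training_loss(logits, targets, feat_a, feat_b)))
```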

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit the Developer's Social Media