🤗 Daily Paper(2025-09-19)

deep.di...@gmail.com

Sep 19, 2025, 4:09:59 PM

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning

Published at 2025-09-16

#ML

FinSearchComp is a new benchmark for testing financial search and reasoning skills of artificial intelligence agents. It simulates real-world tasks performed by financial analysts and evaluates the performance of 21 different AI models, with Grok 4 (web) and DouBao (web) being the top performers for global and Greater China markets, respectively....

AToken: A Unified Tokenizer for Vision

Published at 2025-09-17

#ML

The authors have developed a unified visual tokenizer, AToken, that can handle various visual inputs like images, videos, and 3D assets, providing both high-quality reconstruction and semantic understanding. This tokenizer uses a transformer architecture with unique position embeddings, allowing it to process diverse visual data in a shared latent space, and outperforms existing tokenizers in generating and understanding various visual content....

Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

Published at 2025-09-18

#ML

The authors propose a new method called EVOL-RL that improves language models without labels, focusing on maintaining exploration and diversity in model generations. This method prevents 'entropy collapse' and enhances model performance in various tasks, outperforming existing label-free methods and even improving results in the RLVR setting....

FlowRL: Matching Reward Distributions for LLM Reasoning

Published at 2025-09-18

#ML

The study presents a new approach called FlowRL for reinforcement learning in large language models, which focuses on matching the full reward distribution through flow balancing rather than just maximizing rewards. This method promotes diverse exploration and generalizable reasoning trajectories, outperforming existing methods on math and code reasoning tasks by a significant margin....

Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation

Published at 2025-09-18

#ML

The study presents Align3, a new method that uses Test-Time Deliberation to help large language models follow customized user or organizational specifications more effectively. The researchers also introduce SpecBench, a benchmark to measure this alignment, and find that test-time deliberation improves alignment, advances the safety-helpfulness trade-off, and reveals alignment gaps....

RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation

Published at 2025-09-18

#ML

The researchers developed a new model named RynnVLA-001 that learns from human demonstration videos to improve robot manipulation tasks. It uses a two-step training process to predict future actions and compresses these actions into a simpler form, resulting in better performance compared to existing models in robotics tasks....

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Published at 2025-09-18

#ML

The authors present ScaleCUA, a large-scale, open-source dataset for training computer use agents to operate GUIs across multiple platforms and tasks. They demonstrate significant improvements over baseline models on various benchmarks, highlighting the importance of data-driven scaling for general-purpose computer use agents....

Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation

Published at 2025-09-18

#ML

This study investigates the challenges of applying next-token prediction to image generation with autoregressive models. It introduces a new training framework, ST-AR, which uses self-supervised objectives to address these challenges, significantly improving image understanding and generation quality and yielding substantial FID improvements for LlamaGen models....

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.

(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
