🤗 Daily Paper Newsletter

Hope you found some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.
|
|
|
|
|
|
|
GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training
Published at 2025-12-15

#ML

This study presents GTR-Turbo, an efficient method for training multi-modal agents that uses a merged checkpoint as a free teacher instead of relying on costly models. GTR-Turbo improves accuracy, reduces training time, and lowers compute cost compared to previous methods, making it more practical and accessible for building vision-language models....
Read More
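For readers curious what "merging a checkpoint" can look like in practice, here is a minimal, generic sketch: several saved checkpoints are merged by uniform parameter averaging, and the result is loaded into a frozen copy of the model to act as a teacher. This only illustrates the general idea, not GTR-Turbo's actual recipe; the averaging scheme and all names below are assumptions.

```python
# Illustrative sketch only -- not GTR-Turbo's actual procedure.
# Merge several training checkpoints by uniform parameter averaging; the merged
# weights can then be loaded into a frozen copy of the model and used as a
# distillation teacher for the student still being trained.
from typing import Dict, List
import torch


def average_checkpoints(state_dicts: List[Dict[str, torch.Tensor]]) -> Dict[str, torch.Tensor]:
    """Uniformly average matching tensors across checkpoints (hypothetical helper)."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged


# Usage sketch (assumes three saved checkpoints of the same architecture):
# teacher_weights = average_checkpoints([torch.load(p) for p in ("ckpt_a.pt", "ckpt_b.pt", "ckpt_c.pt")])
# teacher_model.load_state_dict(teacher_weights); teacher_model.eval()
```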
|
|
|
Spatia: Video Generation with Updatable Spatial Memory
Published at 2025-12-17

#ML

The authors present Spatia, a video generation framework that maintains an updatable 3D scene point cloud as spatial memory to improve long-term spatial and temporal consistency. Spatia's design allows for realistic dynamic entities and enables applications like camera control and 3D-aware interactive editing....
Read More
|
|
|
|
How Much 3D Do Video Foundation Models Encode?
Published at 2025-12-22

#ML

This study measures the 3D understanding of video foundation models by estimating 3D properties from their features. The results show that advanced video generation models have a strong grasp of 3D objects and scenes, comparable to expert models trained specifically for 3D tasks, without any direct 3D training....
Read More
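As a rough picture of how such measurements are often made, below is a generic linear-probe sketch: a frozen backbone's features are fed to a small trainable head that regresses a 3D quantity. The paper's actual probing targets, models, and protocol are not specified here; every name and dimension is a placeholder.

```python
# Illustrative sketch only -- a generic linear probe for reading a 3D property
# (e.g. a depth statistic) out of frozen video-model features. All dimensions,
# names, and targets are placeholders, not the paper's setup.
import torch
import torch.nn as nn

feature_dim, target_dim = 1024, 1              # pooled feature size -> scalar depth statistic
probe = nn.Linear(feature_dim, target_dim)     # the only trainable module
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()


def probe_step(frozen_features: torch.Tensor, target: torch.Tensor) -> float:
    """One training step: the backbone stays frozen, only the linear probe learns."""
    optimizer.zero_grad()
    loss = loss_fn(probe(frozen_features), target)
    loss.backward()
    optimizer.step()
    return loss.item()


# Dummy usage with random tensors standing in for real features and labels:
features = torch.randn(8, feature_dim)
targets = torch.randn(8, target_dim)
print(probe_step(features, targets))
```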
|
|
|
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Published at 2025-12-22

#ML

This study uses Schoenfeld's Episode Theory and a framework called ThinkARM to analyze how language models reason through mathematical problems. The analysis reveals important differences between reasoning and non-reasoning models, and shows how certain steps in the reasoning process relate to correctness and efficiency....
Read More
|
|
|
|
VA-π: Variational Policy Alignment for Pixel-Aware Autoregressive Generation
Published at 2025-12-22

#ML

The paper presents VA-π, a new method that improves the quality of images generated by autoregressive visual models by aligning them with their tokenizers through a variational optimization approach and reinforcement learning. This provides direct pixel-level guidance to the model, resulting in better image reconstruction and higher image quality metrics without requiring additional data or complex training processes....
Read More
|
|
|
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Published at 2025-12-23

#ML

This research presents a method to improve learning efficiency in autoregressive models by enabling them to act and explore using their internal representations. The proposed approach, called 'internal RL', lets a higher-order, non-causal sequence model generate actions that control the residual stream activations of the base autoregressive model. This technique helps the model learn from sparse rewards and perform hierarchical reinforcement learning on complex tasks....
Read More
|
|
|
|
Latent Implicit Visual Reasoning
Published at 2025-12-24

#ML

The authors present a new method to improve visual reasoning in large multimodal models, whose reasoning is mostly text-based, by introducing task-agnostic latent tokens that help the model understand and use visual information. This approach outperforms standard fine-tuning and sets new standards on various vision-centric tasks without relying on hand-crafted supervision....
Read More
|
|
|
|
|
Tags are generated with Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, which is derived from the SOLAR-10.7B open LLM.
(Experimental) Full papers are translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit the Developer's Social Media
|
|
|
|
|
|