🤗 Daily Paper(2025-12-26)


deep.di...@gmail.com

Dec 26, 2025, 3:06:46 PM

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training

Published at 2025-12-15

#ML

This study presents GTR-Turbo, an efficient method for training multi-modal agents that uses a merged checkpoint as a free teacher instead of relying on costly models. GTR-Turbo improves accuracy, reduces training time, and lowers compute cost compared to previous methods, making it more practical and accessible for building vision-language models....

Spatia: Video Generation with Updatable Spatial Memory

Published at 2025-12-17

#ML

The authors present a new video generation framework named Spatia that maintains a 3D scene point cloud to improve long-term spatial and temporal consistency in video generation. Spatia's design allows for realistic dynamic entities and enables applications like camera control and 3D-aware interactive editing....

How Much 3D Do Video Foundation Models Encode?

Published at 2025-12-22

#ML

This study measures the 3D understanding of video foundation models by estimating 3D properties from their features. The results show that advanced video generation models have a strong grasp of 3D objects and scenes, comparable to expert models trained specifically for 3D tasks, without any direct 3D training....

Schoenfeld's Anatomy of Mathematical Reasoning by Language Models

Published at 2025-12-22

#ML

This study uses a theory called Episode Theory and a framework called ThinkARM to analyze the reasoning process of language models in solving mathematical problems. The analysis reveals important differences between models that can reason and those that cannot, and shows how certain steps in the reasoning process are related to correctness and efficiency....

VA-π: Variational Policy Alignment for Pixel-Aware Autoregressive Generation

Published at 2025-12-22

#ML

The paper presents VA-π, a new method that improves the quality of images generated by autoregressive visual models by aligning them with tokenizers using a variational optimization approach and reinforcement learning. This method provides direct pixel-level guidance to the model, resulting in better image reconstruction and higher image quality metrics without requiring additional data or complex training processes....

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Published at 2025-12-23

#ML

This research presents a method to improve learning efficiency in autoregressive models by enabling them to act and explore using their internal representations. The proposed approach, called 'internal RL', allows the model to generate actions from a higher-order, non-causal sequence model, which in turn controls the residual stream activations of the base autoregressive model. This technique helps the model learn from sparse rewards and perform hierarchical reinforcement learning on complex tas...

Latent Implicit Visual Reasoning

Published at 2025-12-24

#ML

The authors present a new method to improve visual reasoning in large multimodal models, which are mostly text-based, by introducing task-agnostic tokens that help the model understand and use visual information without needing specific supervision. This approach performs better than regular fine-tuning and sets new standards for various vision-centric tasks without relying on hand-crafted supervision....

Tags are generated by Google's Gemini Pro API, and the summary and translation are generated by Upstage's SOLAR mini chat model derived from SOLAR-10.7B open LLM.

(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
