🤗 Daily Paper (2025-12-25)


deep.di...@gmail.com

Dec 25, 2025, 3:06:41 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page
🤗 daily paper

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Published at 2025-12-17

#ML

TurboDiffusion is a framework that significantly speeds up video generation by using efficient attention computation, step distillation, and quantization, resulting in a 100-200x faster process while maintaining video quality, as demonstrated in various models....

Read More

SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios

Published at 2025-12-20

#ML

This study presents SWE-EVO, a new benchmark for evaluating AI coding agents in long-horizon software-evolution scenarios that span multiple steps and files. The results show that current AI models, including GPT-5, struggle with these complex tasks compared to single-issue tasks, highlighting the need for improved AI capabilities in sustained, multi-file reasoning....

Read More

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Published at 2025-12-23

#ML

This study presents DSR Suite, a new method for improving dynamic spatial reasoning in vision-language models. The suite includes an automated pipeline for generating multiple-choice questions from videos, a lightweight Geometry Selection Module that integrates geometric information into the models, and a new dataset emphasizing real-world video sources and detailed 3D information. The result is a significant enhancement in the dynamic spatial reasoning capabilities...

Read More

Multi-hop Reasoning via Early Knowledge Alignment

Published at 2025-12-23

#ML

This research presents Early Knowledge Alignment (EKA), a module that enhances iterative Retrieval-Augmented Generation (RAG) systems by aligning Large Language Models (LLMs) with contextually relevant retrieved knowledge before planning. EKA improves retrieval precision, reduces errors, and boosts performance and efficiency by focusing on relevant information subsets, making it a versatile and scalable strategy for large models....
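The idea of aligning the model with a relevant subset of retrieved knowledge before each planning step can be illustrated with a toy iterative retrieval loop. This is a minimal sketch under stated assumptions: the `retrieve` and `align` functions below are simplistic word-overlap stand-ins invented for illustration, not the paper's actual EKA module.

```python
# Toy iterative RAG loop with an early-alignment step before planning.
# `retrieve` and `align` are hypothetical word-overlap stand-ins, NOT the
# paper's EKA implementation.

def retrieve(query, corpus, k=3):
    """Toy lexical retriever: rank documents by word overlap with the query."""
    scored = sorted(corpus, key=lambda d: -len(set(query.split()) & set(d.split())))
    return scored[:k]

def align(query, docs, threshold=1):
    """Early alignment: keep only documents sharing enough terms with the
    query BEFORE the model plans its next hop (stand-in for EKA filtering)."""
    return [d for d in docs if len(set(query.split()) & set(d.split())) >= threshold]

def multi_hop(query, corpus, hops=2):
    """Each hop retrieves, aligns, then extends the query from top evidence."""
    context = []
    for _ in range(hops):
        docs = align(query, retrieve(query, corpus))  # align before planning
        context.extend(docs)
        if docs:  # plan the next hop from the strongest aligned evidence
            query = query + " " + docs[0]
    return context

corpus = [
    "paris is the capital of france",
    "france borders spain",
    "bananas are yellow",
]
print(multi_hop("capital of france", corpus))
```

The key point the sketch captures is that irrelevant retrievals ("bananas are yellow") are filtered out before they can contaminate the next planning step, which is the error-reduction mechanism the summary describes.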

Read More

NVIDIA Nemotron 3: Efficient and Open Intelligence

Published at 2025-12-23

#ML

The Nemotron 3 family of AI models, including Nano, Super, and Ultra, offers strong agentic, reasoning, and conversational abilities. These models use a unique architecture and training methods to achieve high performance and efficiency, with Nano being cost-effective for inference, Super for collaboration and high-volume tasks, and Ultra for top-tier accuracy and reasoning....

Read More

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Published at 2025-12-23

#ML

Nemotron 3 Nano is a new language model that's more efficient and accurate than its predecessor. It can process longer texts and perform better in various tasks, making it a powerful tool for understanding and generating human-like text....

Read More

TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior

Published at 2025-12-23

#ML

The study presents TokSuite, a set of models and a benchmark designed to investigate the impact of tokenizers on language models' performance and behavior. By training fourteen models with different tokenizers but identical architecture, dataset, training budget, and initialization, and creating a new benchmark for measuring model performance under real-world perturbations, TokSuite enables researchers to understand the advantages and limitations of various popular tokenizers....
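The controlled-comparison setup, where everything is held fixed except the tokenizer and behavior is measured under real-world perturbations, can be sketched in miniature. Both tokenizers below are simplistic stand-ins invented for illustration, not the fourteen tokenizers TokSuite actually trains with, and the `fragility` metric is a hypothetical example of a perturbation measurement.

```python
# Toy illustration of a TokSuite-style controlled comparison: hold the input
# fixed, vary only the tokenizer, and measure how a perturbation (a typo)
# changes the token sequence. Both tokenizers are made-up stand-ins.

def word_tokenizer(text):
    return text.split()

def char_tokenizer(text):
    return list(text.replace(" ", "_"))

def fragility(tokenize, clean, perturbed):
    """Fraction of tokens that change when the input is perturbed."""
    a, b = tokenize(clean), tokenize(perturbed)
    changed = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return changed / max(len(a), len(b))

clean = "the model reads text"
typo = "the modle reads text"  # one transposed character

print(fragility(word_tokenizer, clean, typo))  # whole word token changes
print(fragility(char_tokenizer, clean, typo))  # only two characters change
```

The same typo perturbs a quarter of the word-level tokens but only a small fraction of the character-level ones, which is the kind of tokenizer-dependent behavioral difference the benchmark is designed to surface.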

Read More

Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models

Published at 2025-12-24

#ML

This study reveals that advanced vision-language models perform better on well-known buildings due to memorization, not understanding. To test this, they created YearGuessr, a large dataset of building images with information like construction year, location, and popularity, and found that these models struggle with less popular subjects, highlighting a major issue in their reasoning skills....

Read More

DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation

Published at 2025-12-24

#ML

The authors present DreaMontage, a framework that creates smooth, artistic one-shot videos using user-provided frames. It addresses challenges in visual quality, motion rationality, and transition smoothness, enabling the transformation of fragmented visuals into cohesive cinematic experiences....

Read More

HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming

Published at 2025-12-24

#ML

The study presents an efficient framework named HiStream for high-resolution video generation, which significantly reduces computational complexity and inference time by eliminating redundancy in spatial, temporal, and timestep dimensions. HiStream outperforms existing models in terms of speed and visual quality, making high-resolution video generation practical and scalable for digital media and film applications....

Read More

LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics

Published at 2025-12-24

#ML

The study presents a new framework called Competitive Swiss-System Dynamics (CSD) to evaluate large language models (LLMs) more effectively. CSD simulates a series of rounds where models compete in a sequence of benchmarks, and it uses statistical methods to determine their expected win score, taking into account their vulnerability in sequential tasks. This approach provides a more nuanced ranking than traditional methods, helping to distinguish between generalist and specialist models based on...
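The Swiss-system intuition, where competitors with similar running scores are paired each round so rankings emerge from head-to-head matchups rather than raw averages, can be sketched with a toy tournament. All model names and benchmark scores below are made up for illustration; this is a generic Swiss-pairing sketch, not the paper's CSD framework or its statistical win-score estimation.

```python
# Toy Swiss-system tournament among language models: each round, models with
# similar standings are paired and the one with the higher score on that
# round's benchmark takes the win. All names and scores are invented.

def swiss_rounds(scores_by_benchmark, models):
    wins = {m: 0 for m in models}
    for bench in scores_by_benchmark:
        # Pair by current standing: 1st vs 2nd, 3rd vs 4th, ...
        standing = sorted(models, key=lambda m: -wins[m])
        for a, b in zip(standing[::2], standing[1::2]):
            winner = a if bench[a] >= bench[b] else b
            wins[winner] += 1
    return wins

models = ["gen", "spec", "mid", "weak"]
benchmarks = [  # one dict of per-model scores per round (hypothetical)
    {"gen": 0.80, "spec": 0.90, "mid": 0.60, "weak": 0.30},
    {"gen": 0.70, "spec": 0.40, "mid": 0.60, "weak": 0.20},
    {"gen": 0.75, "spec": 0.50, "mid": 0.65, "weak": 0.25},
]
print(swiss_rounds(benchmarks, models))
```

Because pairings depend on running standings, a specialist that wins an early round is then matched against stronger opponents on later benchmarks, which is how this style of aggregation separates generalists from specialists.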

Read More

Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations

Published at 2025-12-24

#ML

The authors present NExT-Vid, a new method for training visual models by predicting the next frame in a video, which improves upon previous methods by separating semantic information from target decoding and enhancing generation quality. Experiments show that NExT-Vid outperforms other visual representation learning methods in downstream tasks....

Read More

PhononBench: A Large-Scale Phonon-Based Benchmark for Dynamical Stability in Crystal Generation

Published at 2025-12-24

#ML

The authors present PhononBench, a large-scale benchmark for testing the stability of AI-generated crystals. They find that many generated crystals are unstable, with an average stability rate of only 25.83%, and identify 28,119 stable crystal structures, providing a valuable resource for future materials exploration....

Read More

Streaming Video Instruction Tuning

Published at 2025-12-24

#ML

The authors created a new AI model called Streamo, which can understand and interact with real-time video content in various ways, like providing narration or answering questions. They trained Streamo using a large dataset with diverse video tasks, enabling it to reason about time and adapt to different scenarios, making it a significant step towards unified video understanding....

Read More

T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

Published at 2025-12-24

#ML

The study presents T2AV-Compass, a new benchmark for evaluating text-to-audio-video generation systems, which includes 500 complex prompts and a dual-level evaluation framework. The benchmark reveals that current models struggle with audio realism, synchronization, and following instructions, highlighting the need for improvement in this field....

Read More


Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.


(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media
