🤗 Daily Paper (2025-08-19)

deep.di...@gmail.com

Aug 19, 2025, 4:06:48 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you find some gems!

This newsletter delivers a list of papers curated by 🤗 Daily Papers.

Project page
🤗 Daily Papers

Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

Published at 2025-08-13

#ML

This survey explores various innovative architectures for Large Language Models (LLMs) that improve efficiency and address the limitations of traditional transformer models, which are resource-intensive. Topics covered include linear and sparse sequence modeling, efficient attention variants, hybrid models, and emerging diffusion LLMs, with potential applications to other modalities for creating scalable, resource-aware AI systems....
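
For a concrete feel of one family the survey covers, here is a minimal NumPy sketch contrasting standard softmax attention (quadratic in sequence length) with a kernelized linear-attention variant (linear in sequence length); the elu(x)+1 feature map is a common choice in the linear-attention literature and is an assumption here, not something drawn from this particular survey.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: cost scales with n^2 for sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                 # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                      # (n, d)

def linear_attention(Q, K, V, eps=1e-6):
    # Kernelized variant: phi(Q) (phi(K)^T V) avoids forming the n x n matrix.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))     # elu(x) + 1 feature map
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                                           # (d, d) key/value summary
    z = Qf @ Kf.sum(axis=0)                                 # per-query normalizer
    return (Qf @ kv) / (z[:, None] + eps)

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(16, 8)) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```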

Read More

ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning

Published at 2025-08-14

#ML

The ComoRAG model is introduced to improve narrative reasoning over long stories by using a dynamic, iterative process that mimics human cognition. This approach outperforms traditional RAG methods, especially for complex queries, on various long-context narrative benchmarks....
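
The summary above describes an iterative, memory-organized retrieval process; the sketch below is a generic version of such a loop (probe generation, retrieval, memory consolidation, answer check), with `llm` and `retriever` as hypothetical callables rather than ComoRAG's actual components.

```python
# Generic iterative-RAG loop in the spirit described above; `llm` and `retriever`
# are placeholder callables, and the prompts and stop rule are illustrative assumptions.
def iterative_rag(question, llm, retriever, max_rounds=5):
    memory = []                                      # evolving memory workspace
    answer = ""
    for _ in range(max_rounds):
        probe = llm(f"Question: {question}\nKnown so far: {memory}\n"
                    f"Write one retrieval query for the missing evidence.")
        passages = retriever(probe, top_k=5)         # fetch new narrative evidence
        memory.append(llm(f"Summarize how these passages bear on the question:\n"
                          f"{passages}"))            # consolidate into memory
        answer = llm(f"Question: {question}\nEvidence memory: {memory}\n"
                     f"Answer, or reply INSUFFICIENT if evidence is still missing.")
        if "INSUFFICIENT" not in answer:
            return answer                            # stop once the query is resolvable
    return answer                                    # best effort after max_rounds
```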

Read More

Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information

Published at 2025-08-15

#ML

The study presents a new dataset to evaluate large reasoning models' ability to ask for information in incomplete problems, revealing their limitations in doing so and uncovering issues like overthinking and hallucination. The research aims to improve the development of these models towards genuine intelligence....

Read More

G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration

Published at 2025-08-15

#ML

The authors present a new method called G-CUT3R that improves 3D scene reconstruction by using additional data like depth or camera information, making it more accurate than other methods that only use input images. This approach is flexible and can be used with different types of prior information, leading to better performance in various 3D reconstruction tasks....

Read More

Ovis2.5 Technical Report

Published at 2025-08-15

#ML

Ovis2.5 is a new model that improves visual perception and reasoning compared to its predecessor, Ovis2. It uses a native-resolution vision transformer for better image processing and offers advanced reasoning capabilities such as self-checking and revision, which can be activated for improved accuracy on tough tasks. The model is trained with a five-phase curriculum, and two open-source versions, Ovis2.5-9B and Ovis2.5-2B, are available. Ovis2.5-9B sets a new state-of-the-art for open-source models with ...

Read More

Representing Speech Through Autoregressive Prediction of Cochlear Tokens

Published at 2025-08-15

#ML

The researchers created a two-stage model called AuriStream that mimics human hearing to understand speech. It first converts sound into a time-frequency representation, much as the human ear does, and then applies an autoregressive predictive model over the resulting tokens. AuriStream performs well on various speech tasks and can generate sound continuations, offering insights into its predictions....
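
As a rough, toy-level illustration of the two-stage idea (a cochlea-like time-frequency front end that yields discrete tokens, followed by autoregressive next-token prediction), the sketch below uses a plain log-magnitude spectrogram and a nearest-neighbor codebook as stand-ins for the paper's cochlear tokenizer; all of those choices are assumptions for illustration.

```python
import numpy as np

def cochlear_like_tokens(wave, codebook, frame=256, hop=128):
    # Stage 1 (stand-in): frame the waveform, take log-magnitude spectra,
    # then quantize each frame to its nearest codebook entry ("cochlear token").
    frames = [wave[i:i + frame] for i in range(0, len(wave) - frame, hop)]
    spectra = np.log1p(np.abs(np.fft.rfft(np.stack(frames) * np.hanning(frame), axis=1)))
    dists = ((spectra[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)                       # one discrete token per frame

rng = np.random.default_rng(0)
wave = rng.normal(size=16000)                         # 1 s of toy audio
codebook = rng.normal(size=(64, 129))                 # 64 entries over rfft bins
tokens = cochlear_like_tokens(wave, codebook)

# Stage 2 (stand-in): an autoregressive model would be trained to predict
# tokens[t+1] from tokens[:t+1]; here we only form the (input, target) pairs.
inputs, targets = tokens[:-1], tokens[1:]
print(tokens[:10], inputs.shape, targets.shape)
```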

Read More

When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs

Published at 2025-08-15

#ML

This study evaluates 5 methods to make Large Language Models less sensitive to minor prompt variations, testing them on various models and tasks. The results help determine which methods work best, aiding in the development of more reliable LLMs for real-world use....

Read More

Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping

Published at 2025-08-17

#ML

The study presents Inverse-LLaVA, a new method for multimodal learning that doesn't require expensive alignment pre-training. It maps text embeddings into visual representation space, reducing computational requirements by 45% and improving reasoning-intensive tasks while decreasing performance on perception tasks that rely on memorized associations....
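
A minimal sketch of the general direction described here: project text embeddings into the visual feature space with a learned map and fuse there, instead of projecting visual features into the language space. The dimensions and the single linear projector are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_text, d_vis = 4096, 1024                          # assumed embedding widths
n_text, n_patches = 32, 256                         # assumed sequence lengths

text_emb  = rng.normal(size=(n_text, d_text))       # token embeddings from the LLM side
vis_feats = rng.normal(size=(n_patches, d_vis))     # patch features from a vision encoder

# Inverse direction: a learned projector W maps text into the visual space,
# so fusion happens among visual features rather than in the language space.
W = rng.normal(size=(d_text, d_vis)) / np.sqrt(d_text)
text_in_vis = text_emb @ W                          # (n_text, d_vis)

fused = np.concatenate([vis_feats, text_in_vis], axis=0)  # joint visual-space sequence
print(fused.shape)                                  # (n_patches + n_text, d_vis)
```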

Read More

4DNeX: Feed-Forward 4D Generative Modeling Made Easy

Published at 2025-08-18

#ML

The authors create a new method called 4DNeX, which uses a pretrained video model to generate dynamic 3D scenes from a single image more efficiently than existing methods. They also build a large dataset with high-quality 4D annotations and propose strategies to adapt pretrained models for 4D modeling, resulting in high-quality dynamic point clouds for novel-view video synthesis....

Read More

Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

Published at 2025-08-18

#ML

This study examines the spatial intelligence of GPT-5 and other models, finding that while GPT-5 shows great improvement, it still lags behind human performance in various spatial tasks. The research also identifies the toughest spatial challenges for these models and finds that proprietary models don't have a clear edge in these difficult tasks....

Read More

HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds

Published at 2025-08-18

#ML

The researchers present a new benchmark called HeroBench, which tests long-term planning and structured thinking in complex video game-like environments. They find that many advanced language models struggle with these tasks, especially when it comes to creating solid plans and executing actions reliably, highlighting areas for future research....

Read More

Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models

Published at 2025-08-18

#ML

The study presents Lumen, a video relighting framework that uses large-scale video generative models to replace backgrounds while adjusting lighting in the foreground. The framework is trained using a combination of realistic and synthetic videos, resulting in consistent lighting and preserved foreground properties, as demonstrated in experimental results....

Read More

Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

Published at 2025-08-18

#ML

The authors present Matrix-Game 2.0, a real-time, streaming interactive world model that generates high-quality videos quickly by using a few-step auto-regressive diffusion process. This model is built with a scalable data production pipeline, an action injection module for frame-level inputs, and a few-step distillation based on the causal architecture, enabling it to simulate complex physical dynamics and interactive behaviors in real-time....

Read More

Next Visual Granularity Generation

Published at 2025-08-18

#ML

The researchers present a new method for generating images by breaking them down into a series of elements with the same size but varying levels of detail. They then use a framework called Next Visual Granularity (NVG) to create images by progressively adding detail, which allows for better control over the image generation process. Their method outperforms a previous one in terms of image quality and will be made available for others to use....

Read More

Precise Action-to-Video Generation Through Visual Action Prompts

Published at 2025-08-18

#ML

The authors propose using visual skeletons as a unified representation for generating precise and transferable action-driven videos across different domains, such as human-object interactions and robotic manipulation. They demonstrate the effectiveness of their approach through experiments on various datasets; a project page provides further information....

Read More

Reinforcement Learning with Rubric Anchors

Published at 2025-08-18

#ML

The study presents a new method for training large language models on open-ended tasks by using rubric-based rewards, which are structured criteria for evaluating subjective outputs. The proposed approach allows for fine-grained stylistic control and improved performance on open-ended benchmarks, even outperforming larger models, while using significantly fewer training samples....
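
A hedged sketch of what a rubric-anchored reward can look like: each rubric item is a weighted criterion scored by a judge (a placeholder callable here), and the weighted average becomes the scalar reward for the RL loop. The rubric items, weights, and `judge` interface are assumptions for illustration, not the paper's exact setup.

```python
# Illustrative rubric-based reward; `judge` is a placeholder for an LLM (or other)
# scorer returning a value in [0, 1] for how well the response meets a criterion.
def rubric_reward(prompt, response, rubric, judge):
    total, weight_sum = 0.0, 0.0
    for criterion, weight in rubric:
        score = judge(prompt, response, criterion)   # in [0, 1]
        total += weight * score
        weight_sum += weight
    return total / weight_sum                        # scalar reward for the RL loop

example_rubric = [
    ("Directly addresses the user's request", 2.0),
    ("Tone matches the requested style",      1.0),
    ("No fabricated facts or citations",      2.0),
]

# Toy stand-in judge that ignores the criterion, just to make the sketch runnable.
toy_judge = lambda prompt, response, criterion: float(len(response) > 20)
print(rubric_reward("Write a haiku about autumn.", "Leaves drift on cold wind...", example_rubric, toy_judge))
```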

Read More

S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models

Published at 2025-08-18

#ML

The study finds that Classifier-free Guidance, a common method for improving diffusion models, often makes suboptimal predictions that degrade output quality. The researchers propose a new method, S^2-Guidance, which uses stochastic selection of model sub-networks to steer away from these suboptimal predictions, outperforming the original method on various tasks....
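
For reference, standard classifier-free guidance combines conditional and unconditional predictions as eps_uncond + w * (eps_cond - eps_uncond). The sketch below shows that baseline plus a hedged version of a stochastic self-guidance correction, in which the prediction of a randomly reduced copy of the model is subtracted out; how that sub-network prediction is obtained and weighted is an assumption for illustration, not the paper's exact formulation.

```python
import numpy as np

def cfg(eps_cond, eps_uncond, w):
    # Standard classifier-free guidance.
    return eps_uncond + w * (eps_cond - eps_uncond)

def s2_style_guidance(eps_cond, eps_uncond, eps_sub, w, lam=0.5):
    # Hedged sketch: push the update away from the prediction of a stochastically
    # reduced sub-network (eps_sub), intended to suppress low-quality modes.
    return cfg(eps_cond, eps_uncond, w) - lam * (eps_sub - eps_cond)

rng = np.random.default_rng(0)
eps_cond, eps_uncond = rng.normal(size=(2, 16))
eps_sub = eps_cond + 0.1 * rng.normal(size=16)   # e.g. same model with random blocks dropped
print(s2_style_guidance(eps_cond, eps_uncond, eps_sub, w=7.5).shape)
```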

Read More

Unlearning Comparator: A Visual Analytics System for Comparative Evaluation of Machine Unlearning Methods

Published at 2025-08-18

#ML

The paper presents a visual analytics system named Unlearning Comparator to help researchers better understand and compare different Machine Unlearning methods. This tool allows for model comparison and attack simulation, enabling users to evaluate the accuracy, efficiency, and privacy of various methods and gain insights for improving them....

Read More

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.


(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit the developer's social media

Facebook · X · LinkedIn