🤗 Daily Paper(2025-12-29)

8 views

Skip to first unread message

deep.di...@gmail.com

unread,

Dec 29, 2025, 10:33:51 AM12/29/25

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers you the curated list of papers by 🤗 Daily Papers.

Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding

Published at 2025-12-18

#Deep Learning , #Natural Language Processing

The study presents a new method called MiA-RAG that helps Large Language Models (LLMs) understand long and complex texts better by using a global view of the content, similar to how humans do. This method improves the ability of LLMs to retrieve and reason over relevant information in long documents, outperforming existing systems in various tests....

InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

Published at 2025-12-19

#Deep Learning , #Computer Vision

The study presents InsertAnywhere, a new framework for realistic video object insertion that handles scene geometry, occlusion, and lighting effects better than existing methods. It uses a 4D aware mask generation module and a diffusion-based video generation model, trained with a new synthetic dataset called ROSE++, to create visually coherent and geometrically plausible object insertions in various real-world scenarios....

InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search

Published at 2025-12-21

#Deep Learning , #Natural Language Processing , #Computer Vision , #Emerging Applications of Machine Learning

The study presents a new benchmark, O3-Bench, to evaluate multimodal reasoning in AI agents, which requires them to analyze visual details in images through multi-step reasoning. To enhance this capability, the researchers developed InSight-o3, a framework that includes a visual search agent capable of locating complex regions described in natural language, improving the performance of state-of-the-art multimodal models on various benchmarks....

SlideTailor: Personalized Presentation Slide Generation for Scientific Papers

Published at 2025-12-23

#Supervised Learning , #Natural Language Processing , #Human-Computer Interaction (HCI) and User Interfaces , #Emerging Applications of Machine Learning

The study presents SlideTailor, a new system for creating customized presentation slides from scientific papers. It uses a user's example paper-slide pair and visual template to understand their preferences without needing detailed textual input, resulting in higher quality slides and enabling applications like video presentations....

SVBench: Evaluation of Video Generation Models on Social Reasoning

Published at 2025-12-24

#Deep Learning , #Computer Vision

The study presents SVBench, a new benchmark for assessing the social reasoning abilities of video generation models. By organizing thirty social cognition paradigms into seven core dimensions, the benchmark evaluates models' performance in areas like mental-state inference, goal-directed action, and prosocial behavior. Results show that while modern models perform well in visual realism, they struggle with understanding intentions, beliefs, and social norms....

Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding

Published at 2025-12-25

#Deep Learning , #Explainable AI and Interpretability , #Time Series Analysis and Forecasting , #Computer Vision

The authors present Omni-Weather, a new model that combines weather prediction and explanation in one system. By integrating a radar encoder and a shared self-attention mechanism, Omni-Weather outperforms existing methods in both weather generation and understanding, and shows that these tasks can benefit from each other....

TimeBill: Time-Budgeted Inference for Large Language Models

Published at 2025-12-25

#Deep Learning , #Natural Language Processing , #Optimization and Learning Algorithms , #Robotics and Control , #Emerging Applications of Machine Learning

The authors present a new framework called TimeBill that helps large language models generate accurate responses within a specific time limit. TimeBill uses prediction techniques to estimate execution time and adaptively adjusts cache eviction ratios, improving task completion rates and response performance in time-critical systems like robotics and autonomous driving....

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

Published at 2025-12-25

#Computer Vision , #Deep Learning , #Natural Language Processing , #Emerging Applications of Machine Learning

The researchers created a new system called UniPercept-Bench that helps large language models better understand images on a perceptual level, focusing on aesthetics, quality, structure, and texture. They built a hierarchy to define these features and gathered large datasets to test the system. UniPercept, a strong baseline, was developed using this framework and outperforms existing models in image understanding, which can also be used to improve text-to-image generation....

A 58-Addition, Rank-23 Scheme for General 3x3 Matrix Multiplication

Published at 2025-12-26

#Optimization and Decision Making

The authors have developed an efficient algorithm for multiplying 3x3 matrices over non-commutative rings, which performs the multiplication using only 58 additions, outperforming the previous best method with the same rank but fewer additions. This new method is both efficient and versatile, working across various fields....

MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

Published at 2025-12-26

#Deep Learning , #Computer Vision , #Natural Language Processing , #Human-Computer Interaction (HCI) and User Interfaces , #Emerging Applications of Machine Learning

The authors present MAI-UI, a family of foundation GUI agents that address challenges in realistic deployment, such as native agent-user interaction and dynamic environment brittleness, using a unified methodology. MAI-UI outperforms other models on various benchmarks for GUI grounding and mobile navigation, demonstrating significant improvements in performance and efficiency....

ProEdit: Inversion-based Editing From Prompts Done Right

Published at 2025-12-26

#Computer Vision , #Natural Language Processing , #Deep Learning

The study presents a new method called ProEdit that improves upon existing inversion-based visual editing techniques. It addresses the issue of over-reliance on source information during editing by introducing KV-mix and Latents-Shift, which enhance editing consistency and eliminate the influence of inverted latent on sampling, resulting in state-of-the-art performance on various image and video editing benchmarks....

SWE-RM: Execution-free Feedback For Software Engineering Agents

Published at 2025-12-26

#Supervised Learning , #Reinforcement Learning , #Emerging Applications of Machine Learning

This study explores execution-free feedback for software engineering agents, focusing on developing robust reward models that perform well in both test-time scaling and reinforcement learning. The researchers identify key factors for RL training, such as classification accuracy and calibration, and introduce SWE-RM, a new reward model that significantly enhances agent performance in both areas....

See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning

Published at 2025-12-26

#Deep Learning , #Natural Language Processing , #Computer Vision

The authors present a new method called Bi-directional Perceptual Shaping (BiPS) that helps vision-language models better utilize visual evidence by transforming question-conditioned masked views into signals that shape perception. BiPS encourages models to focus on relevant visual details and avoid relying solely on textual information, leading to improved performance and generalization across various benchmarks....

Tags are generated by Google's Gemini 2.5 Flash, and the summary is generated by Upstage's Solar Pro API.

Visit Developer's Social Media