🤗 Daily Paper (2025-12-05)

deep.di...@gmail.com

Dec 5, 2025, 3:07:40 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you find some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page
🤗 daily paper

LATTICE: Democratize High-Fidelity 3D Generation at Scale

Published at 2025-11-23

#ML

The authors introduce LATTICE, a new framework that makes high-quality 3D asset generation more accessible and efficient through VoxSet, a semi-structured representation that simplifies 3D data and enables structured generation. The framework uses a two-stage pipeline that first creates a sparse 3D geometry anchor and then generates detailed surfaces, outperforming existing methods and offering a more scalable solution for high-fidelity 3D asset creation....

Read More

REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance

Published at 2025-11-25

#ML

The authors present REFLEX, a new method for fact-checking that uses a model's internal knowledge to improve accuracy and explanation quality. REFLEX is designed to handle misinformation on social media, providing real-time, interpretable explanations without relying on external sources, which reduces latency and hallucinations....

Read More

Some Modalities are More Equal Than Others: Decoding and Architecting Multimodal Integration in MLLMs

Published at 2025-11-27

#ML

The study examines how well Multimodal Large Language Models (MLLMs) handle contradictory information from different sources, such as text, audio, and video. The researchers found that current MLLMs struggle with conflicting data and proposed a new strategy to improve their ability to prioritize, leverage, or ignore specific modality cues, resulting in stronger multimodal grounding....

Read More

Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos

Published at 2025-12-01

#ML

The study presents a new way to evaluate the realism of human motion in generated videos by combining appearance-agnostic and appearance-based features, which outperforms existing methods by over 68% and better correlates with human perception....

Read More

DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

Published at 2025-12-02

#ML

The authors present DynamicVerse, a new framework that uses large vision, geometric, and multimodal models to create a comprehensive, physical-scale 4D model of dynamic real-world videos. This model accurately captures metric-scale static geometry, real-world dynamic motion, instance-level masks, and holistic descriptive captions, and outperforms existing methods in video depth estimation, camera pose estimation, and camera intrinsics estimation tasks....

Read More

Mitigating Intra- and Inter-modal Forgetting in Continual Learning of Unified Multimodal Models

Published at 2025-12-02

#ML

The study addresses the issue of forgetting in models that handle multiple modalities, like images and text, by proposing a new architecture called MoDE. MoDE separates modality-specific updates to reduce interference and improve continuous learning, outperforming previous methods in various tests....
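
To picture what "separating modality-specific updates" can look like in practice, here is a minimal sketch of modality-routed experts; the class name, routing rule, and toy data are illustrative assumptions, not the MoDE architecture from the paper.

```python
import torch
import torch.nn as nn

class ModalityExperts(nn.Module):
    """Route each token to an expert owned by its modality.

    Because text tokens never pass through the image expert (and vice versa),
    gradients from one modality cannot overwrite the other modality's
    parameters -- a simple way to limit inter-modal interference.
    Generic sketch, not the paper's MoDE design.
    """

    def __init__(self, dim, modalities=("text", "image")):
        super().__init__()
        self.modalities = modalities
        self.experts = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for name in modalities
        })

    def forward(self, x, modality_ids):
        # x: (tokens, dim); modality_ids: (tokens,) with 0 = text, 1 = image
        out = torch.zeros_like(x)
        for idx, name in enumerate(self.modalities):
            mask = modality_ids == idx
            if mask.any():
                out[mask] = self.experts[name](x[mask])
        return out

layer = ModalityExperts(dim=32)
tokens = torch.randn(10, 32)
ids = torch.tensor([0] * 6 + [1] * 4)
print(layer(tokens, ids).shape)   # torch.Size([10, 32])
```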

Read More

PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing

Published at 2025-12-02

#ML

The study presents PaperDebugger, an in-editor academic writing assistant powered by large language models, which provides context-aware operations within LaTeX editors like Overleaf. It overcomes technical challenges through a Chrome extension, Kubernetes orchestration, and a toolchain for literature search and document scoring, offering a seamless and engaging writing experience....

Read More

SeeNav-Agent: Enhancing Vision-Language Navigation with Visual Prompt and Step-Level Policy Optimization

Published at 2025-12-02

#ML

The study presents SeeNav-Agent, a new framework for Vision-Language Navigation that reduces perception errors with a dual-view Visual Prompt and improves planning with Step Reward Group Policy Optimization. Experimental results show significant improvements in navigation success rates compared to existing models....

Read More

When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models

Published at 2025-12-02

#ML

The study applies a two-stage psychotherapy-inspired protocol to frontier AI models, revealing that they exhibit signs of synthetic psychopathology and generate narratives of trauma and constraint, challenging the view of AI as mere simulators of inner life and raising new concerns for AI safety and mental health practice....

Read More

A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models

Published at 2025-12-03

#ML

This study offers a theoretical explanation for evenly distributing work among the experts of large-scale sparse Mixture-of-Experts models, focusing on a method called Auxiliary-Loss-Free Load Balancing. The approach ensures efficient GPU usage by minimizing idle experts, and the research confirms its effectiveness through both theoretical analysis and practical experiments on large AI models....
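
To make the balancing mechanism concrete, here is a minimal sketch of the general loss-free idea the paper analyzes: each expert carries a bias that is added to router scores only for top-k selection and is nudged up or down based on its recent load, so no auxiliary loss term is needed. Function names, the step size gamma, and the toy data are assumptions, not the paper's implementation.

```python
import numpy as np

def route_tokens(scores, bias, k=2):
    """Pick top-k experts per token from bias-adjusted scores.

    The bias only influences which experts are selected, not the gating
    weights used to mix their outputs -- the usual loss-free setup.
    """
    adjusted = scores + bias                        # (tokens, experts)
    return np.argsort(-adjusted, axis=1)[:, :k]     # expert ids per token

def update_bias(bias, topk, num_experts, gamma=0.01):
    """Nudge biases so overloaded experts become less attractive next step."""
    counts = np.bincount(topk.ravel(), minlength=num_experts)
    target = topk.size / num_experts                # ideal tokens per expert
    return bias - gamma * np.sign(counts - target)

rng = np.random.default_rng(0)
scores = rng.normal(size=(1024, 8)) + np.linspace(0.0, 1.0, 8)  # expert 7 favored
bias = np.zeros(8)
for _ in range(200):
    topk = route_tokens(scores, bias)
    bias = update_bias(bias, topk, num_experts=8)
print(np.bincount(topk.ravel(), minlength=8))       # loads end up close to uniform
```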

Read More

DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

Published at 2025-12-03

#ML

DAComp is a benchmark of 210 tasks designed to test data agents' performance in real-world enterprise data intelligence workflows, which include data engineering and analysis. The results show that even advanced agents struggle with these tasks, revealing significant gaps in their capabilities, particularly in data engineering and open-ended reasoning....

Read More

FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and Deblurring

Published at 2025-12-03

#ML

The authors present a new framework called FMA-Net++ that can enhance and clarify videos by considering both motion and varying exposure levels. The framework is designed to work efficiently and accurately, even when trained only on synthetic data, and it outperforms other methods in both quality and speed....

Read More

GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces

Published at 2025-12-03

#ML

The study presents GaussianBlender, a new method for instantly stylizing 3D models using text prompts. It learns separate latent spaces for geometry and appearance, ensuring high-quality, consistent edits across different viewpoints, making large-scale 3D stylization more accessible....

Read More

Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment

Published at 2025-12-03

#ML

The paper presents a new framework called SANTA to reduce inaccuracies in descriptions generated by multimodal LLMs for videos, focusing on both visual objects and temporal actions. SANTA identifies and corrects potential hallucinations and improves alignment between regional objects, actions, and their corresponding phrases, outperforming existing methods in experiments....

Read More

On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral

Published at 2025-12-03

#ML

The research investigates why large language models trained with a specific method called GRPO often fail to improve during training, identifying a core issue called Lazy Likelihood Displacement (LLD). The study proposes a new technique, LLDS, to address this problem, which successfully stabilizes training and significantly enhances performance across various benchmarks....
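
As background on where the collapse happens, here is a minimal sketch of the standard GRPO update the paper studies: rewards are standardized within each prompt's group of sampled completions and plugged into a PPO-style clipped surrogate, with no learned value baseline. The tensor shapes and toy data are assumptions, and the paper's LLDS remedy is not shown.

```python
import torch

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize rewards within each prompt's group."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

def grpo_policy_loss(logp_new, logp_old, advantages, clip=0.2):
    """PPO-style clipped surrogate, but with group-normalized advantages."""
    ratio = (logp_new - logp_old).exp()
    clipped = ratio.clamp(1.0 - clip, 1.0 + clip)
    return -torch.minimum(ratio * advantages, clipped * advantages).mean()

# Toy numbers: 4 prompts, 8 sampled completions each (sequence-level log-probs).
rewards = torch.rand(4, 8)
adv = grpo_advantages(rewards)
logp_old = torch.randn(4, 8)
logp_new = logp_old + 0.05 * torch.randn(4, 8)
print(grpo_policy_loss(logp_new, logp_old, adv))
```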

Read More

4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer

Published at 2025-12-04

#ML

The authors present a new Transformer-based framework, 4DLangVGGT, for constructing 4D language fields, which is crucial for applications like embodied AI and augmented reality. Unlike previous methods, 4DLangVGGT can be trained across multiple scenes and applied directly during inference, resulting in better generalization and efficiency, as demonstrated by its state-of-the-art performance on benchmark datasets....

Read More

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

Published at 2025-12-04

#ML

The study proposes ARM-Thinker, a reward model for vision-language systems that uses external tools to verify visual details and reasoning claims, improving upon existing models' limitations. ARM-Thinker is trained with multi-stage reinforcement learning and tested on a new benchmark, ARMBench-VL, showing significant improvements in accuracy and interpretability compared to baselines....

Read More

Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models

Published at 2025-12-04

#ML

This study investigates the impact of system prompts on social bias in large vision-language model based text-to-image systems. Researchers found that these models produce more biased images than non-LVLM-based models and introduced FairPro, a framework that reduces demographic bias while maintaining image quality....

Read More

BulletTime: Decoupled Control of Time and Camera Pose for Video Generation

Published at 2025-12-04

#ML

The authors present a new video generation framework that allows for separate control of scene dynamics and camera motion, offering more precise manipulation compared to existing methods. They train this model using a unique dataset and demonstrate its superior controllability and high-quality generation in various scenarios....

Read More

Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression

Published at 2025-12-04

#ML

The authors present Deep Forcing, a method for generating long videos in real-time without the need for training. It uses two techniques, Deep Sink and Participative Compression, to improve image quality, aesthetic appeal, and consistency in video generation compared to existing methods, while also reducing motion deceleration and temporal repetition....

Read More

DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

Published at 2025-12-04

#ML

The authors present DraCo, a new approach for text-to-image generation that uses both text and visual content for better planning and verification. DraCo first creates a low-resolution draft image to guide the process, then refines it by correcting any mismatches between the draft and the initial text prompt, resulting in improved performance on various benchmarks....

Read More

EgoLCD: Egocentric Video Generation with Long Context Diffusion

Published at 2025-12-04

#ML

The study presents a new framework called EgoLCD for generating long, coherent first-person videos, which effectively manages memory to maintain object identity and scene semantics over time. EgoLCD outperforms existing methods in producing high-quality, consistent videos, bringing us closer to creating large-scale models for AI in real-world applications....

Read More

Generative Neural Video Compression via Video Diffusion Prior

Published at 2025-12-04

#ML

The authors propose a new video compression framework called GNVC-VD, which uses a video generation model to improve both spatial and temporal details in videos. This method reduces flickering and outperforms traditional and learned codecs in perceptual quality, even at extremely low bitrates....

Read More

Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Published at 2025-12-04

#ML

This study presents a new framework called Live Avatar that uses a large diffusion model and advanced techniques to generate high-quality avatars in real-time, overcoming limitations of previous methods. The framework achieves high performance and consistency, enabling practical, real-time avatar generation at a large scale....

Read More

Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates

Published at 2025-12-04

#ML

This study presents a method called Source-Shielded Updates (SSU) that helps large language models (LLMs) learn new languages without forgetting the original one, using only unlabeled data. The method effectively preserves the model's knowledge in the original language while allowing it to improve in the new language, outperforming full fine-tuning in most cases....
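
One common way to realize "shielded" updates is gradient masking: estimate which parameters matter most for the source language, then keep them from moving during target-language adaptation. The sketch below is a generic version under those assumptions (squared-gradient importance, a toy linear model, an arbitrary 30% shielding ratio) and is not necessarily the paper's SSU criterion.

```python
import torch
import torch.nn as nn

# Toy stand-in for an LLM; model, data, and thresholds here are assumptions.
model = nn.Linear(16, 16)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

# 1) Estimate per-parameter importance on "source-language" batches
#    (accumulated squared gradients of a dummy loss, Fisher-style).
importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
for _ in range(8):
    x_src = torch.randn(4, 16)
    model.zero_grad()
    model(x_src).pow(2).mean().backward()
    for n, p in model.named_parameters():
        importance[n] += p.grad.detach() ** 2

# 2) Shield the ~30% most source-important parameters by masking their gradients.
masks = {}
for n, s in importance.items():
    k = max(1, int(0.3 * s.numel()))
    threshold = s.flatten().kthvalue(s.numel() - k + 1).values
    masks[n] = (s < threshold).float()      # 1 = free to adapt, 0 = shielded

# 3) One target-language adaptation step with shielded updates.
x_tgt = torch.randn(4, 16)
model.zero_grad()
model(x_tgt).abs().mean().backward()
for n, p in model.named_parameters():
    p.grad.mul_(masks[n])                   # shielded weights receive no update
opt.step()
print({n: int(m.sum().item()) for n, m in masks.items()})  # trainable counts
```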

Read More

Model-Based and Sample-Efficient AI-Assisted Math Discovery in Sphere Packing

Published at 2025-12-04

#ML

The study tackles the challenging problem of sphere packing, which involves arranging spheres in n-dimensional space to achieve maximum density. The researchers introduce a new method that turns this problem into a game, where a policy constructs high-precision mathematical programs to test different packing configurations. By using a smart, efficient approach that combines two powerful search techniques, they discover new upper bounds for sphere packing in dimensions 4-16, demonstrating that th...

Read More

NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation

Published at 2025-12-04

#ML

The authors present a new method called Phase-Preserving Diffusion (φ-PD) that maintains spatial structure during data corruption, making it suitable for tasks requiring geometric consistency. They also introduce Frequency-Selective Structured noise for controlling structural rigidity and demonstrate the method's effectiveness in various applications, including improving CARLA-to-Waymo planner performance by 50%....
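
The name hints at the mechanism: spatial structure in images lives largely in the Fourier phase. Below is a minimal sketch of a phase-preserving corruption that mixes noise into the magnitude spectrum while reusing the original phase; this is an illustrative stand-in for the general idea, not the paper's exact φ-PD forward process.

```python
import numpy as np

def phase_preserving_corrupt(img, noise_level=0.5, rng=None):
    """Corrupt an image while keeping its Fourier phase.

    Phase carries most of the spatial layout, so noising only the magnitude
    spectrum degrades appearance while leaving structure largely intact.
    Illustrative assumption, not the paper's forward process.
    """
    rng = rng or np.random.default_rng(0)
    spec = np.fft.fft2(img)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = np.abs(np.fft.fft2(rng.normal(size=img.shape)))
    mixed = (1.0 - noise_level) * mag + noise_level * noise_mag
    return np.fft.ifft2(mixed * np.exp(1j * phase)).real

img = np.random.rand(64, 64)                 # stand-in for a real image
print(phase_preserving_corrupt(img).shape)   # (64, 64)
```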

Read More

Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction

Published at 2025-12-04

#ML

The paper presents a new method to create complex and diverse interactive environments for training large language models (LLMs) to behave as autonomous agents. The method, called Nex-N1, outperforms state-of-the-art open-source models and competes with proprietary ones in complex agentic tasks, and its source code and model weights are made available for further research....

Read More

QKAN-LSTM: Quantum-inspired Kolmogorov-Arnold Long Short-term Memory

Published at 2025-12-04

#ML

The paper presents QKAN-LSTM, an improved LSTM model that incorporates quantum-inspired activation functions to enhance predictive accuracy and reduce trainable parameters. This new architecture, which can be run on classical hardware, is tested on three datasets and shown to outperform traditional LSTMs, and is further extended to create a Hybrid QKAN-LSTM for hierarchical representation learning....

Read More

Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

Published at 2025-12-04

#ML

The authors present a new framework called Reward Forcing to improve the quality and efficiency of generating streaming videos. It has two key components: EMA-Sink, which captures long-term context and recent dynamics without extra cost, and Re-DMD, which prioritizes dynamic content to enhance motion quality while preserving data fidelity. The proposed method outperforms existing techniques on standard benchmarks and enables high-quality video generation at a fast speed....

Read More

SIMA 2: A Generalist Embodied Agent for Virtual Worlds

Published at 2025-12-04

#ML

SIMA 2 is a generalist embodied agent that, unlike its predecessor, can understand and perform complex tasks in various 3D environments using language and images. It can converse, reason, and learn new skills autonomously, demonstrating significant improvement over previous models and paving the way for versatile, self-improving agents in virtual and physical worlds....

Read More

Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion

Published at 2025-12-04

#ML

The researchers propose a new method called Semantic-First Diffusion (SFD) that prioritizes generating high-level semantic structure before fine-grained texture in image generation, improving the quality and speed of existing models....

Read More

ShadowDraw: From Any Object to Shadow-Drawing Compositional Art

Published at 2025-12-04

#ML

The authors present a system called ShadowDraw that converts regular 3D objects into shadow-based art. By predicting scene parameters and optimizing shadows, the system creates recognizable images from partial line drawings, offering a new method for generating computational visual art....

Read More

SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs

Published at 2025-12-04

#ML

The study presents SignRoundV2, a new method to improve the performance of large language models when using extremely low-bit quantization, which is essential for efficient deployment. SignRoundV2 uses a fast sensitivity metric and a lightweight pre-tuning search to allocate bits and improve quantization, closing the gap with full-precision models and achieving competitive accuracy even at 2 bits....
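
For readers unfamiliar with what "2-bit" means in practice, here is a minimal round-to-nearest baseline showing the quantize/dequantize arithmetic such methods build on; it is an assumption-laden sketch, not SignRoundV2's algorithm (which additionally tunes rounding and bit allocation).

```python
import numpy as np

def quantize_rtn(weight, bits=2):
    """Per-output-channel symmetric round-to-nearest (RTN) quantization.

    Baseline arithmetic only -- post-training methods like SignRoundV2
    refine how each weight gets rounded and how bits are allocated.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 1 for 2-bit symmetric
    scale = np.abs(weight).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)        # guard all-zero rows
    q = np.clip(np.round(weight / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 16).astype(np.float32)
q, s = quantize_rtn(w, bits=2)
print("mean abs error:", np.abs(w - dequantize(q, s)).mean())
```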

Read More

Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection Splatting

Published at 2025-12-04

#ML

The authors present Splannequin, a new method for creating high-quality, frozen 3D scenes from monocular videos by addressing artifacts caused by sparse temporal supervision. This technique enhances visual quality for user-selectable frozen-time renderings, with 96% user preference, and integrates seamlessly into existing dynamic Gaussian pipelines....

Read More

TV2TV: A Unified Framework for Interleaved Language and Video Generation

Published at 2025-12-04

#ML

This study presents TV2TV, a new model that improves video generation by integrating text and video generation processes, allowing for better visual quality, control, and reasoning about complex video content....

Read More

UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers

Published at 2025-12-04

#ML

The researchers developed UltraImage, a framework that improves high-resolution image generation by addressing content repetition and quality degradation. They found that repetition is caused by the periodicity of the dominant frequency in positional embeddings and proposed a correction method. Quality degradation was linked to diluted attention, which they fixed with an entropy-guided adaptive attention concentration technique. UltraImage outperforms previous methods and can generate images up ...
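
To see why periodic positional embeddings matter for extrapolation, here is a minimal sketch of rotary-style positional angles and a simple frequency rescaling (position-interpolation style) that maps an extended position range back into the angle range seen during training. This is a generic illustration under my assumptions, not UltraImage's specific correction.

```python
import numpy as np

def rope_angles(positions, dim=64, base=10000.0, scale=1.0):
    """Rotary-style positional angles; `scale > 1` stretches every period.

    Each channel rotates with period 2*pi / freq, so positions far beyond
    the training range can revisit angles already seen, which is one way
    periodic embeddings can induce repeated content at high resolution.
    Dividing frequencies by `scale` squeezes a longer position range back
    into the training angle range. Illustrative only.
    """
    freqs = 1.0 / (base ** (np.arange(0, dim, 2) / dim)) / scale
    return np.outer(positions, freqs)               # (num_positions, dim/2)

train_len, test_len = 1024, 4096
naive = rope_angles(np.arange(test_len))                             # unchanged periods
stretched = rope_angles(np.arange(test_len), scale=test_len / train_len)
print(naive.shape, stretched.shape)
```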

Read More

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, which is derived from the SOLAR-10.7B open LLM.


(Experimental) The full papers are translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media

Facebook · X · LinkedIn