🤗 Daily Paper (2025-12-11)

deep.di...@gmail.com

Dec 11, 2025, 3:08:12 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

Reinventing Clinical Dialogue: Agentic Paradigms for LLM Enabled Healthcare Communication

Published at 2025-12-01

#ML

The paper proposes a new way of using Large Language Models (LLMs) in healthcare communication, moving from reactive and stateless processing to agentic autonomy. This new approach focuses on the model's ability to reason, plan, and remember, which helps balance creativity and reliability in clinical dialogue. The authors introduce a novel taxonomy to categorize methods into four archetypes, each with distinct architectural choices to ensure both autonomy and safety....

Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules

Published at 2025-12-02

#ML

The authors propose SchED, a training-free method that speeds up diffusion language models by stopping decoding early once the model's confidence reaches a threshold set by a progress-aware schedule. Tested across various tasks and models, SchED delivers speedups of up to 4.0× while maintaining high performance....
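
The early-exit idea can be illustrated with a toy loop: after each denoising step, compare the least confident token against a threshold that loosens as decoding progresses, and stop as soon as it is cleared. This is a minimal sketch, not the authors' SchED implementation; the linear schedule, its `start`/`end` values, and the `confidence_fn` stand-in for a real model are all hypothetical.

```python
from typing import Callable, List


def threshold_at(progress: float, start: float = 0.95, end: float = 0.6) -> float:
    """Hypothetical progress-aware schedule: the confidence required to stop
    decays linearly from `start` (progress 0.0) to `end` (progress 1.0)."""
    progress = min(max(progress, 0.0), 1.0)
    return start + (end - start) * progress


def decode_with_early_exit(
    confidence_fn: Callable[[int], List[float]],
    max_steps: int,
) -> int:
    """Run up to `max_steps` denoising steps; return the number actually used.

    `confidence_fn(step)` is a stand-in for one step of a real diffusion LM and
    returns a confidence per token (e.g. its max softmax probability).
    """
    for step in range(1, max_steps + 1):
        confidences = confidence_fn(step)
        # Stop once even the least confident token clears the current threshold.
        if min(confidences) >= threshold_at(step / max_steps):
            return step
    return max_steps


# Toy "model" whose confidence grows by 0.05 per step (purely illustrative).
steps = decode_with_early_exit(lambda s: [min(1.0, 0.5 + 0.05 * s)] * 4, max_steps=20)
print(f"stopped after {steps} of 20 steps")  # prints: stopped after 7 of 20 steps
```

The schedule matters because a fixed high threshold would rarely be met early, while a fixed low one would stop before the output has stabilized; tying the threshold to progress is one simple way to balance the two.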

EtCon: Edit-then-Consolidate for Reliable Knowledge Editing

Published at 2025-12-04

#ML

The study presents Edit-then-Consolidate, a method for improving real-world knowledge editing in large language models. It addresses overfitting and insufficient integration of the edited knowledge, improving editing reliability and generalization while preserving pre-trained capabilities....

Smart Timing for Mining: A Deep Learning Framework for Bitcoin Hardware ROI Prediction

Published at 2025-12-04

#ML

The study presents MineROI-Net, a Transformer-based model for predicting the profitability of Bitcoin mining hardware within a year of purchase. Tested on data from various ASIC miners, MineROI-Net outperforms other models, offering a practical tool for reducing financial risk in mining operations by helping to determine the best time to acquire hardware....
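
The quantity being predicted, whether a machine pays for itself within a year, rests on simple mining economics. The sketch below computes a naive one-year ROI under loudly stated assumptions: constant BTC price, network hashrate, and electricity cost; a 3.125 BTC block subsidy (the value after the April 2024 halving); about 144 blocks per day; and no transaction or pool fees. It is a back-of-the-envelope baseline, not MineROI-Net, and every input value is hypothetical.

```python
BLOCKS_PER_DAY = 144        # one block roughly every 10 minutes
BLOCK_SUBSIDY_BTC = 3.125   # subsidy after the April 2024 halving; fees ignored


def one_year_roi(
    hardware_price_usd: float,
    miner_hashrate_ths: float,
    network_hashrate_ths: float,
    power_kw: float,
    electricity_usd_per_kwh: float,
    btc_price_usd: float,
    days: int = 365,
) -> float:
    """Net profit over `days` divided by the hardware price, holding every
    input constant. A result >= 0 means the machine paid for itself."""
    share = miner_hashrate_ths / network_hashrate_ths  # expected fraction of blocks won
    daily_btc = share * BLOCKS_PER_DAY * BLOCK_SUBSIDY_BTC
    daily_revenue = daily_btc * btc_price_usd
    daily_power_cost = power_kw * 24 * electricity_usd_per_kwh
    profit = (daily_revenue - daily_power_cost) * days - hardware_price_usd
    return profit / hardware_price_usd


# Hypothetical inputs: a 450 TH/s, 3 kW machine on a 450 EH/s network.
roi = one_year_roi(
    hardware_price_usd=5_000,
    miner_hashrate_ths=450,
    network_hashrate_ths=450_000_000,
    power_kw=3.0,
    electricity_usd_per_kwh=0.05,
    btc_price_usd=100_000,
)
print(f"one-year ROI: {roi:.2%}")
```

Because BTC price, network hashrate, and fees all move over a year, a constant-input estimate like this can be far off, which is presumably why a learned, time-aware predictor is useful for deciding when to buy.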

VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

Published at 2025-12-04

#ML

The authors present VideoSSM, a method for generating long videos that combines autoregressive diffusion with a hybrid state-space memory. A global memory captures scene dynamics while a local memory retains motion cues and fine details, keeping generation consistent and interactive and avoiding repetitive patterns, even over minute-scale horizons....

TED-4DGS: Temporally Activated and Embedding-based Deformation for 4DGS Compression

Published at 2025-12-05

#ML

The study presents TED-4DGS, a method for efficiently compressing dynamic 3D scenes represented with 4D Gaussian Splatting (4DGS) by combining temporal activation control with an embedding-based deformation scheme. It uses learnable parameters, a shared deformation bank, and a hyperprior based on implicit neural representations for rate-distortion compression, and outperforms existing techniques on real-world datasets....
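
"Rate-distortion compression" refers to the standard objective in learned compression: minimize the expected bit cost of the quantized latents plus a weighted reconstruction error, with the hyperprior supplying the probability model used to estimate that bit cost. The generic form, in the notation of hyperprior-based image compression, is shown below; the specific distortion measure and weighting used by TED-4DGS are not given in the summary.

```latex
\mathcal{L} = R + \lambda D, \qquad
R = \mathbb{E}\!\left[ -\log_2 p_{\hat{y}\mid\hat{z}}(\hat{y}\mid\hat{z}) \;-\; \log_2 p_{\hat{z}}(\hat{z}) \right]
```

Here $\hat{y}$ denotes the quantized latents being transmitted, $\hat{z}$ the quantized hyper-latents that parameterize their distribution, $D$ a reconstruction error (for example, the mean squared error of rendered views), and $\lambda$ the weight that trades file size against quality.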

Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS

Published at 2025-12-08

#ML

This study presents a new framework for text-to-speech systems that balances the need for high-quality pronunciation with the requirement for real-time performance. By using a service-oriented architecture and lightweight strategies, the proposed system separates context-aware components from the core TTS engine, reducing latency and enabling real-time use of advanced phonemization models without sacrificing accuracy....

Pay Less Attention to Function Words for Free Robustness of Vision-Language Models

Published at 2025-12-08

#ML

The authors observe that function words can make vision-language models vulnerable to attacks and propose a method called Function-word De-Attention (FDA) to reduce this impact. FDA calculates the original and function-word cross-attention within attention heads and subtracts the latter from the former, resulting in more aligned and robust models. Experiments show that FDA significantly reduces attack success rates while maintaining or slightly improving performance on various tasks, datasets, a...
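
A rough sketch of the de-attention step for one query in one head: compute the usual attention output over all text tokens, compute a second output restricted to function-word tokens, and subtract it. The summary does not say whether FDA subtracts attention maps or head outputs, how function words are identified, or whether the subtraction is scaled; here it is applied to head outputs, and `is_function_word` and `lam` are hypothetical inputs.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: Sequence[Vector], values: Sequence[Vector]) -> Vector:
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def function_word_deattention(
    query: Vector,
    keys: Sequence[Vector],
    values: Sequence[Vector],
    is_function_word: Sequence[bool],
    lam: float = 1.0,
) -> Vector:
    """Hypothetical FDA-style head output: attention over all text tokens minus
    `lam` times attention restricted to the function-word tokens."""
    original = attend(query, keys, values)
    fw_keys = [k for k, fw in zip(keys, is_function_word) if fw]
    fw_values = [v for v, fw in zip(values, is_function_word) if fw]
    if not fw_keys:
        return original  # no function words: nothing to subtract
    fw_out = attend(query, fw_keys, fw_values)
    return [o - lam * f for o, f in zip(original, fw_out)]


# Toy example: an image-patch query attending over two text tokens,
# where the second token (say, "the") is a function word.
out = function_word_deattention(
    query=[1.0, 0.0],
    keys=[[0.0, 0.0], [0.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0]],
    is_function_word=[False, True],
)
print(out)  # [0.5, -0.5]
```

The subtraction removes the direction contributed by function words from the head output, which matches the intuition in the summary that these tokens carry little visual meaning but can still steer cross-modal alignment.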

BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain

Published at 2025-12-09

#ML

The authors have developed a large-scale, automated framework called BrainExplore to discover and explain visual representations in the human cortex. This method uses unsupervised decomposition to find interpretable patterns in fMRI activity and generates natural-language descriptions for each pattern, revealing thousands of visual concepts, some of which have not been reported before....

GimbalDiffusion: Gravity-Aware Camera Control for Video Generation

Published at 2025-12-09

#ML

The authors present GimbalDiffusion, a framework for precise and interpretable control over camera motion and orientation in text-to-video generation. It uses gravity as a global reference and defines camera trajectories in an absolute coordinate system. The authors also introduce null-pitch conditioning to improve camera guidance and establish a benchmark for camera-aware video generation, improving the controllability and robustness of text-to-video models....
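
To make "gravity as a global reference" concrete, the sketch below reads a camera's pitch and roll directly from its forward and up vectors in a world frame whose vertical axis is fixed by gravity, so each frame's orientation is absolute rather than relative to the previous frame. The z-up convention, the sign of roll, and the function names are assumptions for illustration; this is not GimbalDiffusion's camera parameterization.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
WORLD_UP: Vec3 = (0.0, 0.0, 1.0)  # assumed convention: gravity points along -z


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _normalize(a: Vec3) -> Vec3:
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def pitch_roll_deg(forward: Vec3, up: Vec3) -> Tuple[float, float]:
    """Pitch and roll of a camera relative to gravity, in degrees.

    pitch: elevation of the viewing direction above the horizontal plane.
    roll:  rotation of the camera's up axis about the viewing direction,
           measured from the gravity-level up direction.
    `up` is assumed to be orthogonal to `forward`.
    """
    f, u = _normalize(forward), _normalize(up)
    pitch = math.degrees(math.asin(max(-1.0, min(1.0, _dot(f, WORLD_UP)))))
    # "Level" up axis: world up with its component along the view direction removed.
    s = _dot(WORLD_UP, f)
    level = (WORLD_UP[0] - s * f[0], WORLD_UP[1] - s * f[1], WORLD_UP[2] - s * f[2])
    if _dot(level, level) < 1e-12:
        return pitch, 0.0  # looking straight up or down: roll is undefined
    level = _normalize(level)
    roll = math.degrees(math.atan2(_dot(_cross(level, u), f), _dot(level, u)))
    return pitch, roll


print(pitch_roll_deg((1.0, 0.0, 1.0), (-1.0, 0.0, 1.0)))  # tilted up ~45 degrees, no roll
print(pitch_roll_deg((1.0, 0.0, 0.0), (0.0, -1.0, 0.0)))  # level view, rolled ~90 degrees
```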

InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models

Published at 2025-12-09

#ML

The authors present InfiniteVL, a vision-language model that combines sliding window attention with Gated DeltaNet, a linear-attention mechanism. Compared with existing models, it performs better on information-intensive tasks and handles longer sequences, while remaining resource-efficient with a small memory footprint....
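
Sliding window attention, presumably the "sparse" half of the design, restricts each token to the most recent `window` tokens, so per-token compute and key-value cache size stay fixed however long the input grows. A minimal causal mask is sketched below with an arbitrary window size; the Gated DeltaNet layers (a linear-attention recurrence with a fixed-size state) and how InfiniteVL combines the two are not shown.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j:
    j is not in the future (j <= i) and lies within the last `window`
    positions (i - j < window)."""
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


for row in sliding_window_mask(seq_len=6, window=3):
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx
```

Each row has at most `window` allowed positions, which is what keeps memory bounded; long-range information then has to be carried by the linear-attention layers instead.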

Learning Unmasking Policies for Diffusion Language Models

Published at 2025-12-09

#ML

This study presents a method for improving the efficiency and quality of sampling in diffusion language models by using reinforcement learning to train the policy that decides which tokens to unmask. The learned policy outperforms existing heuristic strategies and generalizes to new models and longer sequences, but may struggle with out-of-domain data....
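
For contrast, the kind of hand-designed rule such a learned policy replaces can be as simple as "at each step, unmask the k masked positions the model is most confident about." The summary does not name the heuristics the authors compared against, so treat this as a representative example; in the learned approach, this selection step is what the RL-trained policy decides instead.

```python
from typing import List, Sequence


def top_k_confidence_unmask(
    confidences: Sequence[float],
    is_masked: Sequence[bool],
    k: int,
) -> List[int]:
    """Heuristic unmasking rule: among still-masked positions, reveal the k
    whose predicted token has the highest confidence (returned in order)."""
    candidates = [i for i, masked in enumerate(is_masked) if masked]
    candidates.sort(key=lambda i: confidences[i], reverse=True)
    return sorted(candidates[:k])


result = top_k_confidence_unmask([0.9, 0.2, 0.7, 0.95], [False, True, True, True], k=2)
print(result)  # [2, 3]
```

A fixed rule like this ignores context such as how many steps remain or how errors propagate, which is the gap a trained policy can exploit.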

OmniPSD: Layered PSD Generation with Diffusion Transformer

Published at 2025-12-09

#ML

The authors present OmniPSD, a method that uses a diffusion transformer to work with layered PSD files with transparent backgrounds: it can generate PSD files from text descriptions and decompose single images into editable layers, aided by an additional component called RGBA-VAE....

Towards a Science of Scaling Agent Systems

Published at 2025-12-09

#ML

This study examines how the performance of AI agent systems scales and how these scaling principles play out across various tasks. It identifies three key effects: a trade-off between tool use and coordination, capability saturation, and error amplification that depends on the coordination strategy. The findings offer guidance on choosing the best coordination strategy for a given task, improving performance by up to 80.9% in some cases....

WonderZoom: Multi-Scale 3D World Generation

Published at 2025-12-09

#ML

The authors propose WonderZoom, a method for generating 3D scenes with content at multiple scales from a single image. It uses specialized 3D primitives and a fine-detail content generator so that users can zoom in and reveal progressively finer detail, and it outperforms existing methods in the authors' evaluations....

Composing Concepts from Images and Videos via Concept-prompt Binding

Published at 2025-12-10

#ML

The study presents Bind & Compose, a method for composing visual concepts drawn from both images and videos. It uses a hierarchical structure to decompose complex visual concepts accurately and a mechanism that improves compatibility between image and video concepts, yielding better consistency, fidelity, and motion quality than existing approaches....

HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

Published at 2025-12-10

#ML

The researchers present a new framework called HiF-VLA that uses motion to improve long-term decision making in robotic manipulation tasks. By incorporating past and future motion information, HiF-VLA outperforms existing methods in both benchmark tests and real-world applications with minimal additional processing time....

IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting

Published at 2025-12-10

#ML

The study presents IF-Bench, a new benchmark for evaluating the understanding of infrared images by multimodal large language models. The researchers assess over 40 models using this benchmark and introduce a method called GenViP to improve infrared image comprehension, which significantly boosts performance across various models....

Rethinking Chain-of-Thought Reasoning for Videos

Published at 2025-12-10

#ML

This study explores whether shorter reasoning processes and fewer visual tokens can be as effective as longer, human-like reasoning chains and large numbers of visual tokens for video understanding in large language models. The researchers develop and test a new method that allows models to use compressed visual tokens and generate brief reasoning paths, resulting in faster and more efficient models that perform competitively on various benchmarks without relying on manual or supervised training...

StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation

Published at 2025-12-10

#ML

The authors present StereoWorld, a geometry-aware framework that converts monocular videos into high-quality stereo videos, making stereo content more affordable to produce and reducing artifacts. They also built a large dataset of high-definition stereo videos to train and evaluate the method, which outperforms existing techniques in producing realistic, geometrically accurate stereo videos....

UniUGP: Unifying Understanding, Generation, and Planning For End-to-end Autonomous Driving

Published at 2025-12-10

#ML

The authors address the challenges of autonomous driving in complex scenarios by building specialized datasets with reasoning and planning annotations. They then present UniUGP, a framework that combines scene reasoning, future video generation, and trajectory planning on top of pre-trained models, achieving superior generalization to difficult driving situations....

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, which is derived from the SOLAR-10.7B open LLM.


(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
