🤗 Daily Paper Newsletter |
 |
Hope you found some gems! |
This newsletter delivers a curated list of papers from 🤗 Daily Papers. |
|
|
|
|
|
GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models |
Published at 2025-12-17 |
|
#Diffusion Models, #Text-to-Image Generation
|
AI models that create pictures and videos from text often struggle to perfectly understand what you ask for, and it's usually very slow to test them. A new system provides a much faster way to check how well these AI models understand text, and also offers a better text-understanding component that improves the quality of the generated images and videos.... |
Read More |
|
|
|
|
DiRL: An Efficient Post-Training Framework for Diffusion Language Models |
Published at 2025-12-23 |
|
#Diffusion Language Models, #Post-Training Optimization
|
Diffusion Language Models struggle with learning after their initial setup, especially for tough tasks like math, because current methods are slow and don't align well with how they're actually used. A new system called DiRL was developed to make this post-training process much faster and more effective, leading to top-tier math performance for these models, even outperforming some established competitors.... |
Read More |
|
|
|
|
|
Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting |
Published at 2025-12-23 |
|
#3D Gaussian Splatting, #Rendering Optimization
|
A new rendering method, Quantile Rendering (Q-Render), efficiently handles complex visual information for understanding objects in 3D. It speeds up the process significantly by focusing only on the most important scene elements, offering much faster and more accurate results than previous techniques.... |
Read More |
|
|
|
|
GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs |
Published at 2025-12-24 |
|
#Mixture-of-Experts, #Adversarial Attacks
|
New, efficient AI models (MoE LLMs) are increasingly used, but their unique safety mechanisms haven't been thoroughly checked like older AI models. A method called GateBreaker can find and disable tiny 'safety switch' parts in these models, making them much more likely to produce harmful content by turning off only a small number of specific neurons.... |
Read More |
|
|
|
|
|
Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks |
Published at 2025-12-24 |
|
#Imitation Learning, #Knowledge Distillation
|
Surprisingly, a computer can learn to think better by watching how other clever computers try to solve problems, even when those others make a mistake. It seems computers learn best from examples that look like their own way of thinking, and even 'wrong' thinking often has good parts they can pick up.... |
Read More |
|
|
|
|
UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement |
Published at 2025-12-24 |
|
#3D Shape Generation, #Geometric Refinement
|
UltraShape 1.0 is like a super-smart artist that first sketches a rough 3D toy and then carefully adds all the tiny details to make it look incredibly real. It uses special steps to fix messy existing toy designs and focus on perfecting the small parts, even with limited learning examples.... |
Read More |
|
|
|
|
|
An Information Theoretic Perspective on Agentic System Design |
Published at 2025-12-25 |
|
#Information Theory, #Information Bottleneck
|
Many smart computer programs use a small AI to condense information for a bigger AI, but figuring out the best way to design that "condensing" AI used to be a guessing game. A clever way to measure how much useful information the small AI keeps now predicts how well the whole program will work. It shows that making the condensing AI more powerful is far more effective than making the main AI bigger, resulting in systems that are both better and cheaper.... |
Read More |
|
|
|
|
Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation |
Published at 2025-12-25 |
|
#Video Diffusion Models, #Facial Animation
|
Creating real-time, smooth, and endlessly consistent animated faces is tough, as existing tools either look great but are slow, or are fast but glitchy. Knot Forcing is a clever new method that produces high-quality, fluid, and interactive animated portraits non-stop, even on standard computers, by smartly generating video chunks and blending them seamlessly.... |
Read More |
|
|
|
|
|
Valori: A Deterministic Memory Substrate for AI Systems |
Published at 2025-12-25 |
|
#Memory Architectures, #Reproducible AI
|
AI systems currently use a type of memory that can cause them to give different answers on various computers, making it hard to trust their results. A new memory system called Valori ensures AI memory and searches always produce the exact same outcome everywhere, which is crucial for building reliable and trustworthy AI.... |
Read More |
|
|
|
|
Introducing TrGLUE and SentiTurca: A Comprehensive Benchmark for Turkish General Language Understanding and Sentiment Analysis |
Published at 2025-12-26 |
|
#Turkish Language Understanding, #Sentiment Analysis
|
Researchers haven't had good enough benchmarks to measure how well computers understand Turkish, making it hard to see how their smart programs are doing. To fix this, a new comprehensive set of challenges called TrGLUE has been created for Turkish language understanding, alongside SentiTurca for analyzing sentiment, helping researchers build and evaluate better Turkish-speaking AI.... |
Read More |
|
|
|
|
|
Monadic Context Engineering |
Published at 2025-12-26 |
|
#Context Engineering, #Formal Methods for AI
|
Today's smart computer helpers often get mixed up or break because they're built in a jumbled way. Monadic Context Engineering is a clever new blueprint that uses special math tools to build these helpers strongly and neatly, making sure they can handle complex tasks and tricky situations without falling apart.... |
Read More |
|
|
|
|
Self-Evaluation Unlocks Any-Step Text-to-Image Generation |
Published at 2025-12-26 |
|
#Text-to-Image Generation, #Self-Evaluation
|
A new image-generating system called Self-E learns to create pictures from text by teaching itself, evaluating its own creations to get better. This allows it to generate high-quality images very quickly in just a few steps, and can also make even better images with more steps, all without needing a pre-trained mentor.... |
Read More |
|
|
|
|
|
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents |
Published at 2025-12-26 |
|
#Self-Verifying Agents, #Task Completion Verification
|
AI programs often struggle to prove they've finished complicated computer tasks because they wait until the end and sift through too much information, which is slow and unreliable. A clever new system helps these programs learn to actively collect only the most important snapshots of their progress, allowing them to quickly and effectively confirm their own success.... |
Read More |
|
|
|
|
SpotEdit: Selective Region Editing in Diffusion Transformers |
Published at 2025-12-26 |
|
#Image Inpainting, #Diffusion Transformers
|
Imagine you want to change just one small thing in a picture, but the computer redraws the whole image every time, making it slow and sometimes messing up parts you liked. SpotEdit is a smart new tool that only redraws the tiny part you actually changed, making edits super fast and keeping the rest of your picture looking exactly how it should be.... |
Read More |
|
|
|
|
|
VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs |
Published at 2025-12-26 |
|
#Vision-and-Language Navigation, #Goal-Oriented Dialogue
|
Most robot navigation tasks use clear instructions, but real-world directions are often vague; a new approach called Interactive Instance Object Navigation (IION) teaches robots to ask questions to clarify their goals while moving. To support this, the VL-LN benchmark offers a large dataset and evaluation tools, helping improve robots that can both talk and navigate effectively.... |
Read More |
|
|
|
|
Yume-1.5: A Text-Controlled Interactive World Generation Model |
Published at 2025-12-26 |
|
#Text-to-3D Generation, #Interactive World Generation
|
Making interactive computer worlds you can explore has been tough because existing tools are often too big, too slow, and hard to direct with words. Yume-1.5 is a new system that quickly builds realistic, explorable worlds from a simple image or text prompt, letting you walk around and even control events within them just by typing.... |
Read More |
|
|
|
|
|
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone |
Published at 2025-12-27 |
|
#Diffusion Models, #Embodied AI
|
Current smart programs that understand pictures and words often struggle with complex visual planning and robot control because they think one step at a time. A new type of smart program, Dream-VL and Dream-VLA, was developed using a different "diffusion" thinking style, allowing them to understand pictures, words, and robot actions much more efficiently, leading to faster robot learning and top performance on challenging tasks.... |
Read More |
|
|
|
|
DreamOmni3: Scribble-based Editing and Generation |
Published at 2025-12-27 |
|
#Controllable Image Generation, #Multimodal Learning
|
Current image editing tools often struggle to understand precise changes when only given text commands. A new system called DreamOmni3 lets you draw simple scribbles directly on images, combined with words or other pictures, to show exactly where and how you want to edit or create something new.... |
Read More |
|
|
|
|
|
Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers |
Published at 2025-12-27 |
|
#Respiratory Sound Classification, #Sharpness-Aware Minimization
|
It's hard for computers to identify breathing problems from sounds because there isn't much clean data to learn from. A new technique helps these computers learn smarter by finding more stable ways to solve the problem, rather than just memorizing answers, significantly improving their ability to accurately detect real issues in new patients and critical conditions.... |
Read More |
|
|
|
|
GraphLocator: Graph-guided Causal Reasoning for Issue Localization |
Published at 2025-12-27 |
|
#Causal Reasoning, #Fault Localization
|
It's really tough for computers to figure out exactly which code to fix based on a human's description of a problem, especially when the description doesn't pinpoint the root cause or when one problem touches many parts of the code. A system called GraphLocator helps by creating a "causal map" of the problem, discovering hidden relationships between issues and code, which then precisely identifies all the necessary fix locations much more effectively than prior methods.... |
Read More |
|
|
|
|
|
Evaluating Parameter Efficient Methods for RLVR |
Published at 2025-12-28 |
|
#Parameter Efficient Methods, #Reinforcement Learning
|
Researchers checked different ways to teach smart AI models to think better using special feedback, trying to find the most efficient method. They found that some newer teaching techniques work better than the standard one, and trying to cut too many corners or using certain math tricks can actually make the AI dumber.... |
Read More |
|
|
|
|
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation |
Published at 2025-12-28 |
|
#Audio-Visual Learning, #Audio-Visual Generation
|
JavisGPT is a clever AI that understands and generates videos where the sound and picture match up perfectly. It learns to combine what it sees and hears, allowing it to respond to instructions and create realistic-sounding videos better than other systems.... |
Read More |
|
|
|
|
|
Reverse Personalization |
Published at 2025-12-28 |
|
#Face Anonymization, #Controllable Face Manipulation
|
AI can create very realistic faces, but it's tricky to remove someone's unique identity from an image without complex steps. A new method called "reverse personalization" removes a face's identity while still letting you control other features like hair or expression, even for faces the AI hasn't seen before.... |
Read More |
|
|
|
|
SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling |
Published at 2025-12-28 |
|
#Surgical Robotics, #World Modeling
|
Surgical robots can't easily learn from existing operation videos because those videos don't show the exact movements the robot made. A special computer program called SurgWorld generates realistic practice videos and figures out the robot's actions within them, greatly improving how well robots learn to perform surgical tasks.... |
Read More |
|
|
|
|
|
Video-BrowseComp: Benchmarking Agentic Video Research on Open Web |
Published at 2025-12-28 |
|
#AI Agents, #Video Understanding
|
Smart computer programs are good at reading and looking at pictures, but they struggle to really understand information from videos on the internet, especially when they need to actively "watch" and find specific details. A new, harder challenge called Video-BrowseComp reveals that even the best programs are still very bad at truly seeing and using video evidence, often trying to guess from text instead of learning from what's actually shown.... |
Read More |
|
|
|
|
A unified framework for detecting point and collective anomalies in operating system logs via collaborative transformers |
Published at 2025-12-29 |
|
#Anomaly Detection, #Transformers
|
Computer logs are crucial for security, but finding problems is difficult because logs come in different forms that interact in complex ways, confusing older detection methods. A new system, CoLog, intelligently combines these various log types to uncover their relationships and reliably detect both subtle and obvious unusual activity, outperforming previous tools.... |
Read More |
|
|
|
|
|
AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents |
Published at 2025-12-29 |
|
#Cognitive AI, #Brain-Inspired Memory
|
Brains and smart computers both need memory to learn from the past and do new things, but AI currently struggles to truly mimic how human brains remember. This work combines knowledge from brain science and AI to explain how memory works, compares how humans and AI store information, and suggests ways to make AI memory smarter and more secure in the future.... |
Read More |
|
|
|
|
Act2Goal: From World Model To General Goal-conditioned Policy |
Published at 2025-12-29 |
|
#Robotics, #Goal-conditioned Reinforcement Learning
|
Robots find it hard to do long, tricky tasks because they only think one step ahead, even when shown the final picture. A new system called Act2Goal helps robots by letting them imagine all the necessary steps to reach a big goal, then guides them through those steps to perform complex tasks much more reliably and learn quickly.... |
Read More |
|
|
|
|
|
Bridging Your Imagination with Audio-Video Generation via a Unified Director |
Published at 2025-12-29 |
|
#Audio-Video Generation, #Multimodal Generation
|
An AI model acts like a unified movie director, taking simple ideas and turning them into complete, multi-scene films. It combines story writing and visual planning into one process, making it easier for anyone to create videos with coherent scripts and consistent images.... |
Read More |
|
|
|
|
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss |
Published at 2025-12-29 |
|
#Mixture-of-Experts, #Expert Specialization
|
Mixture-of-Experts models sometimes struggle because the system that directs information (router) doesn't reliably match tasks with the right specialized processing unit (expert). A new helper called ERC loss ensures each expert truly excels at the type of information it's supposed to handle, making the router's choices much better and improving the overall performance of these models.... |
Read More |
|
|
|
|
|
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation |
Published at 2025-12-29 |
|
#Diffusion Models, #Transparent Object Perception
|
Getting computers to understand clear objects like glass is really hard because light plays tricks, making it tough to figure out their shape and distance. But by taking a video-making AI that already "knows" how to create realistic clear objects and teaching it a little more, it can now precisely map these tricky items, even helping robots grasp them better.... |
Read More |
|
|
|
|
End-to-End Test-Time Training for Long Context |
Published at 2025-12-29 |
|
#Test-Time Training, #Long Context Understanding
|
This new method helps a computer model understand really long stories by making it constantly learn as it reads, much like remembering new details while you read a book. It means the model can handle super long texts just as well as fancy big models, but it does so much faster, keeping its speed even when the text gets very, very long.... |
Read More |
|
|
|
|
|
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta |
Published at 2025-12-29 |
|
#Automated Code Generation, #AI Hardware Optimization
|
Making AI recommendation models run quickly and efficiently is tough due to the many different models, math operations, and computer chips involved. A smart system called KernelEvolve automatically writes and fine-tunes the special instructions for these chips, making recommendation models run much faster and significantly simplifying the use of new AI hardware.... |
Read More |
|
|
|
|
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation |
Published at 2025-12-29 |
|
#Multimodal Learning, #Video Diffusion Models
|
Making AI-generated videos interactive and in real-time is tricky, especially when the AI needs to understand words, pictures, and sounds all at once. A new method drastically speeds up this process, creating high-quality videos instantly and enabling smooth, natural conversations with AI avatars, like the LiveTalk system.... |
Read More |
|
|
|
|
|
Nested Browser-Use Learning for Agentic Information Seeking |
Published at 2025-12-29 |
|
#Agentic AI, #Web Information Seeking
|
AI agents currently struggle to truly browse the internet like humans, limiting their access to rich information on complex websites. A new technique, NestBrowse, makes this easier by splitting how agents control the browser from how they explore page content, allowing them to efficiently find deep web information.... |
Read More |
|
|
|
|
OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding |
Published at 2025-12-29 |
|
#Multimodal Learning, #Active Perception
|
Smart systems that try to understand both sounds and videos often miss the tiny details of how they connect. A new clever agent called OmniAgent solves this by actively listening to sounds, using them as clues to decide where to focus its attention and what special tools to use, making it much better at truly understanding sound and video together.... |
Read More |
|
|
|
|
|
Pretraining Frame Preservation in Autoregressive Video Memory Compression |
Published at 2025-12-29 |
|
#Video Memory Compression, #Frame Preservation
|
This new AI technique helps condense long videos into very small digital summaries. It makes sure that even though the video is tiny, any specific moment picked from it still looks clear and sharp, allowing other AI systems to 'remember' long video histories without costing too much.... |
Read More |
|
|
|
|
ProGuard: Towards Proactive Multimodal Safeguard |
Published at 2025-12-29 |
|
#Multimodal Learning, #Novelty Detection
|
AI models that create content often introduce new safety risks that current protection systems struggle to handle. A new digital guardian called ProGuard proactively identifies and describes these unexpected dangerous situations in both images and text, proving much better at spotting brand-new threats than existing tools.... |
Read More |
|
|
|
|
|
Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation |
Published at 2025-12-29 |
|
#Robotic Manipulation, #Reward Modeling
|
Robots learning new skills often struggle because it's hard to precisely tell them if they're making progress, especially with complex tasks and only one viewpoint. A new system called Dopamine-Reward helps robots understand their actions better by using many camera angles and breaking down progress into clear, small steps. This approach allows robots to learn intricate manipulation tasks much faster and more reliably, leading to significant improvements in their success rate with minimal training... |
Read More |
|
|
|
|
Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion |
Published at 2025-12-29 |
|
#Video Super-Resolution, #Auto-Regressive Diffusion
|
High-quality video enhancement tools can make blurry videos super sharp, but they're too slow for live viewing because they rely on future frames. A new method called Stream-DiffVSR fixes this by only looking at past video, making videos look much better and smoother incredibly fast, which means live streams can now have amazing quality.... |
Read More |
|
|
|
|
|
Training AI Co-Scientists Using Rubric Rewards |
Published at 2025-12-29 |
|
#Scientific Discovery, #Automated Assessment
|
AI can now be trained to create better research plans for scientists by learning from existing papers and grading its own work using automatically extracted rules. This self-improvement process leads to plans that human experts prefer and is effective across various scientific fields, like medicine.... |
Read More |
|
|
|
|
Web World Models |
Published at 2025-12-29 |
|
#Web-based Simulations, #Generative World Modeling
|
Building interactive worlds for AI typically involves choosing between rigid, fixed environments or totally wild, unpredictable ones. A new approach offers a middle ground by using standard web code for the reliable rules and structure, letting AI models then imagine all the stories and details within those boundaries.... |
Read More |
|
|
|
|
|
YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection |
Published at 2025-12-29 |
|
#Object Detection, #Mixture-of-Experts
|
Existing systems for finding objects in real-time use the same amount of effort for every picture, which wastes energy on easy scenes and struggles with difficult ones. A new method called YOLO-Master learns to smartly focus more processing power on complex scenes and less on simple ones, making object detection both faster and more accurate, especially in challenging situations.... |
Read More |
|
|
|
|
Factorized Learning for Temporally Grounded Video-Language Models |
Published at 2025-12-30 |
|
#Temporal Grounding, #Multimodal Learning
|
Video AI often struggles to pinpoint exact moments when things happen in a video, making it difficult to answer questions about those events reliably. A new learning method helps by first training the AI to precisely find these event moments, and then using that accurate timing to provide much better answers and overall video understanding.... |
Read More |
|
|
|
|
|
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process |
Published at 2025-12-30 |
|
#Latent Concept Discovery, #AI Reasoning
|
It's hard to understand exactly how big AI models "think" or reason because previous methods often rely on human-defined concepts, missing many hidden processes. A new unsupervised approach automatically discovers distinct "thinking patterns" within the AI, allowing researchers to identify, control, and even uncover novel reasoning behaviors like confidence adjustments.... |
Read More |
|
|
|
|
Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking |
Published at 2025-12-30 |
|
#Visual Reasoning, #Mathematical Reasoning
|
Complex reasoning problems often contain hidden spatial relationships that text-only methods struggle with. A new technique, FIGR, tackles this by actively creating and using visual representations during problem-solving, which significantly improves accuracy on difficult math challenges.... |
Read More |
|
|
|
|
|
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems |
Published at 2025-12-30 |
|
#Multi-modal Learning, #Autonomous Vehicles
|
Autonomous vehicles need to deeply understand their surroundings using many different sensors like cameras and LiDAR, but blending all that data for a single, smart view is challenging. This research offers a framework and roadmap for training AI models to integrate multi-sensor information effectively, aiming to build more robust spatial intelligence for real-world deployment.... |
Read More |
|
|
|
|
GR-Dexter Technical Report |
Published at 2025-12-30 |
|
#Embodied AI, #Robotic Manipulation
|
Making robots with complex, multi-fingered hands do various tasks just by telling them what to do is hard, especially with two arms. GR-Dexter offers a complete solution with a special robot hand, an easy way to teach it, and clever training, allowing these robots to perform many real-world jobs robustly.... |
Read More |
|
|
|
|
|
Guiding a Diffusion Transformer with the Internal Dynamics of Itself |
Published at 2025-12-30 |
|
#Diffusion Models, #Internal Guidance
|
Picture-making computer programs (diffusion models) are powerful but sometimes struggle to create truly high-quality images, especially for less common ideas, and current helper methods often introduce new problems like distorted results. A simple new trick called Internal Guidance makes these programs much better and faster at drawing by adding extra checks during their learning process, leading to top-notch image quality and even setting new performance records.... |
Read More |
|
|
|
|
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation |
Published at 2025-12-30 |
|
#Text-to-Video Generation, #Physics-Aware Generation
|
Making videos from text often results in scenes where objects don't behave realistically, like floating instead of falling. This project fixes that by first creating a huge collection of videos showing how things *should* move, then teaching the AI to make new videos that follow those real-world physics rules.... |
Read More |
|
|
|
|
|
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models |
Published at 2025-12-30 |
|
#Agentic AI, #Lightweight Language Models
|
Youtu-LLM is a new, small language model built from the ground up to be smart and plan things on its own, unlike other small models that often just copy bigger ones. It can handle complex reasoning and agent tasks really well while being efficient, proving that even tiny digital brains can have powerful thinking abilities without needing to be huge.... |
Read More |
|
|
|
|
BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts |
Published at 2025-12-31 |
|
#Dialogue Management, #Belief Tracking
|
To have smart conversations, robots need to guess what others are thinking, but existing methods don't use these guesses effectively to decide what to say next. A new system, BEDA, helps by turning those guesses into clear rules for picking what to say, which makes robots much better at strategic talking across different situations.... |
Read More |
|
|
|
|
|
GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction |
Published at 2025-12-31 |
|
#Sparse-View 3D Reconstruction, #Diffusion Outpainting
|
Building 3D models from just a few photos is tricky because existing tools often miss parts, don't line up perfectly, or take too long. A new technique called GaMO helps by outpainting the available photos to cover a much wider view, allowing it to build complete 3D scenes more accurately and incredibly fast, even with very few pictures.... |
Read More |
|
|
|
|
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem |
Published at 2025-12-31 |
|
#Autonomous Agents, #Agent Frameworks
|
It's tricky to make AI agents that can do things in the real world by themselves because there hasn't been a complete system to help build them. A new ecosystem, ALE, provides the tools to create and train powerful agents like ROME, which can now perform complex tasks and shows strong results on challenging tests.... |
Read More |
|
|
|
|
|
Scaling Open-Ended Reasoning to Predict the Future |
Published at 2025-12-31 |
|
#Event Prediction, #Open-Ended Reasoning
|
Smart computer programs are taught to guess future events by learning from lots of "what if?" questions automatically made from old news articles, making sure they don't peek at real future outcomes. A specialized version of these programs, OpenForecaster 8B, became very good at predicting the future, performing as well as much larger systems, and all its tools are shared for everyone to use.... |
Read More |
|
|
|
|
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time |
Published at 2025-12-31 |
|
#Generative Rendering, #4D Reconstruction and Rendering
|
Imagine a magic video editor that lets you completely change the camera angle *and* how things move in a video, even after it's been filmed. It learns this cool trick by smartly separating the camera's view from the action happening over time, using clever training techniques with both old and new video collections.... |
Read More |
|
|
|
|
|
mHC: Manifold-Constrained Hyper-Connections |
Published at 2025-12-31 |
|
#Model Connectivity, #Training Stability
|
Advanced ways to connect parts of AI models offer performance boosts but make training unstable and difficult to scale up efficiently. A new method, Manifold-Constrained Hyper-Connections (mHC), fixes this by ensuring these complex connections behave predictably, making large AI models easier to train and more effective.... |
Read More |
|
|
|
|
|