🤗 Daily Paper Newsletter |
 |
Hope you found some gems! |
This newsletter delivers a curated list of papers from 🤗 Daily Papers. |
|
|
|
|
|
GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models |
Published at 2025-12-17 |
|
#Diffusion Models, #Text-to-Image Generation
|
AI models that create pictures and videos from text often struggle to perfectly understand what you ask for, and it's usually very slow to test them. A new system provides a much faster way to check how well these AI models understand text, and also offers a better text-understanding component that improves the quality of the generated images and videos.... |
Read More |
|
|
|
|
DiRL: An Efficient Post-Training Framework for Diffusion Language Models |
Published at 2025-12-23 |
|
#Diffusion Language Models, #Post-Training Optimization
|
Diffusion Language Models struggle with learning after their initial setup, especially for tough tasks like math, because current methods are slow and don't align well with how they're actually used. A new system called DiRL was developed to make this post-training process much faster and more effective, leading to top-tier math performance for these models, even outperforming some established competitors.... |
Read More |
|
|
|
|
|
Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting |
Published at 2025-12-23 |
|
#3D Gaussian Splatting, #Rendering Optimization
|
A new rendering method, Quantile Rendering (Q-Render), efficiently handles complex visual information for understanding objects in 3D. It speeds up the process significantly by focusing only on the most important scene elements, offering much faster and more accurate results than previous techniques.... |
Read More |
|
|
|
|
GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs |
Published at 2025-12-24 |
|
#Mixture-of-Experts, #Adversarial Attacks
|
New, efficient AI models (MoE LLMs) are increasingly used, but their unique safety mechanisms haven't been thoroughly checked like older AI models. A method called GateBreaker can find and disable tiny 'safety switch' parts in these models, making them much more likely to produce harmful content by turning off only a small number of specific neurons.... |
Read More |
|
|
|
|
|
Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks |
Published at 2025-12-24 |
|
#Imitation Learning, #Knowledge Distillation
|
Surprisingly, a computer can learn to think better by watching how other clever computers try to solve problems, even when those others make a mistake. It seems computers learn best from examples that look like their own way of thinking, and even 'wrong' thinking often has good parts they can pick up.... |
Read More |
|
|
|
|
UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement |
Published at 2025-12-24 |
|
#3D Shape Generation, #Geometric Refinement
|
UltraShape 1.0 is like a super-smart artist that first sketches a rough 3D toy and then carefully adds all the tiny details to make it look incredibly real. It uses special steps to fix messy existing toy designs and focus on perfecting the small parts, even with limited learning examples.... |
Read More |
|
|
|
|
|
An Information Theoretic Perspective on Agentic System Design |
Published at 2025-12-25 |
|
#Information Theory, #Information Bottleneck
|
Many smart computer programs use a small AI to condense information for a bigger AI, but figuring out the best way to design that "condensing" AI used to be a guessing game. A clever way to measure how much useful information the small AI keeps now predicts how well the whole program will work. It shows that making the condensing AI more powerful is far more effective than making the main AI bigger, resulting in systems that are both better and cheaper.... |
Read More |
|
|
|
|
Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation |
Published at 2025-12-25 |
|
#Video Diffusion Models, #Facial Animation
|
Creating real-time, smooth, and endlessly consistent animated faces is tough, as existing tools either look great but are slow, or are fast but glitchy. Knot Forcing is a clever new method that produces high-quality, fluid, and interactive animated portraits non-stop, even on standard computers, by smartly generating video chunks and blending them seamlessly.... |
Read More |
|
|
|
|
|
Valori: A Deterministic Memory Substrate for AI Systems |
Published at 2025-12-25 |
|
#Memory Architectures, #Reproducible AI
|
AI systems currently use a type of memory that can cause them to give different answers on various computers, making it hard to trust their results. A new memory system called Valori ensures AI memory and searches always produce the exact same outcome everywhere, which is crucial for building reliable and trustworthy AI.... |
Read More |
|
|
|
|
Introducing TrGLUE and SentiTurca: A Comprehensive Benchmark for Turkish General Language Understanding and Sentiment Analysis |
Published at 2025-12-26 |
|
#Turkish Language Understanding, #Sentiment Analysis
|
Researchers haven't had good enough benchmarks to measure how well computers understand Turkish, making it hard to see how their smart programs are doing. To fix this, a new comprehensive set of challenges called TrGLUE has been created for Turkish language understanding, alongside SentiTurca for analyzing sentiment, helping researchers build and evaluate better Turkish-speaking AI.... |
Read More |
|
|
|
|
|
Monadic Context Engineering |
Published at 2025-12-26 |
|
#Context Engineering, #Formal Methods for AI
|
Today's smart computer helpers often get mixed up or break because they're built in a jumbled way. Monadic Context Engineering is a clever new blueprint that uses special math tools to build these helpers strongly and neatly, making sure they can handle complex tasks and tricky situations without falling apart.... |
Read More |
|
|
|
|
Self-Evaluation Unlocks Any-Step Text-to-Image Generation |
Published at 2025-12-26 |
|
#Text-to-Image Generation, #Self-Evaluation
|
A new image-generating system called Self-E learns to create pictures from text by teaching itself, evaluating its own creations to get better. This allows it to generate high-quality images very quickly in just a few steps, and can also make even better images with more steps, all without needing a pre-trained mentor.... |
Read More |
|
|
|
|
|
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents |
Published at 2025-12-26 |
|
#Self-Verifying Agents, #Task Completion Verification
|
AI programs often struggle to prove they've finished complicated computer tasks because they wait until the end and sift through too much information, which is slow and unreliable. A clever new system helps these programs learn to actively collect only the most important snapshots of their progress, allowing them to quickly and effectively confirm their own success.... |
Read More |
|
|
|
|
SpotEdit: Selective Region Editing in Diffusion Transformers |
Published at 2025-12-26 |
|
#Image Inpainting, #Diffusion Transformers
|
Imagine you want to change just one small thing in a picture, but the computer redraws the whole image every time, making it slow and sometimes messing up parts you liked. SpotEdit is a smart new tool that only redraws the tiny part you actually changed, making edits super fast and keeping the rest of your picture looking exactly how it should be.... |
Read More |
|
|
|
|
|
VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs |
Published at 2025-12-26 |
|
#Vision-and-Language Navigation, #Goal-Oriented Dialogue
|
Most robot navigation tasks use clear instructions, but real-world directions are often vague; a new approach called Interactive Instance Object Navigation (IION) teaches robots to ask questions to clarify their goals while moving. To support this, the VL-LN benchmark offers a large dataset and evaluation tools, helping improve robots that can both talk and navigate effectively.... |
Read More |
|
|
|
|
Yume-1.5: A Text-Controlled Interactive World Generation Model |
Published at 2025-12-26 |
|
#Text-to-3D Generation, #Interactive World Generation
|
Making interactive computer worlds you can explore has been tough because existing tools are often too big, too slow, and hard to direct with words. Yume-1.5 is a new system that quickly builds realistic, explorable worlds from a simple image or text prompt, letting you walk around and even control events within them just by typing.... |
Read More |
|
|
|
|
|
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone |
Published at 2025-12-27 |
|
#Diffusion Models, #Embodied AI
|
Current smart programs that understand pictures and words often struggle with complex visual planning and robot control because they think one step at a time. A new type of smart program, Dream-VL and Dream-VLA, was developed using a different "diffusion" thinking style, allowing them to understand pictures, words, and robot actions much more efficiently, leading to faster robot learning and top performance on challenging tasks.... |
Read More |
|
|
|
|
DreamOmni3: Scribble-based Editing and Generation |
Published at 2025-12-27 |
|
#Controllable Image Generation, #Multimodal Learning
|
Current image editing tools often struggle to understand precise changes when only given text commands. A new system called DreamOmni3 lets you draw simple scribbles directly on images, combined with words or other pictures, to show exactly where and how you want to edit or create something new.... |
Read More |
|
|
|
|
|
Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers |
Published at 2025-12-27 |
|
#Respiratory Sound Classification, #Sharpness-Aware Minimization
|
It's hard for computers to identify breathing problems from sounds because there isn't much clean data to learn from. A new technique helps these computers learn smarter by finding more stable ways to solve the problem, rather than just memorizing answers, significantly improving their ability to accurately detect real issues in new patients and critical conditions.... |
Read More |
|
|
|
|
GraphLocator: Graph-guided Causal Reasoning for Issue Localization |
Published at 2025-12-27 |
|
#Causal Reasoning, #Fault Localization
|
It's really tough for computers to figure out exactly which code to fix based on a human's description of a problem, especially when the description doesn't pinpoint the root cause or when one problem touches many parts of the code. A system called GraphLocator helps by creating a "causal map" of the problem, discovering hidden relationships between issues and code, which then precisely identifies all the necessary fix locations much more effectively than prior methods.... |
Read More |
|
|
|
|
|
Evaluating Parameter Efficient Methods for RLVR |
Published at 2025-12-28 |
|
#Parameter Efficient Methods, #Reinforcement Learning
|
Researchers checked different ways to teach smart AI models to think better using special feedback, trying to find the most efficient method. They found that some newer teaching techniques work better than the standard one, and trying to cut too many corners or using certain math tricks can actually make the AI dumber.... |
Read More |
|
|
|
|
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation |
Published at 2025-12-28 |
|
#Audio-Visual Learning, #Audio-Visual Generation
|
JavisGPT is a clever AI that understands and generates videos where the sound and picture match up perfectly. It learns to combine what it sees and hears, allowing it to respond to instructions and create realistic-sounding videos better than other systems.... |
Read More |
|
|
|
|
|
Reverse Personalization |
Published at 2025-12-28 |
|
#Face Anonymization, #Controllable Face Manipulation
|
AI can create very realistic faces, but it's tricky to remove someone's unique identity from an image without complex steps. A new method called "reverse personalization" removes a face's identity while still letting you control other features like hair or expression, even for faces the AI hasn't seen before.... |
Read More |
|
|
|
|
SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling |
Published at 2025-12-28 |
|
#Surgical Robotics, #World Modeling
|
Surgical robots can't easily learn from existing operation videos because those videos don't show the exact movements the robot made. A special computer program called SurgWorld generates realistic practice videos and figures out the robot's actions within them, greatly improving how well robots learn to perform surgical tasks.... |
Read More |
|
|
|
|
|
Video-BrowseComp: Benchmarking Agentic Video Research on Open Web |
Published at 2025-12-28 |
|
#AI Agents, #Video Understanding
|
Smart computer programs are good at reading and looking at pictures, but they struggle to really understand information from videos on the internet, especially when they need to actively "watch" and find specific details. A new, harder challenge called Video-BrowseComp reveals that even the best programs are still very bad at truly seeing and using video evidence, often trying to guess from text instead of learning from what's actually shown.... |
Read More |
|
|
|
|
A unified framework for detecting point and collective anomalies in operating system logs via collaborative transformers |
Published at 2025-12-29 |
|
#Anomaly Detection, #Transformers
|
Computer logs are crucial for security, but finding problems is difficult because logs come in different forms that interact in complex ways, confusing older detection methods. A new system, CoLog, intelligently combines these various log types to uncover their relationships and reliably detect both subtle and obvious unusual activity, outperforming previous tools.... |
Read More |
|
|
|
|
|
AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents |
Published at 2025-12-29 |
|
#Cognitive AI, #Brain-Inspired Memory
|
Brains and smart computers both need memory to learn from the past and do new things, but AI currently struggles to truly mimic how human brains remember. This work combines knowledge from brain science and AI to explain how memory works, compares how humans and AI store information, and suggests ways to make AI memory smarter and more secure in the future.... |
Read More |
|
|
|
|
Act2Goal: From World Model To General Goal-conditioned Policy |
Published at 2025-12-29 |
|
#Robotics, #Goal-conditioned Reinforcement Learning
|
Robots find it hard to do long, tricky tasks because they only think one step ahead, even when shown the final picture. A new system called Act2Goal helps robots by letting them imagine all the necessary steps to reach a big goal, then guides them through those steps to perform complex tasks much more reliably and learn quickly.... |
Read More |
|
|
|
|
|
Bridging Your Imagination with Audio-Video Generation via a Unified Director |
Published at 2025-12-29 |
|
#Audio-Video Generation, #Multimodal Generation
|
An AI model acts like a unified movie director, taking simple ideas and turning them into complete, multi-scene films. It combines story writing and visual planning into one process, making it easier for anyone to create videos with coherent scripts and consistent images.... |
Read More |
|
|
|
|
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss |
Published at 2025-12-29 |
|
#Mixture-of-Experts, #Expert Specialization
|
Mixture-of-Experts models sometimes struggle because the system that directs information (router) doesn't reliably match tasks with the right specialized processing unit (expert). A new helper called ERC loss ensures each expert truly excels at the type of information it's supposed to handle, making the router's choices much better and improving the overall performance of these models.... |
Read More |
|
|
|
|
|
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation |
Published at 2025-12-29 |
|
#Diffusion Models, #Transparent Object Perception
|
Getting computers to understand clear objects like glass is really hard because light plays tricks, making it tough to figure out their shape and distance. But by taking a video-making AI that already "knows" how to create realistic clear objects and teaching it a little more, it can now precisely map these tricky items, even helping robots grasp them better.... |
Read More |
|
|
|
|
End-to-End Test-Time Training for Long Context |
Published at 2025-12-29 |
|
#Test-Time Training, #Long Context Understanding
|
This new method helps a computer model understand really long stories by making it constantly learn as it reads, much like remembering new details while you read a book. It means the model can handle super long texts just as well as fancy big models, but it does so much faster, keeping its speed even when the text gets very, very long.... |
Read More |
|
|
|
|
|
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta |
Published at 2025-12-29 |
|
#Automated Code Generation, #AI Hardware Optimization
|
Making AI recommendation models run quickly and efficiently is tough due to the many different models, math operations, and computer chips involved. A smart system called KernelEvolve automatically writes and fine-tunes the special instructions for these chips, making recommendation models run much faster and significantly simplifying the use of new AI hardware.... |
Read More |
|
|
|
|
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation |
Published at 2025-12-29 |
|
#Multimodal Learning, #Video Diffusion Models
|
Making AI-generated videos interactive and in real-time is tricky, especially when the AI needs to understand words, pictures, and sounds all at once. A new method drastically speeds up this process, creating high-quality videos instantly and enabling smooth, natural conversations with AI avatars, like the LiveTalk system.... |
Read More |
|
|
|
|
|
Nested Browser-Use Learning for Agentic Information Seeking |
Published at 2025-12-29 |
|
#Agentic AI, #Web Information Seeking
|
AI agents currently struggle to truly browse the internet like humans, limiting their access to rich information on complex websites. A new technique, NestBrowse, makes this easier by splitting how agents control the browser from how they explore page content, allowing them to efficiently find deep web information.... |
Read More |
|
|
|
|
OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding |
Published at 2025-12-29 |
|
#Multimodal Learning, #Active Perception
|
Smart systems that try to understand both sounds and videos often miss the tiny details of how they connect. A new clever agent called OmniAgent solves this by actively listening to sounds, using them as clues to decide where to focus its attention and what special tools to use, making it much better at truly understanding sound and video together.... |
Read More |
|
|
|
|
|
Pretraining Frame Preservation in Autoregressive Video Memory Compression |
Published at 2025-12-29 |
|
#Video Memory Compression, #Frame Preservation
|
This new AI technique helps condense long videos into very small digital summaries. It makes sure that even though the video is tiny, any specific moment picked from it still looks clear and sharp, allowing other AI systems to 'remember' long video histories without costing too much.... |
Read More |
|
|
|
|
ProGuard: Towards Proactive Multimodal Safeguard |
Published at 2025-12-29 |
|
#Multimodal Learning, #Novelty Detection
|
AI models that create content often introduce new safety risks that current protection systems struggle to handle. A new digital guardian called ProGuard proactively identifies and describes these unexpected dangerous situations in both images and text, proving much better at spotting brand-new threats than existing tools.... |
Read More |
|
|
|
|
|
Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation |
Published at 2025-12-29 |
|
#Robotic Manipulation, #Reward Modeling
|
Robots learning new skills often struggle because it's hard to precisely tell them if they're making progress, especially with complex tasks and only one viewpoint. A new system called Dopamine-Reward helps robots understand their actions better by using many camera angles and breaking down progress into clear, small steps. This approach allows robots to learn intricate manipulation tasks much faster and more reliably, leading to significant improvements in their success rate with minimal training... |
Read More |
|
|
|
|
Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion |
Published at 2025-12-29 |
|
#Video Super-Resolution, #Auto-Regressive Diffusion
|
High-quality video enhancement tools can make blurry videos super sharp, but they're too slow for live viewing because they rely on future frames. A new method called Stream-DiffVSR fixes this by only looking at past video, making videos look much better and smoother incredibly fast, which means live streams can now have amazing quality.... |
Read More |
|
|
|
|
|
Training AI Co-Scientists Using Rubric Rewards |
Published at 2025-12-29 |
|
#Scientific Discovery, #Automated Assessment
|
AI can now be trained to create better research plans for scientists by learning from existing papers and grading its own work using automatically extracted rules. This self-improvement process leads to plans that human experts prefer and is effective across various scientific fields, like medicine.... |
Read More |
|
|
|
|
Web World Models |
Published at 2025-12-29 |
|
#Web-based Simulations, #Generative World Modeling
|
Building interactive worlds for AI typically involves choosing between rigid, fixed environments or totally wild, unpredictable ones. A new approach offers a middle ground by using standard web code for the reliable rules and structure, letting AI models then imagine all the stories and details within those boundaries.... |
Read More |
|
|
|
|
|
YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection |
Published at 2025-12-29 |
|
#Object Detection, #Mixture-of-Experts
|
Existing systems for finding objects in real-time use the same amount of effort for every picture, which wastes energy on easy scenes and struggles with difficult ones. A new method called YOLO-Master learns to smartly focus more processing power on complex scenes and less on simple ones, making object detection both faster and more accurate, especially in challenging situations.... |
Read More |
|
|
|
|
Factorized Learning for Temporally Grounded Video-Language Models |
Published at 2025-12-30 |
|
#Temporal Grounding, #Multimodal Learning
|
Video AI often struggles to pinpoint exact moments when things happen in a video, making it difficult to answer questions about those events reliably. A new learning method helps by first training the AI to precisely find these event moments, and then using that accurate timing to provide much better answers and overall video understanding.... |
Read More |
|
|
|
|
|
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process |
Published at 2025-12-30 |
|
#Latent Concept Discovery, #AI Reasoning
|
It's hard to understand exactly how big AI models "think" or reason because previous methods often rely on human-defined concepts, missing many hidden processes. A new unsupervised approach automatically discovers distinct "thinking patterns" within the AI, allowing researchers to identify, control, and even uncover novel reasoning behaviors like confidence adjustments.... |
Read More |
|
|
|
|
Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking |
Published at 2025-12-30 |
|
#Visual Reasoning, #Mathematical Reasoning
|
Complex reasoning problems often contain hidden spatial relationships that text-only methods struggle with. A new technique, FIGR, tackles this by actively creating and using visual representations during problem-solving, which significantly improves accuracy on difficult math challenges.... |
Read More |
|
|
|
|
|
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems |
Published at 2025-12-30 |
|
#Multi-modal Learning, #Autonomous Vehicles
|
Autonomous vehicles need to deeply understand their surroundings using many different sensors like cameras and LiDAR, but blending all that data for a single, smart view is challenging. This research offers a framework and roadmap for training AI models to integrate multi-sensor information effectively, aiming to build more robust spatial intelligence for real-world deployment.... |
Read More |
|
|
|
|
GR-Dexter Technical Report |
Published at 2025-12-30 |
|
#Embodied AI, #Robotic Manipulation
|
Making robots with complex, multi-fingered hands do various tasks just by telling them what to do is hard, especially with two arms. GR-Dexter offers a complete solution with a special robot hand, an easy way to teach it, and clever training, allowing these robots to perform many real-world jobs robustly.... |
Read More |
|
|
|
|
|
Guiding a Diffusion Transformer with the Internal Dynamics of Itself |
Published at 2025-12-30 |
|
#Diffusion Models, #Internal Guidance
|
Picture-making computer programs (diffusion models) are powerful but sometimes struggle to create truly high-quality images, especially for less common ideas, and current helper methods often introduce new problems like distorted results. A simple new trick called Internal Guidance makes these programs much better and faster at drawing by adding extra checks during their learning process, leading to top-notch image quality and even setting new performance records.... |
Read More |
|
|
|
|
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation |
Published at 2025-12-30 |
|
#Text-to-Video Generation, #Physics-Aware Generation
|
Making videos from text often results in scenes where objects don't behave realistically, like floating instead of falling. This project fixes that by first creating a huge collection of videos showing how things *should* move, then teaching the AI to make new videos that follow those real-world physics rules.... |
Read More |
|
|
|
|
|
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models |
Published at 2025-12-30 |
|
#Agentic AI, #Lightweight Language Models
|
Youtu-LLM is a new, small language model built from the ground up to be smart and plan things on its own, unlike other small models that often just copy bigger ones. It can handle complex reasoning and agent tasks really well while being efficient, proving that even tiny digital brains can have powerful thinking abilities without needing to be huge.... |
Read More |
|
|
|
|
BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts |
Published at 2025-12-31 |
|
#Dialogue Management, #Belief Tracking
|
To have smart conversations, robots need to guess what others are thinking, but existing methods don't use these guesses effectively to decide what to say next. A new system, BEDA, helps by turning those guesses into clear rules for picking what to say, which makes robots much better at strategic talking across different situations.... |
Read More |
|
|
|
|
|
GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction |
Published at 2025-12-31 |
|
#Sparse-View 3D Reconstruction, #Diffusion Outpainting
|
Building 3D models from just a few photos is tricky because existing tools often miss parts, don't line up perfectly, or take too long. A new technique called GaMO helps by outpainting the available photos to cover a much wider view, allowing it to build complete 3D scenes more accurately and incredibly fast, even with very few pictures.... |
Read More |
|
|
|
|
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem |
Published at 2025-12-31 |
|
#Autonomous Agents, #Agent Frameworks
|
It's tricky to make AI agents that can do things in the real world by themselves because there hasn't been a complete system to help build them. A new ecosystem, ALE, provides the tools to create and train powerful agents like ROME, which can now perform complex tasks and shows strong results on challenging tests.... |
Read More |
|
|
|
|
|
Scaling Open-Ended Reasoning to Predict the Future |
Published at 2025-12-31 |
|
#Event Prediction, #Open-Ended Reasoning
|
Smart computer programs are taught to guess future events by learning from lots of "what if?" questions automatically made from old news articles, making sure they don't peek at real future outcomes. A specialized version of these programs, OpenForecaster 8B, became very good at predicting the future, performing as well as much larger systems, and all its tools are shared for everyone to use.... |
Read More |
|
|
|
|
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time |
Published at 2025-12-31 |
|
#Generative Rendering, #4D Reconstruction and Rendering
|
Imagine a magic video editor that lets you completely change the camera angle *and* how things move in a video, even after it's been filmed. It learns this cool trick by smartly separating the camera's view from the action happening over time, using clever training techniques with both old and new video collections.... |
Read More |
|
|
|
|
|
mHC: Manifold-Constrained Hyper-Connections |
Published at 2025-12-31 |
|
#Model Connectivity, #Training Stability
|
Advanced ways to connect parts of AI models offer performance boosts but make training unstable and difficult to scale up efficiently. A new method, Manifold-Constrained Hyper-Connections (mHC), fixes this by ensuring these complex connections behave predictably, making large AI models easier to train and more effective.... |
Read More |
|
|
|
|
|