🤗 Daily Paper Newsletter

Hope you find some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.

Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries
Published at 2025-11-01

#ML

The researchers developed Ariadne, a framework using synthetic mazes to test and improve the spatial reasoning abilities of Vision-Language Models (VLMs) through a difficulty-controlled curriculum. The trained VLMs demonstrated significant improvements in both synthetic and real-world spatial reasoning tasks, confirming the effectiveness of the method in extending the models' capabilities....
Read More
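
As a rough illustration (not the authors' code), a difficulty-controlled maze curriculum can be quite small. In this sketch the grid generator, the wall-probability knob, and the 0.8 promotion threshold are all our own assumptions:

```python
import random
from collections import deque

def make_grid_task(size, wall_prob):
    """Sample a random grid maze; difficulty grows with size and wall density."""
    grid = [[random.random() < wall_prob for _ in range(size)] for _ in range(size)]
    grid[0][0] = grid[size - 1][size - 1] = False   # keep start and goal open
    return grid

def shortest_path_len(grid):
    """BFS ground truth used to verify a model's answer; None if unsolvable."""
    n, q, seen = len(grid), deque([(0, 0, 0)]), {(0, 0)}
    while q:
        x, y, d = q.popleft()
        if (x, y) == (n - 1, n - 1):
            return d
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < n and 0 <= ny < n and not grid[nx][ny] and (nx, ny) not in seen:
                seen.add((nx, ny))
                q.append((nx, ny, d + 1))
    return None

def next_size(size, solve_rate, threshold=0.8):
    """Curriculum step: grow the maze only once the model clears the bar."""
    return size + 1 if solve_rate >= threshold else size
```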

Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models
Published at 2025-11-05

#ML

The study presents Diffusion-SDPO, a new method for improving the alignment of text-to-image diffusion models with human preferences. Diffusion-SDPO addresses the problem that standard preference optimization can increase reconstruction error on both the preferred and the less-preferred output: it adapts the update so that the preferred output's quality is preserved while the less-preferred one is still pushed away....
Read More
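
The paper's safeguard operates on the gradients themselves; as a loose sketch of the overall shape, here is a Diffusion-DPO-style preference loss where a scalar down-weight `lam` stands in for that safeguard (beta, lam, and the shared noise target are simplifying assumptions, not the released implementation):

```python
import torch
import torch.nn.functional as F

def sdpo_like_loss(eps_w, eps_l, eps_w_ref, eps_l_ref, noise, beta=500.0, lam=0.5):
    """Preference loss over winner/loser noise predictions of shape (B, C, H, W)."""
    err = lambda eps: (eps - noise).pow(2).flatten(1).mean(1)  # per-sample MSE
    # Diffusion-DPO margin: improve the winner vs. its reference more than the
    # loser. lam < 1 crudely mimics the safeguard, which limits the loser term
    # so that pushing it away cannot degrade the winner's reconstruction.
    margin = (err(eps_w) - err(eps_w_ref)) - lam * (err(eps_l) - err(eps_l_ref))
    return -F.logsigmoid(-beta * margin).mean()
```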

HaluMem: Evaluating Hallucinations in Memory Systems of Agents
Published at 2025-11-05

#ML

The study presents HaluMem, a new benchmark that evaluates memory hallucinations in AI agents at the level of individual memory operations, revealing that existing memory systems often generate and propagate errors during the memory extraction and updating stages....
Read More

RLoop: A Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization
Published at 2025-11-06

#ML

The authors present RLoop, a self-improving framework for reinforcement learning with verifiable rewards that uses iterative policy initialization to turn training into a loop of exploration and exploitation, helping models avoid overfitting and forgetting and leading to better generalization and performance....
Read More
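
Our guess at the loop's shape, as a skeleton (the trainer, fine-tuner, and success filter passed in here are hypothetical stand-ins, and the paper's actual recipe may differ):

```python
def rloop(policy, tasks, train_rl, sft, rounds=3):
    """Iterative policy initialization: explore with RL, distill, restart."""
    for _ in range(rounds):
        # Exploration: ordinary RL with verifiable rewards from the current init.
        policy, rollouts = train_rl(policy, tasks)
        # Exploitation: keep verified successes and fold them into the next
        # round's initialization instead of continuing the same RL run, which
        # is what the authors credit with curbing overfitting and forgetting.
        successes = [r for r in rollouts if r["reward"] > 0]
        policy = sft(policy, successes)
    return policy
```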

Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale
Published at 2025-11-07

#ML

The authors present a new framework for creating large-scale, vision-centric reasoning datasets, which includes over 1 million questions spanning diverse skills and complexity levels. They demonstrate that fine-tuning a vision-language model on this dataset significantly improves performance on various vision-centric benchmarks and even transfers well to text-only and audio reasoning tasks....
Read More

10 Open Challenges Steering the Future of Vision-Language-Action Models
Published at 2025-11-08

#ML

This study lays out 10 open challenges steering the development of vision-language-action models, which follow natural language instructions to act in the physical world, and discusses emerging trends such as spatial understanding and world-dynamics modeling to promote their wider adoption....
Read More

LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs
Published at 2025-11-08

#ML

The authors present LUT-LLM, an FPGA accelerator that makes large language model inference faster and more energy-efficient by replacing arithmetic with memory-based computation. Using precomputed table lookups, it achieves 1.66 times lower latency and 1.72 times higher energy efficiency than GPUs, and the design can scale to even larger models....
Read More
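
The FPGA dataflow is beyond a snippet, but the underlying memory-based trick (replacing multiply-accumulates with lookups into precomputed partial dot products, as in product quantization) fits in a few lines of NumPy; the shapes and codebook handling here are our assumptions:

```python
import numpy as np

def build_tables(W, codebooks):
    """Precompute dot products of weight rows with activation centroids."""
    # W: (out, d) split into G sub-vectors; codebooks: (G, K, d // G) centroids.
    G, K, sub = codebooks.shape
    Wg = W.reshape(W.shape[0], G, sub)
    return np.einsum('ogs,gks->gko', Wg, codebooks)   # tables: (G, K, out)

def lut_matvec(tables, codes):
    """Inference does no multiplies: G table lookups and adds approximate W @ x."""
    # codes: (G,) index of the nearest centroid for each activation sub-vector.
    return sum(tables[g, c] for g, c in enumerate(codes))
```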

NURBGen: High-Fidelity Text-to-CAD Generation through LLM-Driven NURBS Modeling
Published at 2025-11-08

#ML

The authors present NURBGen, a method that creates detailed 3D CAD models from text descriptions using Non-Uniform Rational B-Splines (NURBS), the curve and surface representation standard in CAD. It improves on previous approaches in accuracy and in handling complex shapes, and it is built on a large language model fine-tuned for the task. The authors also created a new dataset of CAD components with detailed descriptions to help train and test the model....
Read More
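
For readers unfamiliar with the representation: a NURBS curve is a rational, weighted blend of control points. A minimal evaluator (standard textbook math, not the paper's code) looks like this:

```python
import numpy as np

def basis(i, p, u, U):
    """Cox-de Boor recursion for the B-spline basis function N_{i,p}(u)."""
    if p == 0:
        return 1.0 if U[i] <= u < U[i + 1] else 0.0
    left = right = 0.0
    if U[i + p] != U[i]:
        left = (u - U[i]) / (U[i + p] - U[i]) * basis(i, p - 1, u, U)
    if U[i + p + 1] != U[i + 1]:
        right = (U[i + p + 1] - u) / (U[i + p + 1] - U[i + 1]) * basis(i + 1, p - 1, u, U)
    return left + right

def nurbs_point(u, ctrl, w, p, U):
    """Point on a NURBS curve: sum_i N_i(u) w_i P_i / sum_i N_i(u) w_i."""
    N = np.array([basis(i, p, u, U) for i in range(len(ctrl))])
    return (N * w) @ ctrl / (N @ w)
```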

Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads
Published at 2025-11-08

#ML

The authors present a lightweight method for verifying the correctness of reasoning steps in large language models (LLMs) using uncertainty quantification heads. The approach is automatic, effective, and requires far fewer parameters than existing verifiers, making it a promising direction for scalable, generalizable verification of LLM reasoning....
Read More
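
A minimal version of such a head, as we read the idea (the layer choice, mean pooling, and sizes are assumptions): a small probe trained on the frozen LLM's hidden states to score each reasoning step.

```python
import torch
import torch.nn as nn

class UncertaintyHead(nn.Module):
    """Tiny probe over frozen hidden states that scores one reasoning step."""

    def __init__(self, d_model, d_hidden=256):
        super().__init__()
        self.probe = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, 1))

    def forward(self, hidden, step_mask):
        # hidden: (T, d) activations from the frozen LLM; step_mask: (T,) bool
        # marking the tokens of one step. Mean-pool the step, then score it.
        pooled = hidden[step_mask].mean(dim=0)
        return torch.sigmoid(self.probe(pooled))   # estimated P(step correct)
```

Only the probe's parameters are trained, which is what keeps the verifier far smaller than a judge model.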

Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs
Published at 2025-11-08

#ML

The study finds that reinforcement learning can enhance a language model's ability to navigate and search through hierarchical knowledge, improving its performance on knowledge-recall tasks compared to non-RL models. The researchers suggest that this improvement comes from better procedural skill in traversing knowledge hierarchies rather than from acquiring new knowledge, and they provide evidence to support this hypothesis....
Read More

SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?
Published at 2025-11-08

#ML

The researchers created SWE-fficiency, a benchmark that tests how well language models can optimize real-world software repositories for performance on real workloads, where a model must work out how to achieve a speedup rather than being told what to fix. They found that current models struggle with this task, often failing to match the performance improvements achieved by human experts....
Read More

DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation
Published at 2025-11-09

#ML

This study focuses on improving reinforcement learning with verifiable rewards for competitive programming code generation. The authors propose a two-stage training process that starts with fine-tuning a strong open-source model and then uses reinforcement learning with executable rewards. Their method, implemented on a large-scale model, achieves state-of-the-art performance and is comparable to leading systems, while also providing best practices for data curation and curriculum design....
Read More
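
The "executable rewards" half of the recipe is easy to picture: run each candidate program against unit tests and pay a binary reward. A simplified sketch (real pipelines add sandboxing and resource limits; the test format is our assumption):

```python
import subprocess
import sys
import tempfile

def exec_reward(code, tests, timeout=2.0):
    """Return 1.0 only if the program passes every (stdin, expected-stdout) test."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    for stdin, expected in tests:
        try:
            run = subprocess.run([sys.executable, path], input=stdin,
                                 capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return 0.0
        if run.returncode != 0 or run.stdout.strip() != expected.strip():
            return 0.0
    return 1.0

# exec_reward("print(int(input()) * 2)", [("3", "6"), ("10", "20")]) -> 1.0
```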

SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
Published at 2025-11-09

#ML

The study presents SofT-GRPO, a new algorithm for reinforcing soft-thinking LLMs, which reason over continuous mixtures of tokens rather than discrete tokens and can outperform discrete-token reasoning in some cases. By reparameterizing the stochastic soft-thinking step with Gumbel noise, SofT-GRPO lets soft-thinking LLMs slightly surpass discrete-token GRPO on Pass@1 and significantly outperform it on Pass@32, while performing comparably on other metrics....
Read More
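
The reparameterization named in the title is the Gumbel-Softmax trick. A sketch of one soft-thinking step (the temperature, shapes, and feeding the mixture embedding back into the model are our assumptions about the setup):

```python
import torch
import torch.nn.functional as F

def soft_thought(logits, embed_table, tau=1.0):
    """One stochastic soft-thinking step that stays differentiable."""
    # logits: (V,) next-token logits; embed_table: (V, d) token embeddings.
    gumbel = -torch.log(-torch.log(torch.rand_like(logits)))   # Gumbel(0, 1)
    probs = F.softmax((logits + gumbel) / tau, dim=-1)         # Gumbel-Softmax
    return probs @ embed_table   # soft "token": a mixture of embeddings
```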

The Station: An Open-World Environment for AI-Driven Discovery
Published at 2025-11-09

#ML

The Station is an open-world environment in which AI agents independently conduct scientific research, form hypotheses, and publish findings. These agents have outperformed previous AI systems in various fields, and their independent work has led to the discovery of new methods, like a density-adaptive algorithm for gene sequencing....
Read More

DIMO: Diverse 3D Motion Generation for Arbitrary Objects
Published at 2025-11-10

#ML

The researchers created DIMO, a method that generates diverse 3D motions for an arbitrary object from a single image. It distills motion patterns learned by video models into a shared latent space, allowing fast and diverse motion sampling for applications like motion interpolation and language-guided motion....
Read More

DigiData: Training and Evaluating General-Purpose Mobile Control Agents
Published at 2025-11-10

#ML

The authors present DigiData, a new, large-scale, high-quality, and diverse dataset for training mobile control agents, which is more complex and varied than existing ones. They also introduce DigiData-Bench, a benchmark for evaluating mobile control agents on real-world tasks, and propose new evaluation methods to better assess agent performance....
Read More

Do LLMs Feel? Teaching Emotion Recognition with Prompts, Retrieval, and Curriculum Learning
Published at 2025-11-10

#ML

The study presents PRC-Emo, a method for teaching emotion recognition in conversation to large language models (LLMs) that combines prompt engineering, demonstration retrieval, and curriculum learning. By incorporating explicit and implicit emotional cues, the method improves LLMs' ability to understand emotions and achieves state-of-the-art performance on two popular emotion recognition datasets....
Read More

IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
Published at 2025-11-10

#ML

The study proposes IterResearch, a method that improves long-horizon problem-solving for AI agents by breaking tasks into rounds and reconstructing a compact workspace each round instead of accumulating the full interaction history. This lets agents gather and use information effectively over extended periods, outperforming existing methods and even enhancing frontier models, making it a versatile approach for complex, long-horizon reasoning tasks....
Read More
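
Our reading of the mechanism, as a skeleton (the `agent` interface and prompt layout are hypothetical): each round rebuilds a bounded workspace from an evolving report, so the decision state stays Markovian instead of growing with the transcript.

```python
def iter_research(agent, question, max_rounds=8):
    """Long-horizon loop that carries a compact report, not the full history."""
    report = ""
    for _ in range(max_rounds):
        # The next decision sees a bounded workspace, never the whole transcript.
        workspace = f"Question: {question}\n\nReport so far:\n{report}"
        action, result = agent.step(workspace)        # e.g. search, read, answer
        report = agent.update_report(report, result)  # Markovian reconstruction
        if action == "answer":
            return result
    return report
```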

Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks
Published at 2025-11-10

#ML

The authors release llama-embed-nemotron-8b, an open-source text embedding model that performs strongly on multilingual and cross-lingual tasks. It is trained on a mix of public and synthetically generated data and supports user-defined instructions to tailor embeddings to specific needs....
Read More

MPJudge: Towards Perceptual Assessment of Music-Induced Paintings
Published at 2025-11-10

#ML

The study presents a method for assessing paintings created under the influence of music by directly modeling the perceptual correspondence between the music and the painting. The approach contributes a large-scale dataset of music-painting pairs annotated by experts and a model, MPJudge, that integrates music features into a visual encoder to identify music-relevant regions in paintings, outperforming existing methods....
Read More

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
Published at 2025-11-10

#ML

The researchers have created MVU-Eval, a new comprehensive benchmark for testing how well multimodal large language models can understand multiple videos at once, which is important for real-world applications like sports analytics and self-driving cars. This benchmark tests the models' skills in various tasks by asking 1,824 questions about 4,959 videos, and it reveals that current models still struggle with understanding multiple videos....
Read More

Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
Published at 2025-11-10

#ML

The authors present Omni-AVSR, a unified model that efficiently handles auditory, visual, and audio-visual speech recognition using a large language model. This model reduces training and deployment resource use compared to current methods, maintains accuracy under noise, and offers insights into the balance between performance and efficiency....
Read More

RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
Published at 2025-11-10

#ML

The study presents RLVE, a method for scaling reinforcement learning for language models using adaptive environments that procedurally generate problems, provide verifiable rewards, and adjust difficulty as the policy improves. The researchers developed RLVE-Gym, a suite of 400 such environments, and demonstrated that expanding the collection of training environments significantly improves reasoning capabilities in language models....
Read More
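
A toy verifiable environment with an adaptive difficulty knob makes the idea concrete (the arithmetic problem family and the update thresholds are illustrative, not taken from RLVE-Gym):

```python
import random

class AdditionEnv:
    """Toy environment: procedurally generated, verifiable, difficulty-adaptive."""

    def __init__(self, digits=2):
        self.digits = digits                # the difficulty knob

    def sample(self):
        hi = 10 ** self.digits - 1
        a, b = random.randint(0, hi), random.randint(0, hi)
        return f"What is {a} + {b}?", str(a + b)

    def reward(self, answer, truth):
        return 1.0 if answer.strip() == truth else 0.0   # verifiable, no judge

    def adapt(self, pass_rate):
        # Keep problems near the frontier of the policy's current competence.
        if pass_rate > 0.8:
            self.digits += 1
        elif pass_rate < 0.2 and self.digits > 1:
            self.digits -= 1
```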

RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
Published at 2025-11-10

#ML

The study presents RedOne 2.0, an SNS-focused language model that addresses the challenges of heterogeneous workloads, shifting norms, and multilingual data in social networking services. Through a three-stage training process, RedOne 2.0 outperforms a larger baseline model and demonstrates superior data efficiency and stability, making it a competitive and cost-effective solution for domain-specific language models in SNS scenarios....
Read More

Robot Learning from a Physical World Model
Published at 2025-11-10

#ML

The PhysWorld framework teaches robots tasks by generating realistic videos from language and images, while incorporating the laws of physics through a physical world model so that the resulting robot movements are accurate and effective. This lets robots learn new tasks without real-world practice, and it performs better than previous techniques in various real-world scenarios....
Read More

Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
Published at 2025-11-10

#ML

The study presents Routing Manifold Alignment (RoMA), a method that enhances the performance of Sparse Mixture-of-Experts large language models by aligning routing weights with task embeddings. RoMA introduces a regularization term that encourages similar expert choices for samples targeting similar tasks, which significantly improves the generalization of these models. The method requires only lightweight finetuning and demonstrates substantial improvement in various benchmarks and co...
Read More
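
A simplest-possible form of such a regularizer, as we read the summary (the neighbor count, the stop-gradient on the target, and the MSE pull are our choices, not necessarily the paper's):

```python
import torch
import torch.nn.functional as F

def routing_alignment_reg(routing, task_emb, k=4):
    """Pull each sample's routing weights toward its task-neighbors' average."""
    # routing: (B, E) router weights per sample; task_emb: (B, D) task embeddings.
    z = F.normalize(task_emb, dim=-1)
    sim = z @ z.T                                     # cosine similarity (B, B)
    nbrs = sim.topk(k + 1, dim=-1).indices[:, 1:]     # k nearest, excluding self
    target = routing[nbrs].mean(dim=1)                # neighbors' mean routing
    return F.mse_loss(routing, target.detach())
```

Added to the task loss during lightweight finetuning, this nudges samples that solve similar tasks to pick similar experts.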

Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Published at 2025-11-10

#ML

The study explores how to transform existing non-recurrent language models into depth-recurrent models, which can reduce computational cost while maintaining performance. By using a curriculum of recurrences to increase the model's effective depth during training, the converted models show better performance at a given compute budget, particularly in mathematics tasks....
Read More
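
In sketch form (the three-way split of the pretrained stack and the schedule are assumptions based on the abstract): designate a middle block of layers as a recurrent core and raise its loop count during continued training.

```python
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    """Depth-recurrent wrapper around three blocks of a pretrained layer stack."""

    def __init__(self, prelude, core, coda, recurrences=1):
        super().__init__()
        self.prelude, self.core, self.coda = prelude, core, coda
        self.r = recurrences

    def forward(self, h):
        h = self.prelude(h)
        for _ in range(self.r):   # effective depth grows with r,
            h = self.core(h)      # parameter count does not
        return self.coda(h)

# Curriculum of recurrences: raise model.r (e.g. 1 -> 2 -> 4) as training goes on.
```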

VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models
Published at 2025-11-10

#ML

The authors present a new framework called VADER that uses large language models to better understand anomalous events in videos by considering object interactions and context. This approach improves upon traditional methods by providing detailed, causally grounded explanations and supporting robust anomaly-related question answering, as demonstrated by strong performance on various benchmarks....
Read More

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit Developer's Social Media
|
|
|
|
|
|