🤗 Daily Paper Newsletter

Hope you find some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.

Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries
Published at 2025-11-01

#ML

The researchers developed Ariadne, a framework using synthetic mazes to test and improve the spatial reasoning abilities of Vision-Language Models (VLMs) through a difficulty-controlled curriculum. The trained VLMs demonstrated significant improvements in both synthetic and real-world spatial reasoning tasks, confirming the effectiveness of the method in extending the models' capabilities....
Read More
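
As a rough illustration (not the authors' code), a difficulty-controlled maze curriculum can be quite small. In this sketch the grid generator, the wall-probability knob, and the 0.8 promotion threshold are all our own assumptions:

```python
import random
from collections import deque

def make_grid_task(size, wall_prob):
    """Sample a random grid maze; difficulty grows with size and wall density."""
    grid = [[random.random() < wall_prob for _ in range(size)] for _ in range(size)]
    grid[0][0] = grid[size - 1][size - 1] = False   # keep start and goal open
    return grid

def shortest_path_len(grid):
    """BFS ground truth used to verify a model's answer; None if unsolvable."""
    n, q, seen = len(grid), deque([(0, 0, 0)]), {(0, 0)}
    while q:
        x, y, d = q.popleft()
        if (x, y) == (n - 1, n - 1):
            return d
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < n and 0 <= ny < n and not grid[nx][ny] and (nx, ny) not in seen:
                seen.add((nx, ny))
                q.append((nx, ny, d + 1))
    return None

def next_size(size, solve_rate, threshold=0.8):
    """Curriculum step: grow the maze only once the model clears the bar."""
    return size + 1 if solve_rate >= threshold else size
```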

Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models
Published at 2025-11-05

#ML

The study presents Diffusion-SDPO, a new method for improving the alignment of text-to-image diffusion models with human preferences. Diffusion-SDPO addresses the problem that standard preference optimization can increase reconstruction error on both the preferred and the less-preferred output: it adapts the update so that the preferred output's quality is preserved while the less-preferred one is still pushed away....
Read More
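
The paper's safeguard operates on the gradients themselves; as a loose sketch of the overall shape, here is a Diffusion-DPO-style preference loss where a scalar down-weight `lam` stands in for that safeguard (beta, lam, and the shared noise target are simplifying assumptions, not the released implementation):

```python
import torch
import torch.nn.functional as F

def sdpo_like_loss(eps_w, eps_l, eps_w_ref, eps_l_ref, noise, beta=500.0, lam=0.5):
    """Preference loss over winner/loser noise predictions of shape (B, C, H, W)."""
    err = lambda eps: (eps - noise).pow(2).flatten(1).mean(1)  # per-sample MSE
    # Diffusion-DPO margin: improve the winner vs. its reference more than the
    # loser. lam < 1 crudely mimics the safeguard, which limits the loser term
    # so that pushing it away cannot degrade the winner's reconstruction.
    margin = (err(eps_w) - err(eps_w_ref)) - lam * (err(eps_l) - err(eps_l_ref))
    return -F.logsigmoid(-beta * margin).mean()
```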

HaluMem: Evaluating Hallucinations in Memory Systems of Agents
Published at 2025-11-05

#ML

The study presents HaluMem, a new benchmark that evaluates memory hallucinations in AI agents at the level of individual memory operations, revealing that existing memory systems often generate and propagate errors during the memory extraction and updating stages....
Read More

RLoop: A Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization
Published at 2025-11-06

#ML

The authors present RLoop, a self-improving framework for reinforcement learning with verifiable rewards that uses iterative policy initialization to turn training into a loop of exploration and exploitation, helping models avoid overfitting and forgetting and leading to better generalization and performance....
Read More
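
Our guess at the loop's shape, as a skeleton (the trainer, fine-tuner, and success filter passed in here are hypothetical stand-ins, and the paper's actual recipe may differ):

```python
def rloop(policy, tasks, train_rl, sft, rounds=3):
    """Iterative policy initialization: explore with RL, distill, restart."""
    for _ in range(rounds):
        # Exploration: ordinary RL with verifiable rewards from the current init.
        policy, rollouts = train_rl(policy, tasks)
        # Exploitation: keep verified successes and fold them into the next
        # round's initialization instead of continuing the same RL run, which
        # is what the authors credit with curbing overfitting and forgetting.
        successes = [r for r in rollouts if r["reward"] > 0]
        policy = sft(policy, successes)
    return policy
```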

Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale
Published at 2025-11-07

#ML

The authors present a new framework for creating large-scale, vision-centric reasoning datasets, which includes over 1 million questions spanning diverse skills and complexity levels. They demonstrate that fine-tuning a vision-language model on this dataset significantly improves performance on various vision-centric benchmarks and even transfers well to text-only and audio reasoning tasks....
Read More

10 Open Challenges Steering the Future of Vision-Language-Action Models
Published at 2025-11-08

#ML

This study lays out 10 open challenges steering the development of vision-language-action models, which follow natural language instructions to act in the physical world, and discusses emerging trends such as spatial understanding and world-dynamics modeling to promote their wider adoption....
Read More

LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs
Published at 2025-11-08

#ML

The authors present LUT-LLM, an FPGA accelerator that makes large language model inference faster and more energy-efficient by replacing arithmetic with memory-based computation. Using precomputed table lookups, it achieves 1.66 times lower latency and 1.72 times higher energy efficiency than GPUs, and the design can scale to even larger models....
Read More
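
The FPGA dataflow is beyond a snippet, but the underlying memory-based trick (replacing multiply-accumulates with lookups into precomputed partial dot products, as in product quantization) fits in a few lines of NumPy; the shapes and codebook handling here are our assumptions:

```python
import numpy as np

def build_tables(W, codebooks):
    """Precompute dot products of weight rows with activation centroids."""
    # W: (out, d) split into G sub-vectors; codebooks: (G, K, d // G) centroids.
    G, K, sub = codebooks.shape
    Wg = W.reshape(W.shape[0], G, sub)
    return np.einsum('ogs,gks->gko', Wg, codebooks)   # tables: (G, K, out)

def lut_matvec(tables, codes):
    """Inference does no multiplies: G table lookups and adds approximate W @ x."""
    # codes: (G,) index of the nearest centroid for each activation sub-vector.
    return sum(tables[g, c] for g, c in enumerate(codes))
```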

NURBGen: High-Fidelity Text-to-CAD Generation through LLM-Driven NURBS Modeling
Published at 2025-11-08

#ML

The authors present NURBGen, a method that creates detailed 3D CAD models from text descriptions using Non-Uniform Rational B-Splines (NURBS), the curve and surface representation standard in CAD. It improves on previous approaches in accuracy and in handling complex shapes, and it is built on a large language model fine-tuned for the task. The authors also created a new dataset of CAD components with detailed descriptions to help train and test the model....
Read More
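
For readers unfamiliar with the representation: a NURBS curve is a rational, weighted blend of control points. A minimal evaluator (standard textbook math, not the paper's code) looks like this:

```python
import numpy as np

def basis(i, p, u, U):
    """Cox-de Boor recursion for the B-spline basis function N_{i,p}(u)."""
    if p == 0:
        return 1.0 if U[i] <= u < U[i + 1] else 0.0
    left = right = 0.0
    if U[i + p] != U[i]:
        left = (u - U[i]) / (U[i + p] - U[i]) * basis(i, p - 1, u, U)
    if U[i + p + 1] != U[i + 1]:
        right = (U[i + p + 1] - u) / (U[i + p + 1] - U[i + 1]) * basis(i + 1, p - 1, u, U)
    return left + right

def nurbs_point(u, ctrl, w, p, U):
    """Point on a NURBS curve: sum_i N_i(u) w_i P_i / sum_i N_i(u) w_i."""
    N = np.array([basis(i, p, u, U) for i in range(len(ctrl))])
    return (N * w) @ ctrl / (N @ w)
```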

Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads
Published at 2025-11-08

#ML

The authors present a lightweight method for verifying the correctness of reasoning steps in large language models (LLMs) using uncertainty quantification heads. The approach is automatic, effective, and requires far fewer parameters than existing verifiers, making it a promising direction for scalable, generalizable verification of LLM reasoning....
Read More
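
A minimal version of such a head, as we read the idea (the layer choice, mean pooling, and sizes are assumptions): a small probe trained on the frozen LLM's hidden states to score each reasoning step.

```python
import torch
import torch.nn as nn

class UncertaintyHead(nn.Module):
    """Tiny probe over frozen hidden states that scores one reasoning step."""

    def __init__(self, d_model, d_hidden=256):
        super().__init__()
        self.probe = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, 1))

    def forward(self, hidden, step_mask):
        # hidden: (T, d) activations from the frozen LLM; step_mask: (T,) bool
        # marking the tokens of one step. Mean-pool the step, then score it.
        pooled = hidden[step_mask].mean(dim=0)
        return torch.sigmoid(self.probe(pooled))   # estimated P(step correct)
```

Only the probe's parameters are trained, which is what keeps the verifier far smaller than a judge model.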

Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs
Published at 2025-11-08

#ML

The study finds that reinforcement learning can enhance a language model's ability to navigate and search through hierarchical knowledge, improving its performance on knowledge-recall tasks compared to non-RL models. The researchers suggest that this improvement comes from better procedural skill in traversing knowledge hierarchies rather than from acquiring new knowledge, and they provide evidence to support this hypothesis....
Read More

SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?
Published at 2025-11-08

#ML

The researchers created SWE-fficiency, a benchmark that tests how well language models can optimize real-world software repositories for performance on real workloads, where a model must work out how to achieve a speedup rather than being told what to fix. They found that current models struggle with this task, often failing to match the performance improvements achieved by human experts....
Read More

DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation
Published at 2025-11-09

#ML

This study focuses on improving reinforcement learning with verifiable rewards for competitive programming code generation. The authors propose a two-stage training process that starts with fine-tuning a strong open-source model and then uses reinforcement learning with executable rewards. Their method, implemented on a large-scale model, achieves state-of-the-art performance and is comparable to leading systems, while also providing best practices for data curation and curriculum design....
Read More
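
The "executable rewards" half of the recipe is easy to picture: run each candidate program against unit tests and pay a binary reward. A simplified sketch (real pipelines add sandboxing and resource limits; the test format is our assumption):

```python
import subprocess
import sys
import tempfile

def exec_reward(code, tests, timeout=2.0):
    """Return 1.0 only if the program passes every (stdin, expected-stdout) test."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    for stdin, expected in tests:
        try:
            run = subprocess.run([sys.executable, path], input=stdin,
                                 capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return 0.0
        if run.returncode != 0 or run.stdout.strip() != expected.strip():
            return 0.0
    return 1.0

# exec_reward("print(int(input()) * 2)", [("3", "6"), ("10", "20")]) -> 1.0
```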

SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
Published at 2025-11-09

#ML

The study presents SofT-GRPO, a new algorithm for reinforcing soft-thinking LLMs, which reason over continuous mixtures of tokens rather than discrete tokens and can outperform discrete-token reasoning in some cases. By reparameterizing the stochastic soft-thinking step with Gumbel noise, SofT-GRPO lets soft-thinking LLMs slightly surpass discrete-token GRPO on Pass@1 and significantly outperform it on Pass@32, while performing comparably on other metrics....
Read More
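
The reparameterization named in the title is the Gumbel-Softmax trick. A sketch of one soft-thinking step (the temperature, shapes, and feeding the mixture embedding back into the model are our assumptions about the setup):

```python
import torch
import torch.nn.functional as F

def soft_thought(logits, embed_table, tau=1.0):
    """One stochastic soft-thinking step that stays differentiable."""
    # logits: (V,) next-token logits; embed_table: (V, d) token embeddings.
    gumbel = -torch.log(-torch.log(torch.rand_like(logits)))   # Gumbel(0, 1)
    probs = F.softmax((logits + gumbel) / tau, dim=-1)         # Gumbel-Softmax
    return probs @ embed_table   # soft "token": a mixture of embeddings
```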

The Station: An Open-World Environment for AI-Driven Discovery
Published at 2025-11-09

#ML

The Station is an open-world environment in which AI agents independently conduct scientific research, form hypotheses, and publish findings. These agents have outperformed previous AI systems in various fields, and their independent work has led to the discovery of new methods, like a density-adaptive algorithm for gene sequencing....
Read More

DIMO: Diverse 3D Motion Generation for Arbitrary Objects
Published at 2025-11-10

#ML

The researchers created DIMO, a method that generates diverse 3D motions for an arbitrary object from a single image. It distills motion patterns learned by video models into a shared latent space, allowing fast and diverse motion sampling for applications like motion interpolation and language-guided motion....
Read More

DigiData: Training and Evaluating General-Purpose Mobile Control Agents
Published at 2025-11-10

#ML

The authors present DigiData, a new, large-scale, high-quality, and diverse dataset for training mobile control agents, which is more complex and varied than existing ones. They also introduce DigiData-Bench, a benchmark for evaluating mobile control agents on real-world tasks, and propose new evaluation methods to better assess agent performance....
Read More

Do LLMs Feel? Teaching Emotion Recognition with Prompts, Retrieval, and Curriculum Learning
Published at 2025-11-10

#ML

The study presents PRC-Emo, a method for teaching emotion recognition in conversation to large language models (LLMs) that combines prompt engineering, demonstration retrieval, and curriculum learning. By incorporating explicit and implicit emotional cues, the method improves LLMs' ability to understand emotions and achieves state-of-the-art performance on two popular emotion recognition datasets....
Read More

IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
Published at 2025-11-10

#ML

The study proposes IterResearch, a method that improves long-horizon problem-solving for AI agents by breaking tasks into rounds and reconstructing a compact workspace each round instead of accumulating the full interaction history. This lets agents gather and use information effectively over extended periods, outperforming existing methods and even enhancing frontier models, making it a versatile approach for complex, long-horizon reasoning tasks....
Read More
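
Our reading of the mechanism, as a skeleton (the `agent` interface and prompt layout are hypothetical): each round rebuilds a bounded workspace from an evolving report, so the decision state stays Markovian instead of growing with the transcript.

```python
def iter_research(agent, question, max_rounds=8):
    """Long-horizon loop that carries a compact report, not the full history."""
    report = ""
    for _ in range(max_rounds):
        # The next decision sees a bounded workspace, never the whole transcript.
        workspace = f"Question: {question}\n\nReport so far:\n{report}"
        action, result = agent.step(workspace)        # e.g. search, read, answer
        report = agent.update_report(report, result)  # Markovian reconstruction
        if action == "answer":
            return result
    return report
```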

Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks
Published at 2025-11-10

#ML

The authors release llama-embed-nemotron-8b, an open-source text embedding model that performs strongly on multilingual and cross-lingual tasks. It is trained on a mix of public and synthetically generated data and supports user-defined instructions to tailor embeddings to specific needs....
Read More

MPJudge: Towards Perceptual Assessment of Music-Induced Paintings
Published at 2025-11-10

#ML

The study presents a method for assessing paintings created under the influence of music by directly modeling the perceptual correspondence between the music and the painting. The approach contributes a large-scale dataset of music-painting pairs annotated by experts and a model, MPJudge, that integrates music features into a visual encoder to identify music-relevant regions in paintings, outperforming existing methods....
Read More

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
Published at 2025-11-10

#ML

The researchers have created MVU-Eval, a new comprehensive benchmark for testing how well multimodal large language models can understand multiple videos at once, which is important for real-world applications like sports analytics and self-driving cars. This benchmark tests the models' skills in various tasks by asking 1,824 questions about 4,959 videos, and it reveals that current models still struggle with understanding multiple videos....
Read More

Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
Published at 2025-11-10

#ML

The authors present Omni-AVSR, a unified model that efficiently handles auditory, visual, and audio-visual speech recognition using a large language model. This model reduces training and deployment resource use compared to current methods, maintains accuracy under noise, and offers insights into the balance between performance and efficiency....
Read More

RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
Published at 2025-11-10

#ML

The study presents RLVE, a method for scaling reinforcement learning for language models using adaptive environments that procedurally generate problems, provide verifiable rewards, and adjust difficulty as the policy improves. The researchers developed RLVE-Gym, a suite of 400 such environments, and demonstrated that expanding the collection of training environments significantly improves reasoning capabilities in language models....
Read More
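
A toy verifiable environment with an adaptive difficulty knob makes the idea concrete (the arithmetic problem family and the update thresholds are illustrative, not taken from RLVE-Gym):

```python
import random

class AdditionEnv:
    """Toy environment: procedurally generated, verifiable, difficulty-adaptive."""

    def __init__(self, digits=2):
        self.digits = digits                # the difficulty knob

    def sample(self):
        hi = 10 ** self.digits - 1
        a, b = random.randint(0, hi), random.randint(0, hi)
        return f"What is {a} + {b}?", str(a + b)

    def reward(self, answer, truth):
        return 1.0 if answer.strip() == truth else 0.0   # verifiable, no judge

    def adapt(self, pass_rate):
        # Keep problems near the frontier of the policy's current competence.
        if pass_rate > 0.8:
            self.digits += 1
        elif pass_rate < 0.2 and self.digits > 1:
            self.digits -= 1
```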

RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
Published at 2025-11-10

#ML

The study presents RedOne 2.0, an SNS-focused language model that addresses the challenges of heterogeneous workloads, shifting norms, and multilingual data in social networking services. Through a three-stage training process, RedOne 2.0 outperforms a larger baseline model and demonstrates superior data efficiency and stability, making it a competitive and cost-effective solution for domain-specific language models in SNS scenarios....
Read More

Robot Learning from a Physical World Model
Published at 2025-11-10

#ML

The PhysWorld framework teaches robots tasks by generating realistic videos from language and images, while incorporating the laws of physics through a physical world model so that the resulting robot movements are accurate and effective. This lets robots learn new tasks without real-world practice, and it performs better than previous techniques in various real-world scenarios....
Read More

Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
Published at 2025-11-10

#ML

The study presents Routing Manifold Alignment (RoMA), a method that enhances the performance of Sparse Mixture-of-Experts large language models by aligning routing weights with task embeddings. RoMA introduces a regularization term that encourages similar expert choices for samples targeting similar tasks, which significantly improves the generalization of these models. The method requires only lightweight finetuning and demonstrates substantial improvement in various benchmarks and co...
Read More
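
A simplest-possible form of such a regularizer, as we read the summary (the neighbor count, the stop-gradient on the target, and the MSE pull are our choices, not necessarily the paper's):

```python
import torch
import torch.nn.functional as F

def routing_alignment_reg(routing, task_emb, k=4):
    """Pull each sample's routing weights toward its task-neighbors' average."""
    # routing: (B, E) router weights per sample; task_emb: (B, D) task embeddings.
    z = F.normalize(task_emb, dim=-1)
    sim = z @ z.T                                     # cosine similarity (B, B)
    nbrs = sim.topk(k + 1, dim=-1).indices[:, 1:]     # k nearest, excluding self
    target = routing[nbrs].mean(dim=1)                # neighbors' mean routing
    return F.mse_loss(routing, target.detach())
```

Added to the task loss during lightweight finetuning, this nudges samples that solve similar tasks to pick similar experts.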

Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Published at 2025-11-10

#ML

The study explores how to transform existing non-recurrent language models into depth-recurrent models, which can reduce computational cost while maintaining performance. By using a curriculum of recurrences to increase the model's effective depth during training, the converted models show better performance at a given compute budget, particularly in mathematics tasks....
Read More
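
In sketch form (the three-way split of the pretrained stack and the schedule are assumptions based on the abstract): designate a middle block of layers as a recurrent core and raise its loop count during continued training.

```python
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    """Depth-recurrent wrapper around three blocks of a pretrained layer stack."""

    def __init__(self, prelude, core, coda, recurrences=1):
        super().__init__()
        self.prelude, self.core, self.coda = prelude, core, coda
        self.r = recurrences

    def forward(self, h):
        h = self.prelude(h)
        for _ in range(self.r):   # effective depth grows with r,
            h = self.core(h)      # parameter count does not
        return self.coda(h)

# Curriculum of recurrences: raise model.r (e.g. 1 -> 2 -> 4) as training goes on.
```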

VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models
Published at 2025-11-10

#ML

The authors present a new framework called VADER that uses large language models to better understand anomalous events in videos by considering object interactions and context. This approach improves upon traditional methods by providing detailed, causally grounded explanations and supporting robust anomaly-related question answering, as demonstrated by strong performance on various benchmarks....
Read More

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit Developer's Social Media
|
|
|
|
|
|