🤗 Daily Paper Newsletter

Hope you found some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.
Scaling Laws for Code: Every Programming Language Matters
Published at 2025-12-15

#ML

This study investigates how different programming languages affect the performance of code large language models during pre-training, revealing that interpreted languages benefit more from scaled-up models and data than compiled languages. The research also finds that multilingual pre-training and a specific pre-training strategy improve cross-lingual abilities, leading to a new scaling law for optimally allocating training tokens across languages.
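The summary above does not give the paper's actual scaling-law formulation, so purely as an illustration, here is a minimal sketch that fits a Chinchilla-style curve L(N, D) = E + A/N^alpha + B/D^beta to synthetic per-language data; every number, name, and coefficient below is hypothetical.

```python
# Illustrative only: fit a Chinchilla-style scaling curve for one language.
# The paper's actual functional form and coefficients are not given in the
# summary above; every value here is synthetic.
import numpy as np
from scipy.optimize import curve_fit

def scaling_loss(x, E, A, alpha, B, beta):
    """L(N, D) = E + A / N**alpha + B / D**beta (N = params, D = tokens)."""
    N, D = x
    return E + A / N**alpha + B / D**beta

# Synthetic grid of (parameters, tokens) and losses generated from known
# coefficients, so the fit is well-posed.
N_vals = np.array([1e8, 1e9, 1e10])
D_vals = np.array([2e9, 2e10, 2e11])
N, D = (a.ravel() for a in np.meshgrid(N_vals, D_vals))
true = dict(E=1.7, A=400.0, alpha=0.34, B=410.0, beta=0.28)
L = scaling_loss((N, D), **true)

popt, _ = curve_fit(scaling_loss, (N, D), L,
                    p0=[1.0, 300.0, 0.3, 300.0, 0.3],
                    bounds=(0, np.inf), maxfev=20000)
E, A, alpha, B, beta = popt
print(f"fitted alpha={alpha:.2f}, beta={beta:.2f}")
# Comparing beta across languages indicates which languages gain more from
# additional training tokens, the kind of comparison the paper draws between
# interpreted and compiled languages.
```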
Read More

Toxicity Ahead: Forecasting Conversational Derailment on GitHub
Published at 2025-12-16

#ML

Researchers collected data from GitHub discussions to understand how toxic conversations develop and created a new system that uses large language models to predict when a conversation might turn toxic. The system, which uses a two-step prompting process, accurately flagged conversations at risk of derailing in various tests, offering moderators a more scalable way to prevent such interactions in open-source software communities.
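The paper's actual prompts, model, and labels are not shown in the summary above; the sketch below only illustrates the general shape of a two-step prompting pipeline, with `call_llm` as a placeholder for whatever chat-completion client is used.

```python
# Illustrative two-step prompting sketch; the paper's actual prompts, model,
# and labels are not shown in the summary above. `call_llm` is a placeholder
# for whatever chat-completion client is used.
from typing import List

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its text response."""
    raise NotImplementedError("wire this to your LLM client")

def forecast_derailment(comments: List[str]) -> bool:
    # Step 1: extract early warning signals from the thread so far.
    signals = call_llm(
        "List any signs of rising hostility (sarcasm, blame, dismissiveness) "
        "in this GitHub thread:\n\n" + "\n---\n".join(comments)
    )
    # Step 2: make a binary forecast conditioned on the extracted signals.
    verdict = call_llm(
        "Given these signals:\n" + signals +
        "\n\nIs this conversation likely to turn toxic? Answer YES or NO."
    )
    return verdict.strip().upper().startswith("YES")
```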
Read More

INTELLECT-3: Technical Report
Published at 2025-12-17

#ML

The authors have created INTELLECT-3, a powerful 106B-parameter model that excels in fields such as math and science through advanced training methods. They have made the model and its training infrastructure publicly available, along with prime-rl, a new open-source tool that makes it easier to train large models with reinforcement learning.
Read More

Reinforcement Learning for Self-Improving Agent with Skill Library
Published at 2025-12-18

#ML

The authors present a new framework called SAGE that uses reinforcement learning to help agents improve themselves with a skill library. Unlike current methods, the framework systematically incorporates skills into learning and has been shown in experiments to be more efficient and accurate.
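SAGE's actual interfaces, skill format, and RL algorithm are not described in the summary; as a loose illustration of the skill-library idea, here is a hypothetical sketch of storing skills and retrieving the most relevant ones for a task, with lexical overlap standing in for a learned retriever.

```python
# Minimal sketch of the skill-library idea only; SAGE's actual interfaces,
# storage format, and RL algorithm are not described in the summary above.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Skill:
    name: str
    description: str
    success_rate: float = 0.0   # would be updated from rollout rewards

@dataclass
class SkillLibrary:
    skills: List[Skill] = field(default_factory=list)

    def add(self, skill: Skill) -> None:
        self.skills.append(skill)

    def retrieve(self, task: str, k: int = 3) -> List[Skill]:
        # Toy relevance score: keyword overlap with the task, plus a bonus
        # for skills that have worked before. A real system would use a
        # learned retriever and update success_rate from RL rewards.
        task_terms = set(task.lower().split())
        def score(s: Skill) -> float:
            overlap = len(task_terms & set(s.description.lower().split()))
            return overlap + s.success_rate
        return sorted(self.skills, key=score, reverse=True)[:k]

library = SkillLibrary()
library.add(Skill("open_file", "open a file and read its contents"))
library.add(Skill("run_tests", "run the unit tests and report failures"))
print([s.name for s in library.retrieve("read the config file contents")])
```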
Read More

SAM Audio: Segment Anything in Audio
Published at 2025-12-19

#ML

The authors present SAM Audio, a versatile model for separating sounds in any type of audio, using text, visual, or time-based prompts. It outperforms other models in various audio separation tasks, including general sounds, speech, music, and musical instruments, and introduces a new real-world separation benchmark for evaluation.
Read More

Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems
Published at 2025-12-19

#ML

The authors present simulstream, a new open-source platform for evaluating and demonstrating streaming speech-to-text translation systems. The toolkit supports both incremental decoding and re-translation methods, making it possible to compare systems in terms of quality and latency, and it features an interactive web interface for showcasing them.
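simulstream's real API and metrics are not shown here; as a generic illustration of the latency side of the quality-latency trade-off such toolkits measure, here is a toy average-lag computation over made-up emission timestamps.

```python
# Toy illustration of a latency measure for streaming speech-to-text
# translation. This is NOT simulstream's implementation or metric; all
# timestamps below are made up.
from typing import List

def average_lag(emit_times_s: List[float], source_end_s: List[float]) -> float:
    """Mean delay between the moment the audio a target token depends on has
    finished and the moment that token was actually emitted."""
    lags = [emit - src for emit, src in zip(emit_times_s, source_end_s)]
    return sum(lags) / len(lags)

# Hypothetical: four target tokens, when each was emitted vs. when the audio
# it translates had finished playing (seconds).
emitted   = [1.2, 1.9, 2.8, 3.5]
audio_end = [0.8, 1.5, 2.0, 3.0]
print(f"average lag: {average_lag(emitted, audio_end):.2f} s")
```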
Read More

MemEvolve: Meta-Evolution of Agent Memory Systems
Published at 2025-12-21

#ML

The authors present MemEvolve, a framework that allows both the experiential knowledge and memory architecture of LLM-based agents to evolve together, overcoming the limitations of static memory systems. They also introduce EvolveLab, a unified codebase for self-evolving memory systems, which demonstrates significant performance improvements and generalization across various benchmarks and LLMs.
Read More

Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
Published at 2025-12-22

#ML

The study explores the internal mechanisms of large language models by decomposing their policies into layers and modules, revealing insights about how they reason and make predictions. The researchers propose a new optimization method, Bottom-up Policy Optimization, which improves model performance by directly optimizing the internal layer policy during early training, resulting in superior performance on complex reasoning benchmarks.
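The paper's exact policy decomposition is not given in the summary; one familiar way to inspect per-layer "internal policies" is a logit-lens style readout, sketched below with GPT-2 standing in for the actual model (this is a generic probe, not the proposed optimization method).

```python
# Sketch of reading out per-layer "internal policies" with a logit-lens style
# projection. This is a generic probing technique, not the paper's exact
# decomposition or training method; GPT-2 is only a stand-in model.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states = (embeddings, layer 1, ..., layer 12). Project each through
# the final layer norm and the unembedding to get a next-token distribution,
# i.e. an "internal policy" at that depth.
for depth, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[:, -1]))
    probs = torch.softmax(logits, dim=-1)
    top_token = tok.decode(probs.argmax(dim=-1))
    print(f"layer {depth:2d}: top next token = {top_token!r}")
```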
Read More

Learning to Refocus with Video Diffusion Models
Published at 2025-12-22

#ML

This study presents a new technique for refocusing photos after they are taken using video diffusion models, which can create a series of images with different levels of focus. This method is more accurate and reliable than previous methods, and the researchers have made a large dataset and their code available to help further research in this area.
Read More

QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models
Published at 2025-12-22

#ML

The study presents QuantiPhy, a benchmark that measures the physical reasoning ability of vision-language models (VLMs) by asking them to estimate an object's size, velocity, and acceleration from video observations. The benchmark reveals that current VLMs struggle with numerical accuracy and rely more on pre-trained knowledge than on the provided visual and textual inputs when reasoning about kinematic properties.
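The benchmark's protocol and data are not described in the summary; for reference, the sketch below shows the underlying kinematics such questions test, estimating velocity and acceleration from per-frame positions with finite differences (the frame rate and trajectory are made up).

```python
# The benchmark's exact protocol is not described in the summary above; this
# sketch only shows the underlying kinematics such questions test: estimating
# velocity and acceleration from per-frame positions with finite differences.
import numpy as np

fps = 30.0                       # hypothetical video frame rate
t = np.arange(0, 1, 1 / fps)     # one second of frame timestamps
x = 0.5 * 9.8 * t**2             # positions of an object in free fall (m)

v = np.gradient(x, t)            # velocity estimate (m/s)
a = np.gradient(v, t)            # acceleration estimate (m/s^2)
a_est = a[2:-2].mean()           # trim edges, where one-sided differences are noisier
print(f"estimated acceleration ~ {a_est:.2f} m/s^2 (ground truth: 9.80)")
```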
Read More

Active Intelligence in Video Avatars via Closed-loop World Modeling
Published at 2025-12-23

#ML

This study presents a new framework called ORCA that allows video avatars to interact with their environment proactively and achieve long-term goals. ORCA uses a closed-loop cycle to maintain state tracking and a dual-system architecture for strategic reasoning and action execution, enabling autonomous multi-step task completion in open-domain scenarios, as demonstrated by extensive experiments.
Read More

FaithLens: Detecting and Explaining Faithfulness Hallucination
Published at 2025-12-23

#ML

The research presents FaithLens, a reliable and efficient model for detecting faithfulness hallucination in large language models, which provides binary predictions and explanations to enhance trustworthiness. The model is trained on carefully curated data and optimized using rule-based reinforcement learning, demonstrating superior performance compared to advanced models like GPT-4.1 and o3 on various tasks, while also generating high-quality explanations.
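FaithLens's actual reward terms, weights, and output format are not specified in the summary; the sketch below is only a hypothetical example of what a rule-based reinforcement-learning reward for a "binary label plus explanation" output could look like.

```python
# Hypothetical rule-based reward in the spirit described above; FaithLens's
# actual reward terms, weights, and output format are not specified in the
# summary, so everything here is an illustrative assumption.
import json

def rule_based_reward(model_output: str, gold_label: str) -> float:
    # Format rule: output must be JSON with a label and an explanation.
    try:
        parsed = json.loads(model_output)
        label = str(parsed["label"]).lower()
        explanation = str(parsed["explanation"])
    except (json.JSONDecodeError, KeyError):
        return -1.0                   # malformed output is penalized
    reward = 0.5                      # valid, parseable output format
    if label == gold_label.lower():
        reward += 1.0                 # main term: correct binary label
    if len(explanation.split()) >= 10:
        reward += 0.5                 # encourage a non-trivial explanation
    return reward

example = ('{"label": "hallucinated", "explanation": "The summary states a '
           'date that never appears in the source document, so it is unfaithful."}')
print(rule_based_reward(example, gold_label="hallucinated"))
```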
Read More

LongVideoAgent: Multi-Agent Reasoning with Long Videos
Published at 2025-12-23

#ML

This study presents a multi-agent framework that uses a master LLM to coordinate a grounding agent and a vision agent for reasoning over long videos. The framework improves temporal grounding and fine-grained cues by focusing on relevant clips, complementing subtitles with visual details, and using reinforcement learning to encourage efficient cooperation. It outperforms non-agent baselines on the LongTVQA and LongTVQA+ datasets.
Read More

Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents
Published at 2025-12-23

#ML

The study presents a new framework called Memory-T1 that uses reinforcement learning to help conversational agents better understand and reason through long, multi-session dialogues. This framework improves the agent's ability to identify important information in the dialogue history by employing a strategy that filters and selects relevant evidence, and it has been shown to significantly outperform other models in handling extensive dialogue histories.
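Memory-T1's scoring model, thresholds, and RL objective are not described here; the following toy sketch of the "filter and select relevant evidence" step uses simple lexical overlap where a learned scorer would go, with invented session data.

```python
# Toy sketch of the "filter and select relevant evidence" idea; Memory-T1's
# actual scoring model, thresholds, and RL objective are not described in the
# summary above, so lexical overlap stands in for a learned scorer.
import re
from typing import List, Tuple

def terms(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))

def select_evidence(question: str, history: List[Tuple[str, str]], k: int = 3):
    """history: (session_id, utterance) pairs; returns the top-k utterances
    by a toy relevance score."""
    q = terms(question)
    scored = sorted(
        ((len(q & terms(utt)), sid, utt) for sid, utt in history),
        reverse=True,
    )
    return [(sid, utt) for score, sid, utt in scored[:k] if score > 0]

history = [
    ("session-1", "I adopted a cat named Miso last spring."),
    ("session-2", "Work has been busy, mostly travel to Berlin."),
    ("session-3", "Miso the cat just had her first vet visit."),
]
print(select_evidence("When did I adopt my cat?", history))
```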
Read More

Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation
Published at 2025-12-23

#ML

The authors propose a new method to ensure the reliability of AI-assisted qualitative research by combining two types of checks: one for agreement between different AI models and another for the similarity of their interpretations. They test this method on three popular AI models and find that all of them provide consistent results, with Gemini being the most reliable.
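The title names the two reliability checks, Cohen's kappa for inter-model agreement and semantic similarity for interpretations; here is a minimal sketch computing both on hypothetical coding output, with sentence-transformers chosen only as an example embedding model (the paper's models and thresholds are not given in the summary).

```python
# Sketch of the two reliability checks named in the title: inter-model
# agreement via Cohen's kappa on assigned theme codes, and semantic similarity
# between the models' theme descriptions. All data here is hypothetical and
# the embedding model is an arbitrary example, not the paper's choice.
from sklearn.metrics import cohen_kappa_score
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

# Theme codes two LLMs assigned to the same eight excerpts (made up).
codes_model_a = ["cost", "trust", "cost", "usability",
                 "trust", "cost", "usability", "trust"]
codes_model_b = ["cost", "trust", "cost", "trust",
                 "trust", "cost", "usability", "trust"]
kappa = cohen_kappa_score(codes_model_a, codes_model_b)

# Semantic similarity between the two models' descriptions of one theme.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
descriptions = [
    "Participants worry about recurring subscription costs.",
    "Users are concerned that ongoing fees are too high.",
]
emb = encoder.encode(descriptions)
similarity = cosine_similarity([emb[0]], [emb[1]])[0, 0]

print(f"Cohen's kappa = {kappa:.2f}, semantic similarity = {similarity:.2f}")
```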
Read More

SemanticGen: Video Generation in Semantic Space
Published at 2025-12-23

#ML

This study presents a new method called SemanticGen for generating videos, which starts the process in a high-level semantic space for global planning before adding details. This approach results in faster convergence and is more computationally efficient than current methods, especially for long videos, and produces high-quality results.
Read More

SpatialTree: How Spatial Abilities Branch Out in MLLMs
Published at 2025-12-23

#ML

The authors propose a new framework called SpatialTree, which divides spatial abilities in multimodal large language models (MLLMs) into four levels based on cognitive science. They evaluate popular models using this framework and find that lower-level skills are independent, while higher-level skills are interconnected. The study also suggests ways to improve these abilities, including a strategy to avoid overthinking in models.
Read More

Step-DeepResearch Technical Report
Published at 2025-12-23

#ML

This report presents Step-DeepResearch, an efficient agent for open-ended research, and addresses the shortcomings of existing academic benchmarks in real-world applications. The agent is trained with a novel strategy and a checklist-style judger, resulting in improved robustness and superior performance on both Scale AI Research Rubrics and ADR-Bench compared to similar models and SOTA closed-source models.
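The actual rubric items and judging prompts are not given in the summary; the sketch below is a hypothetical illustration of a checklist-style judge that scores a report as the fraction of rubric items a judge model says it satisfies, with `ask_judge` as a placeholder.

```python
# Hypothetical checklist-style judge in the spirit described above; the real
# Step-DeepResearch rubrics and judging prompts are not given in the summary,
# so the checklist items and `ask_judge` below are illustrative only.
from typing import List

def ask_judge(question: str) -> bool:
    """Placeholder: ask a judge LLM a yes/no question about the report."""
    raise NotImplementedError("wire this to your judge model")

def checklist_score(report: str, checklist: List[str]) -> float:
    """Fraction of rubric items the report satisfies, each judged in isolation."""
    passed = sum(
        ask_judge(f"Report:\n{report}\n\nDoes the report satisfy: {item}? "
                  "Answer yes or no.")
        for item in checklist
    )
    return passed / len(checklist)

checklist = [
    "cites at least three primary sources",
    "states the main claim with quantitative evidence",
    "discusses at least one limitation",
]
# Usage: score = checklist_score(report_text, checklist)
```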
Read More

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the open SOLAR-10.7B LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit the Developer's Social Media