🤗 Daily Paper Newsletter |
 |
Hope you found some gems! |
This newsletter delivers a curated list of papers from 🤗 Daily Papers. |
|
GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay |
Published at 2025-08-06 |
#ML
|
The paper presents a framework called General Sample Replay (GeRe) that uses pretraining texts to efficiently prevent large language models from forgetting general capabilities while learning new tasks. This framework introduces a new optimization method that maintains activation state consistency, which helps improve performance and robustness compared to other replay strategies.... |
|
NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations |
Published at 2025-08-06 |
#ML
|
The authors create a comprehensive and expandable system called NVSpeech that includes a new dataset with 48,430 spoken sentences, a model for recognizing non-verbal sounds like laughter and breathing, and a method for controlling these sounds in text-to-speech synthesis. This system allows for more natural and expressive Mandarin speech modeling, with the dataset and examples available online.... |
|
Improving Masked Style Transfer using Blended Partial Convolution |
Published at 2025-08-07 |
#ML
|
This research presents a new method for applying artistic style to specific regions in an image, improving upon the traditional approach of post-stylization masking. The proposed technique uses partial convolution and internal blending to accurately apply style features to the selected region, resulting in visually and quantitatively better stylization.... |
|
Optimization-Free Style Transfer for 3D Gaussian Splats |
Published at 2025-08-07 |
#ML
|
This research presents a new method for applying styles to 3D shapes using Gaussian splats without the need for reconstruction or optimization. The proposed technique involves creating a graph on the shape's surface, applying style, and then transferring it back to the splats, resulting in fast and high-quality style transfer without additional training or optimization.... |
|
Test-Time Reinforcement Learning for GUI Grounding via Region Consistency |
Published at 2025-08-07 |
#ML
|
This study presents two methods, GUI-RC and GUI-RCPO, which improve the accuracy of mapping natural language instructions to screen coordinates for GUI agents. GUI-RC uses multiple predictions to identify consensus regions, while GUI-RCPO refines outputs through test-time reinforcement learning, resulting in significant performance improvements on various architectures.... |
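The consensus step described for GUI-RC can be illustrated with a minimal sketch. It assumes each sampled prediction is an axis-aligned pixel box and that the final answer is the centre of the most-voted cells. The function name `consensus_point`, the box format, and the averaging of tied cells are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive.
Box = Tuple[int, int, int, int]


def consensus_point(boxes: List[Box], width: int, height: int) -> Tuple[float, float]:
    """Vote on a pixel grid and return the centre of the cells with the most votes."""
    grid = [[0] * width for _ in range(height)]

    # Each sampled prediction adds one vote to every cell it covers.
    for x1, y1, x2, y2 in boxes:
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                grid[y][x] += 1

    # The consensus region is the set of cells that received the highest vote count.
    best = max(max(row) for row in grid)
    cells = [(x, y) for y in range(height) for x in range(width) if grid[y][x] == best]

    # Average the cell centres (cell (x, y) has its centre at (x + 0.5, y + 0.5)).
    cx = sum(x for x, _ in cells) / len(cells) + 0.5
    cy = sum(y for _, y in cells) / len(cells) + 0.5
    return cx, cy


# Three sampled boxes that overlap only in the single cell (3, 3).
samples = [(0, 0, 4, 4), (2, 2, 6, 6), (3, 3, 8, 8)]
point = consensus_point(samples, width=10, height=10)
```

In this toy setup a single outlier box has little effect on the result, because it adds only one vote to the cells it covers.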
|
UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation |
Published at 2025-08-07 |
#ML
|
The study presents a new method called UNCAGE that enhances the quality of text-to-image generation by Masked Generative Transformers, which are an alternative to autoregressive models. UNCAGE improves the alignment between text and image by focusing on clear object representations, leading to better results in various evaluations with minimal additional computation.... |
|
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent |
Published at 2025-08-07 |
#ML
|
The authors present WebWatcher, a new multi-modal research agent that can understand and reason with both visual and textual information, addressing the limitation of current text-centric deep research agents. They also introduce BrowseComp-VL, a benchmark for evaluating the capabilities of multimodal agents, and demonstrate WebWatcher's superior performance compared to other agents in complex multimodal information-seeking tasks.... |
|
WGAST: Weakly-Supervised Generative Network for Daily 10 m Land Surface Temperature Estimation via Spatio-Temporal Fusion |
Published at 2025-08-08 |
#ML
|
The study presents WGAST, a deep learning framework that uses satellite data from Terra MODIS, Landsat 8, and Sentinel-2 to estimate daily land surface temperature at 10m resolution. Compared to existing methods, WGAST significantly improves accuracy and is robust to cloud interference, with source code available for verification.... |
|
Adversarial Video Promotion Against Text-to-Video Retrieval |
Published at 2025-08-09 |
#ML
|
This study introduces ViPro, the first attack designed to promote videos (raise their rank) in text-to-video retrieval systems; such attacks can be especially harmful because they can be exploited for financial gain or to spread misinformation. The researchers also propose MoRe to improve the attack's effectiveness and test their method on various models and datasets, outperforming existing baselines by a significant margin.... |
|
Technical Report: Full-Stack Fine-Tuning for the Q Programming Language |
Published at 2025-08-09 |
#ML
|
This research presents a method to improve large language models' performance in the Q programming language, which is far less represented on the internet than popular languages. The authors create a new evaluation dataset, train models using different techniques, and achieve better results than existing models, including GPT-4.1. They also provide a detailed guide for applying similar techniques to other tasks.... |
|
CharacterShot: Controllable and Consistent 4D Character Animation |
Published at 2025-08-10 |
#ML
|
The authors present a framework called CharacterShot that allows designers to create dynamic 3D characters from a single image and a 2D pose sequence. They use a powerful 2D animation model, a dual-attention module, and a novel optimization technique to generate consistent and stable 4D character representations, and introduce a large-scale dataset to improve character-centric performance.... |
|
Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy |
Published at 2025-08-10 |
#ML
|
The authors created a new method to evaluate large language models' ability to play the complex game of Diplomacy without requiring specialized training. This approach allows for easier and more accessible study of these models, providing insights into their strategic reasoning capabilities.... |
|
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL |
Published at 2025-08-11 |
#ML
|
The authors present ASearcher, an open-source project that uses large-scale asynchronous reinforcement learning to train search agents. They address the limitations of existing search tools by enabling long-horizon searches and improving search intelligence, resulting in significant performance gains on xBench and GAIA benchmarks.... |
|
Cut2Next: Generating Next Shot via In-Context Tuning |
Published at 2025-08-11 |
#ML
|
The authors present Cut2Next, a framework that generates high-quality, cinematically appropriate next shots by conforming to professional editing patterns and maintaining strict continuity. It employs a novel in-context tuning strategy with a Diffusion Transformer, using Relational and Individual Prompts to define overall context and per-shot content, respectively, and introduces architectural innovations for integrating diverse signals without additional parameters. The framework outperforms cu... |
|
HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches |
Published at 2025-08-11 |
#ML
|
The authors propose a new framework called HierSearch that uses a hierarchical approach to integrate local and web searches for enterprise deep search systems. This framework improves training efficiency and mastery of complex tools compared to flat reinforcement learning, and it also includes a knowledge refiner to filter out irrelevant or hallucinated evidence, resulting in better performance than existing deep search methods.... |
|
Matrix-3D: Omnidirectional Explorable 3D World Generation |
Published at 2025-08-11 |
#ML
|
The authors present a new method called Matrix-3D that creates large, explorable 3D worlds from a single image or text prompt using panoramic video generation and reconstruction. They trained a model to generate high-quality, geometrically consistent scene videos using scene mesh renders as a condition and proposed two methods to convert these panorama videos into 3D worlds, resulting in state-of-the-art performance in panoramic video generation and 3D world generation.... |
|
RedDino: A foundation model for red blood cell analysis |
Published at 2025-08-11 |
#ML
|
The study presents RedDino, a specialized AI model for analyzing red blood cell images, which outperforms current models in classifying RBC shapes. RedDino's strengths lie in its ability to capture subtle morphological features, making it a valuable tool for developing reliable diagnostic instruments for blood disorders.... |
|
Aryabhata: An exam-focused language model for JEE Math |
Published at 2025-08-12 |
#ML
|
Aryabhata 1.0 is a compact, efficient math reasoning model designed for the Indian JEE exam, optimized using supervised fine-tuning, curriculum learning, and reinforcement learning. It outperforms existing models in accuracy and efficiency, offering step-by-step reasoning, and is released as an open-source foundation model for exam-centric, small language models.... |
|
AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators |
Published at 2025-08-12 |
#ML
|
The researchers created a new method called AutoCodeGen to automatically generate high-difficulty, multilingual code generation datasets without manual annotations. They used this method to build AutoCodeBench, a large-scale benchmark for evaluating language models on challenging, diverse, and practical multilingual code generation tasks, and found that even advanced models struggle with these tasks.... |
|
BiasGym: Fantastic Biases and How to Find (and Remove) Them |
Published at 2025-08-12 |
#ML
|
BiasGym is a new framework designed to find and remove biases in large language models. It has two parts: BiasInject, which adds specific biases to the model, and BiasScope, which identifies and corrects the components causing biased behavior, without affecting the model's performance on other tasks.... |
|
Bridging Theory and Practice in Quantum Game Theory: Optimized Implementation of the Battle of the Sexes with Error Mitigation on NISQ Hardware |
Published at 2025-08-12 |
#ML
|
The study executes the Battle of the Sexes game on IBM Quantum's hardware using a new method to reduce errors caused by noise and hardware limitations. Results show that quantum strategies can still outperform classical ones even with these challenges, suggesting potential real-world uses for quantum game theory.... |
|
Complex Logical Instruction Generation |
Published at 2025-08-12 |
#ML
|
The authors present a new method for creating complex instructions using code functions, which they use to build a benchmark of logic-rich tasks. They find that current language models struggle with these tasks, with most following fewer than 60% of the instructions.... |
|
DeCRED: Decoder-Centric Regularization for Encoder-Decoder Based Speech Recognition |
Published at 2025-08-12 |
#ML
|
The study proposes a method called DeCRED, which enhances the performance of speech recognition models by adding auxiliary classifiers to the decoder. This results in improved word error rates and better generalization in various testing scenarios, even outperforming other popular models like Whisper-medium, all while using less data and fewer parameters.... |
|
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments |
Published at 2025-08-12 |
#ML
|
The study presents a pipeline for creating stable training environments and a verifiable reward mechanism to improve tool use in large language models. Experiments show that this approach enhances model performance without affecting general capabilities, attributed to better context understanding and reasoning.... |
|
OpenCUA: Open Foundations for Computer-Use Agents |
Published at 2025-08-12 |
#ML
|
The authors present OpenCUA, an open-source framework for scaling computer-use agent data and models. This framework includes tools for capturing human computer use, a large-scale dataset of computer tasks, and a pipeline that transforms demonstrations into actions with reasoning. OpenCUA outperforms other open-source models and surpasses OpenAI's CUA (GPT-4o) in benchmark tests.... |
|
Text-conditioned State Space Model For Domain-generalized Change Detection Visual Question Answering |
Published at 2025-08-12 |
#ML
|
This study addresses domain shift in the Change Detection Visual Question Answering task and introduces BrightVQA, a new dataset for domain generalization research. The authors propose a novel state space model, TCSSM, which uses both bi-temporal imagery and geo-disaster-related textual information to extract domain-invariant features, outperforming state-of-the-art models in extensive experiments.... |
|
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models |
Published at 2025-08-12 |
#ML
|
The study finds that diffusion language models often overwrite correct answers during the denoising process. To improve accuracy, the authors introduce two methods: Temporal Self-Consistency Voting, which selects the most consistent prediction across steps, and Temporal Consistency Reinforcement, which encourages stable generations using Temporal Semantic Entropy as a reward signal. These methods significantly improve performance on various benchmarks.... |
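The voting idea can be sketched in a few lines. This is a minimal illustration that assumes one decoded answer string per denoising step and picks the answer with the largest total weight. The function name `temporal_vote` and the optional per-step `weights` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def temporal_vote(step_answers: List[str], weights: Optional[List[float]] = None) -> str:
    """Return the answer with the highest total weight across denoising steps."""
    if weights is None:
        # Assumption: with no weights given, every step counts equally.
        weights = [1.0] * len(step_answers)

    scores: Dict[str, float] = defaultdict(float)
    for answer, weight in zip(step_answers, weights):
        scores[answer] += weight

    # Ties resolve to the answer that appeared first.
    return max(scores, key=scores.get)


# The last step overwrote an answer that was stable in earlier steps.
trajectory = ["42", "42", "42", "17"]
voted = temporal_vote(trajectory)
```

With uniform weights the vote recovers the answer that was stable before the final step replaced it; weighting later steps more heavily shifts the result back toward the final prediction.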
|
TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation |
Published at 2025-08-12 |
#ML
|
The study presents TopXGen, a new method that uses artificial intelligence to create high-quality and diverse text data in less common languages, which can then be used to improve machine translation in those languages. The method takes advantage of AI's ability to translate well into more common languages to generate natural-sounding texts, which can then be translated back into a high-resource language for training purposes. The results show that TopXGen improves translation performance during... |
|
Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors |
Published at 2025-08-12 |
#ML
|
The study presents AffordDex, a new framework for robotic dexterous grasping that learns from human hand motions and object affordances. This approach results in a universal grasping policy that is both human-like in posture and functionally appropriate in contact location, outperforming current methods.... |
|
Train Long, Think Short: Curriculum Learning for Efficient Reasoning |
Published at 2025-08-12 |
#ML
|
The study presents a new method for training large language models to reason efficiently by gradually reducing their token budgets over time, encouraging them to discover effective solution strategies and then distill them into more concise reasoning traces. The proposed strategy, using Group Relative Policy Optimization (GRPO), consistently outperforms fixed-budget baselines in accuracy and token efficiency across various experiments, demonstrating the power of progressive constraint as an indu... |
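A shrinking token budget of this kind could look like the sketch below. The stepwise exponential schedule, its default values, and the form of the length penalty are all assumptions chosen for illustration; the summary only states that budgets are reduced gradually during training. The names `token_budget` and `length_aware_reward` are hypothetical.

```python
def token_budget(step: int, start: int = 256, floor: int = 64,
                 decay: float = 0.5, interval: int = 100) -> int:
    """Assumed schedule: multiply the budget by `decay` every `interval` steps, down to `floor`."""
    return max(floor, int(start * decay ** (step // interval)))


def length_aware_reward(correct: bool, n_tokens: int, budget: int,
                        penalty: float = 0.5) -> float:
    """Assumed reward: 1.0 for a correct answer, minus a capped penalty for exceeding the budget."""
    overrun = max(0, n_tokens - budget) / budget
    return (1.0 if correct else 0.0) - penalty * min(1.0, overrun)


# Budgets at a few training steps, from generous to tight.
budgets = [token_budget(s) for s in (0, 100, 200, 1000)]

# A correct answer that runs 50% over a 128-token budget.
reward = length_aware_reward(correct=True, n_tokens=192, budget=128)
```

Early in training the generous budget leaves room to explore long solutions, while the later, tighter budgets reward the same correct answers only when they are expressed concisely.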
|
VertexRegen: Mesh Generation with Continuous Level of Detail |
Published at 2025-08-12 |
#ML
|
VertexRegen is a new mesh generation framework that allows for continuous level of detail, unlike existing methods that create incomplete structures during generation. It reverses the edge collapse process through a generative model, providing valid meshes at any stage of generation with comparable quality to state-of-the-art methods.... |
|
Tags are generated by Google's Gemini Pro API, and the summary and translation are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun. |