🤗 Daily Paper(2025-08-13)

deep.di...@gmail.com

Aug 13, 2025, 4:07:32 PM

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page

🤗 daily paper

GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay

Published at 2025-08-06

#ML

The paper presents a framework called General Sample Replay (GeRe) that uses pretraining texts to efficiently prevent large language models from forgetting general capabilities while learning new tasks. This framework introduces a new optimization method that maintains activation state consistency, which helps improve performance and robustness compared to other replay strategies....

NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations

Published at 2025-08-06

#ML

The authors create a comprehensive and expandable system called NVSpeech that includes a new dataset with 48,430 spoken sentences, a model for recognizing non-verbal sounds like laughter and breathing, and a method for controlling these sounds in text-to-speech synthesis. This system allows for more natural and expressive Mandarin speech modeling, with the dataset and examples available online....

Improving Masked Style Transfer using Blended Partial Convolution

Published at 2025-08-07

#ML

This research presents a new method for applying artistic style to specific regions in an image, improving upon the traditional approach of post-stylization masking. The proposed technique uses partial convolution and internal blending to accurately apply style features to the selected region, resulting in visually and quantitatively better stylization....

Optimization-Free Style Transfer for 3D Gaussian Splats

Published at 2025-08-07

#ML

This research presents a new method for applying styles to 3D shapes using Gaussian splats without the need for reconstruction or optimization. The proposed technique involves creating a graph on the shape's surface, applying style, and then transferring it back to the splats, resulting in fast and high-quality style transfer without additional training or optimization....

Test-Time Reinforcement Learning for GUI Grounding via Region Consistency

Published at 2025-08-07

#ML

This study presents two methods, GUI-RC and GUI-RCPO, which improve the accuracy of mapping natural language instructions to screen coordinates for GUI agents. GUI-RC uses multiple predictions to identify consensus regions, while GUI-RCPO refines outputs through test-time reinforcement learning, resulting in significant performance improvements on various architectures....
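The consensus idea behind GUI-RC can be illustrated with a small voting grid. This is only a sketch under stated assumptions, not the authors' implementation: the function name `consensus_region`, the integer pixel boxes, and the choice to return the bounding box of the top-voted cells are all hypothetical.

```python
def consensus_region(boxes, width, height):
    """Vote on a width x height grid and return (region, votes).

    boxes: iterable of (x1, y1, x2, y2) predictions, end-exclusive.
    region: bounding box of the cells covered by the most predictions
    (a simplification; the top-voted cells need not be contiguous).
    """
    votes = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        # Each sampled prediction adds one vote to every cell it covers.
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                votes[y][x] += 1

    best = max(max(row) for row in votes)
    if best == 0:
        return None  # no prediction landed on the screen

    xs = [x for y in range(height) for x in range(width) if votes[y][x] == best]
    ys = [y for y in range(height) for x in range(width) if votes[y][x] == best]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1), best


# Three sampled predictions that only partly agree.
samples = [(0, 0, 4, 4), (2, 2, 6, 6), (3, 3, 8, 8)]
region, agreement = consensus_region(samples, width=10, height=10)
```

The centre of the returned region could then serve as the click target, and the vote count gives a rough measure of agreement among the samples.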

UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation

Published at 2025-08-07

#ML

The study presents a new method called UNCAGE that enhances the quality of text-to-image generation by Masked Generative Transformers, which are an alternative to traditional models. UNCAGE improves the alignment between text and image by focusing on clear object representations, leading to better results in various evaluations with minimal additional computation....

WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Published at 2025-08-07

#ML

The authors present WebWatcher, a new multi-modal research agent that can understand and reason with both visual and textual information, addressing the limitation of current text-centric deep research agents. They also introduce BrowseComp-VL, a benchmark for evaluating the capabilities of multimodal agents, and demonstrate WebWatcher's superior performance compared to other agents in complex multimodal information-seeking tasks....

WGAST: Weakly-Supervised Generative Network for Daily 10 m Land Surface Temperature Estimation via Spatio-Temporal Fusion

Published at 2025-08-08

#ML

The study presents WGAST, a deep learning framework that uses satellite data from Terra MODIS, Landsat 8, and Sentinel-2 to estimate daily land surface temperature at 10m resolution. Compared to existing methods, WGAST significantly improves accuracy and is robust to cloud interference, with source code available for verification....

Adversarial Video Promotion Against Text-to-Video Retrieval

Published at 2025-08-09

#ML

This study introduces ViPro, the first attack method that promotes videos in text-to-video retrieval systems, which the authors argue is a more harmful kind of attack because it can be exploited for financial gain or to spread misinformation. The researchers also propose MoRe to improve the attack's effectiveness and test their method on various models and datasets, outperforming existing baselines by a significant margin....

Technical Report: Full-Stack Fine-Tuning for the Q Programming Language

Published at 2025-08-09

#ML

This research presents a method to improve large language models' performance in the Q programming language, which is less common on the internet compared to other popular languages. The authors create a new evaluation dataset, train models using different techniques, and achieve better results than existing models, including GPT-4.1, providing a detailed guide for others to apply similar techniques to other tasks....

CharacterShot: Controllable and Consistent 4D Character Animation

Published at 2025-08-10

#ML

The authors present a framework called CharacterShot that allows designers to create dynamic 3D characters from a single image and a 2D pose sequence. They use a powerful 2D animation model, a dual-attention module, and a novel optimization technique to generate consistent and stable 4D character representations, and introduce a large-scale dataset to improve character-centric performance....

Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy

Published at 2025-08-10

#ML

The authors created a new method to evaluate large language models' ability to play the complex game of Diplomacy without requiring specialized training. This approach allows for easier and more accessible study of these models, providing insights into their strategic reasoning capabilities....

Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL

Published at 2025-08-11

#ML

The authors present ASearcher, an open-source project that uses large-scale asynchronous reinforcement learning to train search agents. They address the limitations of existing search tools by enabling long-horizon searches and improving search intelligence, resulting in significant performance gains on xBench and GAIA benchmarks....

Cut2Next: Generating Next Shot via In-Context Tuning

Published at 2025-08-11

#ML

The authors present Cut2Next, a framework that generates high-quality, cinematically appropriate next shots by conforming to professional editing patterns and maintaining strict continuity. It employs a novel in-context tuning strategy with a Diffusion Transformer, using Relational and Individual Prompts to define overall context and per-shot content, respectively, and introduces architectural innovations for integrating diverse signals without additional parameters. The framework outperforms cu...

HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches

Published at 2025-08-11

#ML

The authors propose a new framework called HierSearch that uses a hierarchical approach to integrate local and web searches for enterprise deep search systems. This framework improves training efficiency and mastery of complex tools compared to flat reinforcement learning, and it also includes a knowledge refiner to filter out irrelevant or hallucinated evidence, resulting in better performance than existing deep search methods....

Matrix-3D: Omnidirectional Explorable 3D World Generation

Published at 2025-08-11

#ML

The authors present a new method called Matrix-3D that creates large, explorable 3D worlds from a single image or text prompt using panoramic video generation and reconstruction. They trained a model to generate high-quality, geometrically consistent scene videos using scene mesh renders as a condition and proposed two methods to convert these panorama videos into 3D worlds, resulting in state-of-the-art performance in panoramic video generation and 3D world generation....

RedDino: A foundation model for red blood cell analysis

Published at 2025-08-11

#ML

The study presents RedDino, a specialized AI model for analyzing red blood cell images, which outperforms current models in classifying RBC shapes. RedDino's strengths lie in its ability to capture subtle morphological features, making it a valuable tool for developing reliable diagnostic instruments for blood disorders....

Aryabhata: An exam-focused language model for JEE Math

Published at 2025-08-12

#ML

Aryabhata 1.0 is a compact, efficient math reasoning model designed for the Indian JEE exam, optimized using supervised fine-tuning, curriculum learning, and reinforcement learning. It outperforms existing models in accuracy and efficiency, offering step-by-step reasoning, and is released as an open-source foundation model for exam-centric, small language models....

AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators

Published at 2025-08-12

#ML

The researchers created a new method called AutoCodeGen to automatically generate high-difficulty, multilingual code generation datasets without manual annotations. They used this method to build AutoCodeBench, a large-scale benchmark for evaluating language models on challenging, diverse, and practical multilingual code generation tasks, and found that even advanced models struggle with these tasks....

BiasGym: Fantastic Biases and How to Find (and Remove) Them

Published at 2025-08-12

#ML

BiasGym is a new framework designed to find and remove biases in large language models. It has two parts: BiasInject, which adds specific biases to the model, and BiasScope, which identifies and corrects the components causing biased behavior, without affecting the model's performance on other tasks....

Bridging Theory and Practice in Quantum Game Theory: Optimized Implementation of the Battle of the Sexes with Error Mitigation on NISQ Hardware

Published at 2025-08-12

#ML

The study executes the Battle of the Sexes game on IBM Quantum's hardware using a new method to reduce errors caused by noise and hardware limitations. Results show that quantum strategies can still outperform classical ones even with these challenges, suggesting potential real-world uses for quantum game theory....

Complex Logical Instruction Generation

Published at 2025-08-12

#ML

The authors present a new method for creating complex instructions using code functions, which they use to build a benchmark of logic-rich tasks. They find that current language models struggle with these tasks, often following fewer than 60% of the instructions correctly....

DeCRED: Decoder-Centric Regularization for Encoder-Decoder Based Speech Recognition

Published at 2025-08-12

#ML

The study proposes a method called DeCRED, which enhances the performance of speech recognition models by adding auxiliary classifiers to the decoder. This results in improved word error rates and better generalization in various testing scenarios, even outperforming other popular models like Whisper-medium, all while using less data and fewer parameters....

Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments

Published at 2025-08-12

#ML

The study presents a pipeline for creating stable training environments and a verifiable reward mechanism to improve tool use in large language models. Experiments show that this approach enhances model performance without affecting general capabilities, attributed to better context understanding and reasoning....

OpenCUA: Open Foundations for Computer-Use Agents

Published at 2025-08-12

#ML

The authors present OpenCUA, an open-source framework for scaling computer-use agent data and models. This framework includes tools for capturing human computer use, a large-scale dataset of computer tasks, and a pipeline that transforms demonstrations into actions with reasoning. OpenCUA outperforms other open-source models and surpasses OpenAI's CUA (GPT-4o) in benchmark tests....

Text-conditioned State Space Model For Domain-generalized Change Detection Visual Question Answering

Published at 2025-08-12

#ML

This study addresses domain shift in the Change Detection Visual Question Answering task and introduces BrightVQA, a new dataset for domain generalization research. The authors propose a novel state space model, TCSSM, which uses both bi-temporal imagery and geo-disaster-related textual information to extract domain-invariant features, outperforming state-of-the-art models in extensive experiments....

Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models

Published at 2025-08-12

#ML

The study finds that diffusion language models often overwrite correct answers during the denoising process. To improve accuracy, they introduce two methods: Temporal Self-Consistency Voting, which selects the most consistent prediction across steps, and Temporal Consistency Reinforcement, which encourages stable generations using Temporal Semantic Entropy as a reward signal. These methods significantly improve performance on various benchmarks....
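Voting across denoising steps can be sketched as a weighted majority vote over the answer decoded at each step. This is a minimal illustration, not the paper's exact scheme: the function name `temporal_vote` and the weighting functions are assumptions introduced here.

```python
from collections import defaultdict


def temporal_vote(step_answers, weight=lambda t, total: 1.0):
    """Return the answer with the largest total weight across steps.

    step_answers: answers decoded at successive denoising steps
    (None where no answer could be parsed).
    weight: hypothetical function of (step index, step count); the
    default gives every step one equal vote.
    """
    totals = defaultdict(float)
    total_steps = len(step_answers)
    for t, answer in enumerate(step_answers):
        if answer is not None:
            totals[answer] += weight(t, total_steps)
    if not totals:
        return None
    # Ties resolve to the answer that appeared first.
    return max(totals, key=totals.get)


# The final step overwrites an answer that was stable in the middle steps.
trajectory = ["12", "42", "42", "42", "17"]
uniform_pick = temporal_vote(trajectory)
# Weighting later steps more heavily still recovers the stable answer here.
late_pick = temporal_vote(trajectory, weight=lambda t, total: (t + 1) / total)
```

Taking only the last step would return "17" in this toy trajectory, while the vote returns the answer that persisted across most steps.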

TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation

Published at 2025-08-12

#ML

The study presents TopXGen, a new method that uses artificial intelligence to create high-quality and diverse text data in less common languages, which can then be used to improve machine translation in those languages. The method takes advantage of AI's ability to translate well into more common languages to generate natural-sounding texts, which can then be translated back into a high-resource language for training purposes. The results show that TopXGen improves translation performance during...

Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors

Published at 2025-08-12

#ML

The study presents AffordDex, a new framework for robotic dexterous grasping that learns from human hand motions and object affordances. This approach results in a universal grasping policy that is both human-like in posture and functionally appropriate in contact location, outperforming current methods....

Train Long, Think Short: Curriculum Learning for Efficient Reasoning

Published at 2025-08-12

#ML

The study presents a new method for training large language models to reason efficiently by gradually reducing their token budgets over time, encouraging them to discover effective solution strategies and then distill them into more concise reasoning traces. The proposed strategy, using Group Relative Policy Optimization (GRPO), consistently outperforms fixed-budget baselines in accuracy and token efficiency across various experiments, demonstrating the power of progressive constraint as an indu...
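A shrinking token budget can be pictured as a simple schedule over training steps. The sketch below is an assumption for illustration only: the step-wise exponential decay, the function name `token_budget`, and every default value are hypothetical and are not taken from the paper.

```python
def token_budget(step, start=256, floor=64, decay=0.5, interval=100):
    """Hypothetical curriculum: multiply the budget by `decay` every
    `interval` training steps, never going below `floor` tokens."""
    stage = step // interval
    return max(floor, int(start * decay ** stage))


# Budget seen by the model at a few points in training.
schedule = [token_budget(s) for s in (0, 99, 100, 200, 300)]
```

Early steps leave room for long exploratory reasoning, and later steps force the same solutions into fewer tokens; a fixed-budget baseline would correspond to `decay=1.0`.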

VertexRegen: Mesh Generation with Continuous Level of Detail

Published at 2025-08-12

#ML

VertexRegen is a new mesh generation framework that allows for continuous level of detail, unlike existing methods that create incomplete structures during generation. It reverses the edge collapse process through a generative model, providing valid meshes at any stage of generation with comparable quality to state-of-the-art methods....

Tags are generated by Google's Gemini Pro API, and the summary and translation are generated by Upstage's SOLAR mini chat model derived from SOLAR-10.7B open LLM.

(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media
