🤗 Daily Paper Newsletter

This newsletter delivers a curated list of papers from 🤗 Daily Papers. Hope you find some gems!
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Published at 2025-05-09

#ML

The study presents Unilogit, a new method for selectively forgetting specific information in Large Language Models while preserving their overall utility. Unilogit dynamically adjusts the target logits so that the token to be forgotten receives a uniform probability, improving the model's approximation of the golden targets and outperforming existing methods in extensive experiments.
Read More
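
As a rough illustration of the uniform-target idea described in the summary, the sketch below builds a soft target distribution in which the forget token is clamped to uniform probability 1/V while the remaining probability mass is rescaled, then distills toward it with a KL loss. The function names and the exact renormalization are assumptions for illustration; the actual Unilogit objective is defined in the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

def uniform_target_distribution(logits, forget_id):
    """Soft target where the forget token's probability is clamped to 1/V
    and the remaining tokens are rescaled so the distribution sums to 1.
    (A hypothetical reading of 'uniform probability for the target token'.)"""
    p = softmax(logits)
    V = len(p)
    target = p.copy()
    target[forget_id] = 1.0 / V
    others = np.arange(V) != forget_id
    target[others] *= (1.0 - 1.0 / V) / p[others].sum()
    return target

def forget_loss(student_logits, target):
    """KL(target || student): pushes the student toward the uniform target."""
    q = softmax(student_logits)
    return float(np.sum(target * (np.log(target) - np.log(q))))
```

Clamping only the forget token (rather than targeting a fully uniform distribution) is what lets the rest of the model's predictive behavior survive unlearning.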

X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real
Published at 2025-05-11

#ML

The authors propose X-Sim, a method that uses object motion from human videos to train robot policies without requiring action labels or robot teleoperation data. X-Sim improves task progress, matches the data efficiency of behavior cloning, and generalizes to new viewpoints and environment changes.
Read More

MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
Published at 2025-05-12

#ML

MLE-Dojo is a new framework for training and evaluating AI agents on machine learning engineering tasks, letting them learn through iterative experimentation and feedback. It uses real-world data and covers tasks such as data processing and code debugging; while it helps improve current AI models, it also exposes their limitations on complex problems.
Read More

OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Published at 2025-05-13

#ML

The authors present OpenThinkIMG, an open-source framework for training Large Vision-Language Models (LVLMs) to use visual tools adaptively. They introduce V-ToolRL, a reinforcement learning method that enables LVLMs to discover optimal tool usage strategies, which outperforms other methods in challenging chart reasoning tasks.
Read More

ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking
Published at 2025-05-13

#ML

The authors present ReSurgSAM2, a method for accurately and efficiently segmenting and tracking objects in surgical videos over long periods, which can help improve surgical outcomes. The method segments target objects interactively, employs a two-stage framework with a cross-modal spatial-temporal Mamba for precise detection and segmentation, and incorporates a diversity-driven memory mechanism for consistent long-term tracking, achieving real-time performance.
Read More

Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning
Published at 2025-05-14

#ML

This study presents a new framework that improves tokenizer flexibility in language models, addressing inefficiencies and performance limitations. It introduces Tokenadapt, a method for transplanting tokenizers, and Supertokens, which enhance compression and reduce fragmentation, leading to notable performance improvements.
Read More

AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
Published at 2025-05-14

#ML

The authors propose AdaptCLIP, a method that adapts CLIP for universal visual anomaly detection without additional fine-tuning. AdaptCLIP uses three simple adapters to learn visual and textual representations alternately and incorporates both contextual and aligned residual features for comparative learning, achieving state-of-the-art performance on various anomaly detection benchmarks.
Read More

EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Published at 2025-05-14

#ML

The study presents EWMBench, a new evaluation framework for embodied world models (EWMs), which generate physically plausible scenes from language commands. EWMBench assesses EWMs on visual scene consistency, motion correctness, and semantic alignment, using a curated dataset and a multi-dimensional evaluation toolkit to identify current models' limitations and guide future advancements.
Read More

EnerVerse-AC: Envisioning Embodied Environments with Action Condition
Published at 2025-05-14

#ML

The authors present EVAC, a model that enables testing and evaluating robots in dynamic environments without physical robots or complex simulations. It reduces costs while maintaining high accuracy in robotic manipulation evaluation by generating realistic, action-conditioned video observations for policy testing.
Read More

Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation
Published at 2025-05-14

#ML

The study presents AnoGen, a method that generates realistic and diverse anomalies from only a few real-world examples, improving anomaly detection model training. The approach enhances both anomaly classification and segmentation, showing significant improvements in the AU-PR metric on the MVTec dataset.
Read More

Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
Published at 2025-05-14

#ML

This study presents OneNIP, a new method that improves the detection and segmentation of anomalies in multi-class anomaly detection. OneNIP reconstructs or restores anomalies using just one normal image prompt, outperforming previous methods on industry benchmarks such as MVTec, BTAD, and VisA.
Read More

MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning
Published at 2025-05-14

#ML

The study introduces MetaUAS, a method for detecting anomalies in images using only visual data, without relying on language models or special datasets. It can detect any type of anomaly with just one normal image prompt and outperforms existing anomaly detection methods.
Read More

System Prompt Optimization with Meta-Learning
Published at 2025-05-14

#ML

This study addresses the problem of optimizing system prompts in Large Language Models (LLMs) to improve their performance across tasks and domains. The researchers propose a meta-learning framework that designs robust and transferable system prompts, demonstrating its effectiveness through experiments on 14 unseen datasets spanning 5 different domains.
Read More

3D-Fixup: Advancing Photo Editing with 3D Priors
Published at 2025-05-15

#ML

This study presents 3D-Fixup, a new method for editing 2D images using 3D information, which helps with complex tasks like moving or rotating objects within the image. The method uses video data to train a model that makes these edits accurately while keeping the image realistic and high-quality.
Read More

AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges
Published at 2025-05-15

#ML

The study distinguishes between AI Agents and Agentic AI, providing a taxonomy, application mapping, and challenge analysis. AI Agents are modular systems for specific tasks, while Agentic AI involves multi-agent collaboration, dynamic task decomposition, and orchestrated autonomy, with unique challenges and solutions for each.
Read More

Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Published at 2025-05-15

#ML

The study presents a new method to improve large reasoning models by aligning them with three meta-abilities: deduction, induction, and abduction. This approach enhances performance by over 10% compared to instruction-tuned baselines and offers a scalable and reliable foundation for reasoning.
Read More

Depth Anything with Any Prior
Published at 2025-05-15

#ML

This study introduces a method to create detailed and accurate depth maps for any scene by combining precise depth measurements with complete geometric structures. The approach uses a two-step process: it first fills in diverse metric priors using depth prediction, then refines the noise in those priors with a conditioned monocular depth estimation model. The model performs strongly across various tasks and datasets and can adapt to advancements in monocular depth estimation.
Read More

End-to-End Vision Tokenizer Tuning
Published at 2025-05-15

#ML

The authors propose ETT, an end-to-end approach to tuning vision tokenizers for better alignment with downstream tasks, leading to improved performance in multimodal understanding and visual generation without significant changes to existing training pipelines.
Read More

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Published at 2025-05-15

#ML

The study investigates the combination of large language models and diffusion transformers for generating images from text, filling gaps in previous research by providing detailed comparisons, analyzing design choices, and offering a reproducible training recipe.
Read More

J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Published at 2025-05-15

#ML

The authors present J1, a reinforcement learning method for training language models to improve their judgment ability. J1 converts prompts into judgment tasks with verifiable rewards, incentivizing thinking and reducing bias, and outperforms other models on various benchmarks.
Read More
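
The "verifiable reward" idea in the summary can be sketched in a few lines: because the preferred response in each training pair is known, the judge's verdict can be scored exactly, with no learned reward model. The helper names, prompt template, and order randomization below are illustrative assumptions, not J1's actual implementation.

```python
import random

def make_judgment_task(prompt, chosen, rejected, rng=random):
    """Turn a preference pair into a judgment task with a known answer.

    Randomizing the A/B order counters position bias (the summary notes
    J1 aims to reduce bias); the correct label is tracked so the judge's
    verdict can be rewarded exactly.
    """
    if rng.random() < 0.5:
        a, b, answer = chosen, rejected, "A"
    else:
        a, b, answer = rejected, chosen, "B"
    task = (f"Question: {prompt}\n"
            f"Response A: {a}\nResponse B: {b}\n"
            "Think step by step, then answer 'A' or 'B'.")
    return task, answer

def verifiable_reward(verdict, answer):
    """1.0 if the judge picked the known-preferred response, else 0.0."""
    return 1.0 if verdict == answer else 0.0
```

During RL training, `verifiable_reward` would score the verdict parsed from the judge's sampled output, so only the thinking traces that lead to correct judgments get reinforced.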

Parallel Scaling Law for Language Models
Published at 2025-05-15

#ML

The study presents parallel scaling (ParScale), a method that improves inference efficiency in language models by increasing parallel computation during training and inference without significantly increasing parameters or memory usage. Compared with traditional parameter scaling, ParScale incurs smaller memory and latency increases, and it can enhance existing pre-trained models with minimal additional training, making powerful models more accessible in low-resource scenarios.
Read More
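
A toy sketch of the parallel-scaling idea as summarized above: the same shared model is applied to P differently transformed copies of the input, and the P outputs are combined by a learned aggregation, so parameters grow only by the tiny per-stream tweaks while compute grows P-fold and can run in parallel. The linear "model", the offset transforms, and the softmax aggregation are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

d, P = 8, 4                                           # hidden size, parallel streams
W_shared = rng.normal(size=(d, d))                    # the one shared "model" (toy linear map)
input_tweaks = rng.normal(scale=0.01, size=(P, d))    # learnable per-stream input offsets
agg_logits = np.zeros(P)                              # learnable aggregation weights

def parscale_forward(x):
    """Run P tweaked copies of x through the shared model, then aggregate.

    Only the P offsets and P aggregation weights are extra parameters;
    the P matrix products are independent and could run in parallel.
    """
    streams = np.stack([W_shared @ (x + t) for t in input_tweaks])  # (P, d)
    w = np.exp(agg_logits) / np.exp(agg_logits).sum()               # softmax weights
    return w @ streams                                              # weighted sum -> (d,)

x = rng.normal(size=d)
y = parscale_forward(x)
```

With the aggregation logits at zero the streams are averaged uniformly; training would learn both the offsets and the mixture so the streams specialize.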

PointArena: Probing Multimodal Grounding Through Language-Guided Pointing
Published at 2025-05-15

#ML

The study presents PointArena, a platform for testing multimodal models' ability to point to objects based on language instructions, which helps bridge language with visual contexts in various applications. The platform includes a dataset of pointing tasks, an interactive arena for comparing models, and a real-world robotic system for practical evaluation. The research finds that Molmo-72B performs best among the tested models and that specialized training improves pointing accuracy.
Read More

QuXAI: Explainers for Hybrid Quantum Machine Learning Models
Published at 2025-05-15

#ML

This study addresses the lack of explainability in hybrid quantum-classical machine learning models by introducing QuXAI, a framework that uses Q-MEDLEY to explain feature importance in these systems. The results demonstrate that Q-MEDLEY effectively identifies influential classical aspects and noise in HQML models, outperforming established XAI techniques in classical validation settings.
Read More

Style Customization of Text-to-Vector Generation with Image Diffusion Priors
Published at 2025-05-15

#ML

The authors propose a two-stage pipeline for creating customizable vector graphics from text prompts. The first stage ensures the SVGs' structural integrity, while the second uses text-to-image models to apply custom styles, resulting in high-quality and diverse SVGs.
Read More

The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
Published at 2025-05-15

#ML

The paper presents the CoT Encyclopedia, a new framework that automatically analyzes and categorizes the reasoning strategies used by large language models. The framework improves understanding of model behavior, enables performance gains by predicting and guiding reasoning strategies, and reveals the importance of training data format in shaping model reasoning.
Read More

WorldPM: Scaling Human Preference Modeling
Published at 2025-05-15

#ML

The study explores how preference modeling improves with larger models and datasets, analogous to language modeling. The authors gather preference data from various online communities and train models ranging from 1.5 billion to 72 billion parameters. The results show that deceptive feature detection and objective knowledge improve with larger models, while subjective preferences do not. The proposed World Preference Modeling technique enhances performance on various preference benchmarks and integrated RLHF pipelines.
Read More

Tags are generated by Google's Gemini Pro API; summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit Developer's Social Media