🤗 Daily Paper Newsletter

This newsletter delivers a curated list of papers from 🤗 Daily Papers. Hope you find some gems!
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Published at 2025-05-09

#ML

The study presents Unilogit, a new method for selectively forgetting specific information in Large Language Models while preserving their overall utility. Unilogit dynamically adjusts the target logits so that the token to be forgotten receives a uniform probability, improving the model's approximation of the golden targets and outperforming existing methods in extensive experiments.
Read More
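
As a rough illustration of the uniform-target idea described in the summary, the sketch below builds a soft target distribution in which the forget token is clamped to uniform probability 1/V while the remaining probability mass is rescaled, then distills toward it with a KL loss. The function names and the exact renormalization are assumptions for illustration; the actual Unilogit objective is defined in the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

def uniform_target_distribution(logits, forget_id):
    """Soft target where the forget token's probability is clamped to 1/V
    and the remaining tokens are rescaled so the distribution sums to 1.
    (A hypothetical reading of 'uniform probability for the target token'.)"""
    p = softmax(logits)
    V = len(p)
    target = p.copy()
    target[forget_id] = 1.0 / V
    others = np.arange(V) != forget_id
    target[others] *= (1.0 - 1.0 / V) / p[others].sum()
    return target

def forget_loss(student_logits, target):
    """KL(target || student): pushes the student toward the uniform target."""
    q = softmax(student_logits)
    return float(np.sum(target * (np.log(target) - np.log(q))))
```

Clamping only the forget token (rather than targeting a fully uniform distribution) is what lets the rest of the model's predictive behavior survive unlearning.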

X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real
Published at 2025-05-11

#ML

The authors propose X-Sim, a method that uses object motion from human videos to train robot policies without requiring action labels or robot teleoperation data. X-Sim improves task progress, matches the data efficiency of behavior cloning, and generalizes to new viewpoints and environment changes.
Read More

MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
Published at 2025-05-12

#ML

MLE-Dojo is a new framework for training and evaluating AI agents on machine learning engineering tasks, letting them learn through iterative experimentation and feedback. It uses real-world data and covers tasks such as data processing and code debugging; while it helps improve current AI models, it also exposes their limitations on complex problems.
Read More

OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Published at 2025-05-13

#ML

The authors present OpenThinkIMG, an open-source framework for training Large Vision-Language Models (LVLMs) to use visual tools adaptively. They introduce V-ToolRL, a reinforcement learning method that enables LVLMs to discover optimal tool usage strategies, which outperforms other methods in challenging chart reasoning tasks.
Read More

ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking
Published at 2025-05-13

#ML

The authors present ReSurgSAM2, a method for accurately and efficiently segmenting and tracking objects in surgical videos over long periods, which can help improve surgical outcomes. The method segments target objects interactively, employs a two-stage framework with a cross-modal spatial-temporal Mamba for precise detection and segmentation, and incorporates a diversity-driven memory mechanism for consistent long-term tracking, achieving real-time performance.
Read More

Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning
Published at 2025-05-14

#ML

This study presents a new framework that improves tokenizer flexibility in language models, addressing inefficiencies and performance limitations. It introduces Tokenadapt, a method for transplanting tokenizers, and Supertokens, which enhance compression and reduce fragmentation, leading to notable performance improvements.
Read More

AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
Published at 2025-05-14

#ML

The authors propose AdaptCLIP, a method that adapts CLIP for universal visual anomaly detection without additional fine-tuning. AdaptCLIP uses three simple adapters to learn visual and textual representations alternately and incorporates both contextual and aligned residual features for comparative learning, achieving state-of-the-art performance on various anomaly detection benchmarks.
Read More

EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Published at 2025-05-14

#ML

The study presents EWMBench, a new evaluation framework for embodied world models (EWMs), which generate physically plausible scenes from language commands. EWMBench assesses EWMs on visual scene consistency, motion correctness, and semantic alignment, using a curated dataset and a multi-dimensional evaluation toolkit to identify current models' limitations and guide future advancements.
Read More

EnerVerse-AC: Envisioning Embodied Environments with Action Condition
Published at 2025-05-14

#ML

The authors present EVAC, a model that enables testing and evaluating robots in dynamic environments without physical robots or complex simulations. It reduces costs while maintaining high accuracy in robotic manipulation evaluation by generating realistic, action-conditioned video observations for policy testing.
Read More

Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation
Published at 2025-05-14

#ML

The study presents AnoGen, a method that generates realistic and diverse anomalies from only a few real-world examples, improving anomaly detection model training. The approach enhances both anomaly classification and segmentation, showing significant improvements in the AU-PR metric on the MVTec dataset.
Read More

Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
Published at 2025-05-14

#ML

This study presents OneNIP, a new method that improves the detection and segmentation of anomalies in multi-class anomaly detection. OneNIP reconstructs or restores anomalies using just one normal image prompt, outperforming previous methods on industry benchmarks such as MVTec, BTAD, and VisA.
Read More

MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning
Published at 2025-05-14

#ML

The study introduces MetaUAS, a method for detecting anomalies in images using only visual data, without relying on language models or special datasets. It can detect any type of anomaly with just one normal image prompt and outperforms existing anomaly detection methods.
Read More

System Prompt Optimization with Meta-Learning
Published at 2025-05-14

#ML

This study addresses the problem of optimizing system prompts in Large Language Models (LLMs) to improve their performance across tasks and domains. The researchers propose a meta-learning framework that designs robust and transferable system prompts, demonstrating its effectiveness through experiments on 14 unseen datasets spanning 5 different domains.
Read More

3D-Fixup: Advancing Photo Editing with 3D Priors
Published at 2025-05-15

#ML

This study presents 3D-Fixup, a new method for editing 2D images using 3D information, which helps with complex tasks like moving or rotating objects within the image. The method uses video data to train a model that makes these edits accurately while keeping the image realistic and high-quality.
Read More

AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges
Published at 2025-05-15

#ML

The study distinguishes between AI Agents and Agentic AI, providing a taxonomy, application mapping, and challenge analysis. AI Agents are modular systems for specific tasks, while Agentic AI involves multi-agent collaboration, dynamic task decomposition, and orchestrated autonomy, with unique challenges and solutions for each.
Read More

Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Published at 2025-05-15

#ML

The study presents a new method to improve large reasoning models by aligning them with three meta-abilities: deduction, induction, and abduction. This approach enhances performance by over 10% compared to instruction-tuned baselines and offers a scalable and reliable foundation for reasoning.
Read More

Depth Anything with Any Prior
Published at 2025-05-15

#ML

This study introduces a method to create detailed and accurate depth maps for any scene by combining precise depth measurements with complete geometric structures. The approach uses a two-step process: it first fills in diverse metric priors using depth prediction, then refines the noise in those priors with a conditioned monocular depth estimation model. The model performs strongly across various tasks and datasets and can adapt to advancements in monocular depth estimation.
Read More

End-to-End Vision Tokenizer Tuning
Published at 2025-05-15

#ML

The authors propose ETT, an end-to-end approach to tuning vision tokenizers for better alignment with downstream tasks, leading to improved performance in multimodal understanding and visual generation without significant changes to existing training pipelines.
Read More

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Published at 2025-05-15

#ML

The study investigates the combination of large language models and diffusion transformers for generating images from text, filling gaps in previous research by providing detailed comparisons, analyzing design choices, and offering a reproducible training recipe.
Read More

J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Published at 2025-05-15

#ML

The authors present J1, a reinforcement learning method for training language models to improve their judgment ability. J1 converts prompts into judgment tasks with verifiable rewards, incentivizing thinking and reducing bias, and outperforms other models on various benchmarks.
Read More
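
The "verifiable reward" idea in the summary can be sketched in a few lines: because the preferred response in each training pair is known, the judge's verdict can be scored exactly, with no learned reward model. The helper names, prompt template, and order randomization below are illustrative assumptions, not J1's actual implementation.

```python
import random

def make_judgment_task(prompt, chosen, rejected, rng=random):
    """Turn a preference pair into a judgment task with a known answer.

    Randomizing the A/B order counters position bias (the summary notes
    J1 aims to reduce bias); the correct label is tracked so the judge's
    verdict can be rewarded exactly.
    """
    if rng.random() < 0.5:
        a, b, answer = chosen, rejected, "A"
    else:
        a, b, answer = rejected, chosen, "B"
    task = (f"Question: {prompt}\n"
            f"Response A: {a}\nResponse B: {b}\n"
            "Think step by step, then answer 'A' or 'B'.")
    return task, answer

def verifiable_reward(verdict, answer):
    """1.0 if the judge picked the known-preferred response, else 0.0."""
    return 1.0 if verdict == answer else 0.0
```

During RL training, `verifiable_reward` would score the verdict parsed from the judge's sampled output, so only the thinking traces that lead to correct judgments get reinforced.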

Parallel Scaling Law for Language Models
Published at 2025-05-15

#ML

The study presents parallel scaling (ParScale), a method that improves inference efficiency in language models by increasing parallel computation during training and inference without significantly increasing parameters or memory usage. Compared with traditional parameter scaling, ParScale incurs smaller memory and latency increases, and it can enhance existing pre-trained models with minimal additional training, making powerful models more accessible in low-resource scenarios.
Read More
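
A toy sketch of the parallel-scaling idea as summarized above: the same shared model is applied to P differently transformed copies of the input, and the P outputs are combined by a learned aggregation, so parameters grow only by the tiny per-stream tweaks while compute grows P-fold and can run in parallel. The linear "model", the offset transforms, and the softmax aggregation are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

d, P = 8, 4                                           # hidden size, parallel streams
W_shared = rng.normal(size=(d, d))                    # the one shared "model" (toy linear map)
input_tweaks = rng.normal(scale=0.01, size=(P, d))    # learnable per-stream input offsets
agg_logits = np.zeros(P)                              # learnable aggregation weights

def parscale_forward(x):
    """Run P tweaked copies of x through the shared model, then aggregate.

    Only the P offsets and P aggregation weights are extra parameters;
    the P matrix products are independent and could run in parallel.
    """
    streams = np.stack([W_shared @ (x + t) for t in input_tweaks])  # (P, d)
    w = np.exp(agg_logits) / np.exp(agg_logits).sum()               # softmax weights
    return w @ streams                                              # weighted sum -> (d,)

x = rng.normal(size=d)
y = parscale_forward(x)
```

With the aggregation logits at zero the streams are averaged uniformly; training would learn both the offsets and the mixture so the streams specialize.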

PointArena: Probing Multimodal Grounding Through Language-Guided Pointing
Published at 2025-05-15

#ML

The study presents PointArena, a platform for testing multimodal models' ability to point to objects based on language instructions, which helps bridge language with visual contexts in various applications. The platform includes a dataset of pointing tasks, an interactive arena for comparing models, and a real-world robotic system for practical evaluation. The research finds that Molmo-72B performs best among the tested models and that specialized training improves pointing accuracy.
Read More

QuXAI: Explainers for Hybrid Quantum Machine Learning Models
Published at 2025-05-15

#ML

This study addresses the lack of explainability in hybrid quantum-classical machine learning models by introducing QuXAI, a framework that uses Q-MEDLEY to explain feature importance in these systems. The results demonstrate that Q-MEDLEY effectively identifies influential classical aspects and noise in HQML models, outperforming established XAI techniques in classical validation settings.
Read More

Style Customization of Text-to-Vector Generation with Image Diffusion Priors
Published at 2025-05-15

#ML

The authors propose a two-stage pipeline for creating customizable vector graphics from text prompts. The first stage ensures the SVGs' structural integrity, while the second uses text-to-image models to apply custom styles, resulting in high-quality and diverse SVGs.
Read More

The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
Published at 2025-05-15

#ML

The paper presents the CoT Encyclopedia, a new framework that automatically analyzes and categorizes the reasoning strategies used by large language models. The framework improves understanding of model behavior, enables performance gains by predicting and guiding reasoning strategies, and reveals the importance of training data format in shaping model reasoning.
Read More

WorldPM: Scaling Human Preference Modeling
Published at 2025-05-15

#ML

The study explores how preference modeling improves with larger models and datasets, analogous to language modeling. The authors gather preference data from various online communities and train models ranging from 1.5 billion to 72 billion parameters. The results show that deceptive feature detection and objective knowledge improve with larger models, while subjective preferences do not. The proposed World Preference Modeling technique enhances performance on various preference benchmarks and integrated RLHF pipelines.
Read More

Tags are generated by Google's Gemini Pro API; summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit Developer's Social Media