🤗 Daily Paper Newsletter |
 |
Hope you found some gems! |
This newsletter delivers a curated list of papers from 🤗 Daily Papers. |
|
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark |
Published at 2025-05-22 |
|
#ML
|
The researchers have created a large-scale dataset and model suite called CASS for translating GPU code between different architectures, specifically from Nvidia to AMD. This tool can translate both the source code and assembly language, and it performs much better than commercial translation tools. The researchers also made a benchmark to test the translations and released all their resources for others to use and improve upon.... |
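For a sense of what rule-based transpilers do, here is a toy sketch (ours, not CASS) of dictionary-based CUDA-to-HIP renaming; CASS instead trains models to translate both source and assembly, and this naive approach is the kind of baseline it outperforms.

```python
# Toy sketch of rule-based source translation (hipify-style renaming).
# CASS itself learns the translation; this only illustrates the naive idea.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def naive_transpile(cuda_src: str) -> str:
    """Rename CUDA runtime calls to their HIP equivalents via substring replacement."""
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        cuda_src = cuda_src.replace(cuda_name, hip_name)
    return cuda_src

print(naive_transpile("cudaMalloc(&p, n); cudaFree(p);"))
# hipMalloc(&p, n); hipFree(p);
```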
Read More |
|
|
|
DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers |
Published at 2025-05-24 |
|
#ML
|
The authors present DiffDecompose, a new method for separating semi-transparent or transparent layers in images using diffusion transformers. They also introduce AlphaBlend, a large-scale dataset for training and testing this method, which includes various real-world scenarios like removing flares or decomposing glassware.... |
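For context, the forward model being inverted here is standard alpha-over compositing; a minimal sketch of that forward model (ours, not the paper's code):

```python
import numpy as np

# Standard alpha-over compositing: I = alpha * F + (1 - alpha) * B.
# DiffDecompose learns the inverse problem (recovering layers from I);
# this sketch only shows the forward model the method inverts.
def alpha_composite(fg, bg, alpha):
    """Blend a foreground layer over a background with per-pixel alpha in [0, 1]."""
    return alpha * fg + (1.0 - alpha) * bg

fg = np.full((2, 2), 1.0)      # white foreground layer
bg = np.zeros((2, 2))          # black background layer
alpha = np.full((2, 2), 0.25)  # 25% opaque everywhere
print(alpha_composite(fg, bg, alpha))  # 2x2 array of 0.25
```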
Read More |
|
|
|
|
DLP: Dynamic Layerwise Pruning in Large Language Models |
Published at 2025-05-27 |
|
#ML
|
The authors present a new method called Dynamic Layerwise Pruning (DLP) that determines the importance of each layer in Large Language Models (LLMs) by integrating model weights with input activation information, thereby improving performance at high sparsity levels. DLP reduces perplexity and improves accuracy compared to existing methods and can be combined with other LLM compression techniques.... |
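For readers who want the gist in code, here is a toy sketch (ours, not the paper's) of the idea: score each layer by combining weight magnitudes with input activation norms, then assign less pruning to important layers. DLP's exact metric and allocation follow the paper, not this toy.

```python
import numpy as np

# Toy layer-importance score: mean of |W_ij| * ||x_j||_2 per layer
# (a Wanda-style weight-times-activation metric, aggregated layerwise).
def layer_importance(weights, activations):
    """weights: list of (out, in) matrices; activations: list of (samples, in) inputs."""
    scores = []
    for W, x in zip(weights, activations):
        act_norm = np.linalg.norm(x, axis=0)  # per-input-feature norm
        scores.append(float(np.mean(np.abs(W) * act_norm)))
    return scores

def allocate_sparsity(scores, target=0.7):
    """Give important layers less pruning: sparsity inversely scaled by score,
    normalized so the mean sparsity equals the target."""
    s = np.array(scores)
    inv = (1.0 / s) / np.sum(1.0 / s)
    return list(np.clip(target * len(s) * inv, 0.0, 1.0))
```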
Read More |
|
|
|
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models |
Published at 2025-05-29 |
|
#ML
|
The authors present a new framework called Segment Policy Optimization (SPO) for improving large language models using reinforcement learning. SPO offers a middle ground between existing token-level and trajectory-level methods by using segment-level advantage estimation, resulting in more accurate credit assignment and better performance on various tasks.... |
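A toy sketch of the segment-level idea (ours; SPO's estimators are more sophisticated): rewards are summed per segment and compared against a baseline, so credit lands on segments rather than on single tokens or whole trajectories.

```python
# Toy segment-level credit assignment. The segment boundaries and the scalar
# baseline here are illustrative stand-ins, not SPO's actual estimators.
def segment_advantages(token_rewards, boundaries, baseline):
    """Sum rewards within each segment and subtract a baseline value."""
    advantages, start = [], 0
    for end in boundaries:
        seg_return = sum(token_rewards[start:end])
        advantages.append(seg_return - baseline)
        start = end
    return advantages

# Three segments over ten tokens; every token in a segment shares its advantage.
adv = segment_advantages([0.1] * 10, boundaries=[3, 7, 10], baseline=0.2)
print(adv)  # roughly [0.1, 0.2, 0.1], up to float rounding
```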
Read More |
|
|
|
|
TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence |
Published at 2025-05-30 |
|
#ML
|
The study presents a new method called TimeHC-RL to improve the social intelligence of Large Language Models (LLMs), which have already shown progress in areas like math and coding. The method focuses on the social world's unique timeline and cognitive requirements, outperforming a widely used method and helping a 7B model compete with more advanced ones.... |
Read More |
|
|
|
BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation |
Published at 2025-05-31 |
|
#ML
|
BenchHub is a new tool that helps researchers and developers test large language models more effectively by collecting and organizing various benchmark datasets from different domains, making it easier to evaluate models for specific needs or use cases. The tool highlights the importance of testing models in specific domains and can improve dataset reuse, model comparisons, and identify areas that need more attention in existing benchmarks.... |
Read More |
|
|
|
|
RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents |
Published at 2025-05-31 |
|
#ML
|
The authors present RiOSWorld, a benchmark for evaluating the risks of AI agents using computers in real-world scenarios. RiOSWorld includes various risky tasks and categorizes them into user-originated and environmental risks, demonstrating the significant safety risks current computer-use agents face.... |
Read More |
|
|
|
Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents |
Published at 2025-06-02 |
|
#ML
|
This study presents a new method for improving the accuracy and reliability of language models when interpreting flowcharts, which are essential for visualizing decision-making processes. The proposed neurosymbolic agent, FlowPathAgent, reduces visual hallucinations in model responses by segmenting, analyzing, and interacting with the flowchart's structure, outperforming existing methods on a new benchmark dataset.... |
Read More |
|
|
|
|
Improving Knowledge Distillation Under Unknown Covariate Shift Through Confidence-Guided Data Augmentation |
Published at 2025-06-02 |
|
#ML
|
This study proposes a new data augmentation method using diffusion techniques to improve knowledge distillation, particularly in cases of unknown covariate shift. By generating challenging samples that maximize disagreement between a teacher and student model, the approach significantly enhances accuracy and robustness on various datasets compared to existing methods.... |
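The selection criterion can be sketched in a few lines (ours, not the paper's code, which generates the candidate samples with diffusion models): score candidates by how much the student disagrees with the teacher, and keep the most disagreed-on ones.

```python
import math

# Toy disagreement scoring: plain KL divergence between the teacher's and
# student's predictive distributions. The paper's method generates such
# hard samples with diffusion; this only shows the selection criterion.
def kl_divergence(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def pick_hardest(samples, teacher_probs, student_probs, k=1):
    """Return the k samples where the student most disagrees with the teacher."""
    scored = sorted(
        zip(samples, teacher_probs, student_probs),
        key=lambda t: kl_divergence(t[1], t[2]),
        reverse=True,
    )
    return [s for s, _, _ in scored[:k]]

samples = ["easy", "hard"]
teacher = [[0.9, 0.1], [0.9, 0.1]]
student = [[0.88, 0.12], [0.2, 0.8]]  # student is badly wrong on "hard"
print(pick_hardest(samples, teacher, student))  # ['hard']
```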
Read More |
|
|
|
Small Language Models are the Future of Agentic AI |
Published at 2025-06-02 |
|
#ML
|
The article argues that small language models (SLMs) are more powerful, suitable, and economical than large language models (LLMs) for many agentic AI tasks. The authors advocate for a shift towards SLMs, discussing potential barriers and proposing a conversion algorithm from LLMs to SLMs, to reduce AI costs and promote efficient use of resources.... |
Read More |
|
|
|
|
Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models |
Published at 2025-06-02 |
|
#ML
|
The Psi-Sampler framework uses an SMC-based method with a new algorithm called pCNL to improve sampling efficiency in reward alignment tasks for score-based generative models. By initializing particles from the reward-aware posterior and using pCNL for high-dimensional posterior sampling, this method outperforms existing methods in various tasks like layout-to-image generation and aesthetic-preference generation.... |
Read More |
|
|
|
A Controllable Examination for Long-Context Language Models |
Published at 2025-06-03 |
|
#ML
|
The authors present LongBioBench, a new benchmark using artificial biographies to test long-context language models. Experiments show that most models still struggle with understanding and reasoning over long contexts, and LongBioBench is more interpretable and controllable than existing synthetic benchmarks.... |
Read More |
|
|
|
|
Beyond the Surface: Measuring Self-Preference in LLM Judgments |
Published at 2025-06-03 |
|
#ML
|
This research presents a new method, DBG score, to measure self-preference bias in large language models (LLMs) by using gold judgments as proxies for response quality. The study examines the impact of response text style and post-training data on self-preference bias and explores possible explanations from an attention-based perspective.... |
Read More |
|
|
|
Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback |
Published at 2025-06-03 |
|
#ML
|
The study addresses limitations of reinforcement learning with numerical feedback in large language models by proposing Critique-GRPO, an online framework that uses both natural language and numerical feedback for policy optimization. Experiments show that Critique-GRPO significantly outperforms other methods in various reasoning tasks, improving average scores by around 5%.... |
Read More |
|
|
|
|
DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models |
Published at 2025-06-03 |
|
#ML
|
The study presents DenseDPO, a method that improves upon the Direct Preference Optimization technique used for text-to-video diffusion models. DenseDPO addresses limitations in fine-grained comparisons and motion bias by creating aligned video pairs and labeling preferences on short segments, resulting in better motion generation with fewer labels. Additionally, DenseDPO can use Vision Language Models for automatic preference annotation, achieving performance close to human labels.... |
Read More |
|
|
|
FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning |
Published at 2025-06-03 |
|
#ML
|
The researchers present FinChain, a new tool for testing complex, step-by-step financial reasoning in AI models. They created a large dataset with various financial topics and reasoning difficulties, along with a new way to evaluate both the final answers and the intermediate steps. When they tested 30 AI models on this dataset, they found that even the best models need improvement in multi-step financial reasoning.... |
Read More |
|
|
|
|
IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation |
Published at 2025-06-03 |
|
#ML
|
The research presents IllumiCraft, a new video generation framework that uses three types of inputs: lighting maps, synthetically lit images, and 3D geometry data, to create videos with controlled lighting and appearance. This method produces more detailed and coherent videos compared to existing techniques, allowing for better background and text-based video relighting.... |
Read More |
|
|
|
Quantitative LLM Judges |
Published at 2025-06-03 |
|
#ML
|
The study presents a new method to enhance the evaluation of large language models (LLMs) by aligning their scores with human scores using regression models. This approach is more efficient and adaptable than traditional methods, as demonstrated through experiments on various datasets. (ELI5: Think of it as teaching a computer to better understand and rate human language by learning from human feedback, making the computer's judgment more accurate and efficient.)... |
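The core recipe, regressing raw judge scores onto human scores, can be sketched as a one-feature least-squares fit (ours; the paper's regression models use richer features):

```python
import numpy as np

# Toy calibration: fit a least-squares line mapping judge scores to human
# scores. A stand-in for the paper's regression models, not their code.
def fit_judge_calibration(judge_scores, human_scores):
    """Return (slope, intercept) of the least-squares line judge -> human."""
    A = np.column_stack([judge_scores, np.ones(len(judge_scores))])
    (slope, intercept), *_ = np.linalg.lstsq(A, np.array(human_scores), rcond=None)
    return slope, intercept

judge = [1.0, 2.0, 3.0, 4.0]
human = [2.0, 4.0, 6.0, 8.0]  # in this toy the human scale is exactly 2x
slope, intercept = fit_judge_calibration(judge, human)
print(f"human ~ {slope:.2f} * judge + {intercept:.2f}")  # slope ~2, intercept ~0
```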
Read More |
|
|
|
|
RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions |
Published at 2025-06-03 |
|
#ML
|
The authors present RefEdit, a new model for improving image editing in complex scenes with multiple objects, which outperforms existing models trained on millions of data samples. They also introduce RefEdit-Bench, a real-world benchmark for evaluating this task, and release their data and model for others to replicate their results.... |
Read More |
|
|
|
Robust Neural Rendering in the Wild with Asymmetric Dual 3D Gaussian Splatting |
Published at 2025-06-03 |
|
#ML
|
The authors present a new framework called Asymmetric Dual 3DGS that creates more stable and consistent 3D reconstructions from challenging in-the-wild images by leveraging the randomness of visual artifacts. This method uses two parallel 3D Gaussian Splatting models with a consistency constraint and a divergent masking strategy to reduce shared error modes, and a lightweight variant called Dynamic EMA Proxy to improve training efficiency.... |
Read More |
|
|
|
|
Robustness in Both Domains: CLIP Needs a Robust Text Encoder |
Published at 2025-06-03 |
|
#ML
|
This research focuses on enhancing the robustness of text encoders in CLIP, an AI model, by introducing LEAF, a method for creating robust text encoders that can be scaled for large CLIP models. The improved text encoders help maintain performance in various applications like text-to-image generation and multimodal retrieval tasks, even under adversarial attacks.... |
Read More |
|
|
|
SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation |
Published at 2025-06-03 |
|
#ML
|
The study presents SVGenius, a comprehensive benchmark for evaluating large language models and multimodal LLMs in processing SVG files, covering 2,377 queries across three dimensions: understanding, editing, and generation. The benchmark assesses 22 models across various categories and reveals that while proprietary models outperform open-source ones, all models struggle with complexity, suggesting that reasoning-enhanced training is more effective than scaling for improving performance. |
Read More |
|
|
|
|
Solving Inverse Problems with FLAIR |
Published at 2025-06-03 |
|
#ML
|
This study presents FLAIR, a new method that uses flow-based generative models to improve the quality of solutions for inverse imaging problems. By introducing a variational objective for flow matching and combining it with deterministic trajectory adjustments, FLAIR can recover rare, atypical data modes and consistently outperforms existing diffusion- and flow-based methods in terms of reconstruction quality and sample diversity.... |
Read More |
|
|
|
TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models |
Published at 2025-06-03 |
|
#ML
|
The authors describe TalkingMachines, a system that creates real-time, audio-driven character animations using pretrained video generation models and an audio large language model, enabling natural conversations. They achieve this by adapting a pretrained model, enabling infinite video streaming, and designing a high-throughput inference pipeline with various engineering optimizations.... |
Read More |
|
|
|
|
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem |
Published at 2025-06-03 |
|
#ML
|
This study presents a new method called Critique Fine-Tuning (CFT) that efficiently unlocks the reasoning potential of large language models (LLMs) by training them on only one problem. CFT uses diverse model-generated solutions and detailed critiques from teacher LLMs to improve performance on various reasoning tasks, requiring significantly less computational power than existing methods.... |
Read More |
|
|
|
Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning |
Published at 2025-06-03 |
|
#ML
|
The authors present a new framework called Video-Skill-CoT that tailors video reasoning to specific domains by creating skill-based reasoning annotations and training specialized expert modules. They show that this approach outperforms existing methods on various video understanding tasks, and provide detailed analysis on the effectiveness of their method.... |
Read More |
|
|
|
|
Adapt before Continual Learning |
Published at 2025-06-04 |
|
#ML
|
This study presents a new method called ACL that improves the ability of pre-trained models to learn new tasks while retaining old knowledge, by refining the model's backbone through a special adaptation phase. The method enhances the model's learning capacity and prevents forgetting, as demonstrated through various experiments and benchmarks.... |
Read More |
|
|
|
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning |
Published at 2025-06-04 |
|
#ML
|
This study improves multimodal reasoning in large language models by optimizing cold start initialization, addressing gradient stagnation in reinforcement learning, and introducing a staged training approach that balances perceptual and cognitive development, resulting in a new state-of-the-art model called ReVisual-R1.... |
Read More |
|
|
|
|
AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment |
Published at 2025-06-04 |
|
#ML
|
The researchers present AmbiK, a dataset of ambiguous instructions for a robot in a kitchen environment, which aims to provide a universal benchmark for comparing various methods of task ambiguity detection. AmbiK, which was created using Large Language Models and validated by humans, contains 1000 pairs of ambiguous and unambiguous tasks, along with their corresponding details, for a total of 2000 tasks.... |
Read More |
|
|
|
CRAWLDoc: A Dataset for Robust Ranking of Bibliographic Documents |
Published at 2025-06-04 |
|
#ML
|
The researchers have developed a new method called CRAWLDoc to effectively rank relevant web documents by starting from a publication's URL and retrieving all linked resources. They tested CRAWLDoc on a new dataset of 600 publications and found that it can robustly rank documents across different publishers and formats, improving metadata extraction from varied web documents.... |
Read More |
|
|
|
|
Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis |
Published at 2025-06-04 |
|
#ML
|
This study identifies that data contamination in language model evaluations is often due to shortcut solutions in model training. The researchers propose a new method to detect and suppress these shortcut neurons, resulting in more accurate and trustworthy evaluations, with a strong correlation to a recent benchmark.... |
Read More |
|
|
|
HTSC-2025: A Benchmark Dataset of Ambient-Pressure High-Temperature Superconductors for AI-Driven Critical Temperature Prediction |
Published at 2025-06-04 |
|
#ML
|
A new dataset called HTSC-2025 has been created to help improve AI predictions for high-temperature superconducting materials. This dataset includes various superconducting materials discovered from 2023 to 2025 and is available for use and updates online.... |
Read More |
|
|
|
|
Image Editing As Programs with Diffusion Models |
Published at 2025-06-04 |
|
#ML
|
The authors present a new method called Image Editing As Programs (IEAP) that improves instruction-driven image editing with diffusion models. IEAP breaks down complex editing instructions into simpler operations, which are then executed in sequence to achieve the desired edit, resulting in better accuracy and semantic fidelity, especially for complex edits.... |
Read More |
|
|
|
LayerFlow: A Unified Model for Layer-aware Video Generation |
Published at 2025-06-04 |
|
#ML
|
LayerFlow is a system that creates videos with separate foreground, background, and blended scenes using text prompts for each layer. It uses a trained model to generate smooth videos with desired layers, starting from a text-to-video diffusion transformer and a multi-stage training strategy.... |
Read More |
|
|
|
|
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos |
Published at 2025-06-04 |
|
#ML
|
The study presents MMR-V, a video benchmark that tests multimodal deep reasoning in videos, focusing on long-range, multi-frame reasoning, hidden information, and reliability. Experiments show that current models struggle with this task, even with advanced reasoning strategies, highlighting the need for further research in this area.... |
Read More |
|
|
|
MiMo-VL Technical Report |
Published at 2025-06-04 |
|
#ML
|
The researchers have developed two advanced vision-language models, MiMo-VL-7B-SFT and MiMo-VL-7B-RL, which outperform existing models in various tasks, including multimodal reasoning and GUI grounding. The models were trained using a combination of pre-training and mixed reinforcement learning, and the researchers have also created a comprehensive evaluation suite to assess their performance.... |
Read More |
|
|
|
|
OpenThoughts: Data Recipes for Reasoning Models |
Published at 2025-06-04 |
|
#ML
|
The OpenThoughts project created an open-source dataset called OpenThoughts2-1M for training reasoning models, leading to the development of OpenThinker2-32B, the first model trained on public reasoning data to match a state-of-the-art model on standard reasoning benchmarks. Further improvements to the dataset resulted in the OpenThinker3-7B model, which achieved state-of-the-art results on various reasoning benchmarks.... |
Read More |
|
|
|
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games |
Published at 2025-06-04 |
|
#ML
|
The authors present Orak, a new benchmark for training and evaluating language model agents in various video games. Orak includes 12 popular games across major genres, allowing for a comprehensive study of language model capabilities and essential modules for complex gameplay, and it provides a plug-and-play interface and a fine-tuning dataset for consistent evaluation.... |
Read More |
|
|
|
|
POSS: Position Specialist Generates Better Draft for Speculative Decoding |
Published at 2025-06-04 |
|
#ML
|
This research presents Position Specialists (PosS) to enhance speculative decoding in Large Language Models (LLMs). PosS improves token accuracy at later positions by using position-specialized draft layers, reducing error accumulation and boosting acceptance rates. Experiments show that PosS outperforms baselines in average acceptance length and speed-up ratio on various datasets.... |
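For context, here is the verify-and-accept loop that speculative decoding relies on, as a toy sketch (ours): a draft model proposes tokens, the target model verifies them, and the longest agreeing prefix is accepted. PosS improves the draft side with position-specialized layers; this sketch only shows why later-position accuracy matters.

```python
# Toy greedy acceptance for speculative decoding. Real implementations verify
# against the target model's distribution; this compares greedy tokens only.
def accepted_prefix(draft_tokens, target_tokens):
    """Keep draft tokens until they first diverge from the target's tokens."""
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            break
        accepted.append(d)
    return accepted

# The draft degrades at the last position: exactly the late-position error
# accumulation that position-specialized draft layers aim to reduce.
print(accepted_prefix(["the", "cat", "sat", "on"], ["the", "cat", "sat", "in"]))
# ['the', 'cat', 'sat']
```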
Read More |
|
|
|
Rectified Sparse Attention |
Published at 2025-06-04 |
|
#ML
|
The authors present a new method called Rectified Sparse Attention (ReSA) that enhances the efficiency of long-sequence generation in Large Language Models. ReSA minimizes approximation errors and preserves generation quality by combining block-sparse attention with periodic dense rectification, resulting in faster processing times and maintaining near-perfect generation quality.... |
Read More |
|
|
|
|
Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective |
Published at 2025-06-04 |
|
#ML
|
This study explores the balance between stability and plasticity in neural networks from an architectural perspective. The researchers find that deeper networks are more adaptable, while wider networks are more stable, and propose a new framework called Dual-Arch to improve existing continual learning methods by combining the strengths of two specialized networks.... |
Read More |
|
|
|
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning |
Published at 2025-06-04 |
|
#ML
|
The study presents Rex-Thinker, a model that uses a chain-of-thought reasoning approach for object referring, which enhances explainability and accuracy. Rex-Thinker identifies potential object candidates and assesses them step-by-step to match a given expression, supported by a large-scale CoT-style referring dataset. The model outperforms standard baselines in precision and interpretability, and demonstrates an improved ability to reject hallucinated outputs along with strong generalization in out-of-domain settings. |
Read More |
|
|
|
|
Sounding that Object: Interactive Object-Aware Image to Audio Generation |
Published at 2025-06-04 |
|
#ML
|
The authors present a model that generates sounds for specific objects in an image, allowing users to interactively choose which object's sound to produce. This model uses a conditional latent diffusion model and multi-modal attention to connect image regions with their corresponding sounds, outperforming existing methods in aligning objects with their sounds.... |
Read More |
|
|
|
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models |
Published at 2025-06-04 |
|
#ML
|
The authors present SuperWriter-Agent, a new framework that improves long-form text generation by incorporating structured thinking and planning stages, similar to a professional writer's process. This framework, combined with a hierarchical optimization technique, results in a state-of-the-art language model that outperforms larger baseline models in both automatic and human evaluations.... |
Read More |
|
|
|
|
Survey of Active Learning Hyperparameters: Insights from a Large-Scale Experimental Grid |
Published at 2025-06-04 |
|
#ML
|
This study explores the impact of hyperparameters on Active Learning (AL) performance by conducting the largest AL experiment to date, with over 4.6 million hyperparameter combinations. The findings aim to provide guidelines for setting up AL, increase trust in its effectiveness, and promote reproducible AL research.... |
Read More |
|
|
|
TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems |
Published at 2025-06-04 |
|
#ML
|
This review explores how to manage trust, risk, and security in advanced AI systems that use large language models and work together in groups, focusing on unique challenges and solutions for these complex systems.... |
Read More |
|
|
|
|
VLMs Can Aggregate Scattered Training Patches |
Published at 2025-06-04 |
|
#ML
|
The study reveals that vision-language models can reassemble harmful image fragments that have been split and scattered across many training samples, posing safety risks. This ability, called visual stitching, allows models to learn dangerous content by associating it with benign descriptions, enabling the generation of harmful responses during use.... |
Read More |
|
|
|
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation |
Published at 2025-06-04 |
|
#ML
|
Researchers created VisCode-200K, a large dataset with over 200K examples for training LLMs in generating executable Python visualization code, including 45K multi-turn correction dialogues. They fine-tuned Qwen2.5-Coder-Instruct on this dataset to develop VisCoder, which significantly outperforms open-source baselines and is close to proprietary models in generating accurate, executable visualization code.... |
Read More |
|
|
|
|
Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation |
Published at 2025-06-04 |
|
#ML
|
This study presents Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image with user-defined camera path, without requiring 3D reconstruction pipelines. The method uses three key components: World-Consistent Video Diffusion, Long-Range World Exploration, and Scalable Data Engine, resulting in improved visual quality and geometric accuracy for various applications.... |
Read More |
|
|
|
|
|
Tags are generated with Google's Gemini Pro API; summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) Full papers are translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun. |