🤗 Daily Paper(2025-08-20)

deep.di...@gmail.com

Aug 20, 2025, 4:07:16 PM

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page

🤗 daily paper

ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Published at 2025-08-05

#ML

The authors present ZARA, a new method for recognizing human activities from motion sensor data without the need for retraining or task-specific classifiers. ZARA uses a knowledge base, retrieval module, and agent pipeline to provide accurate, interpretable results and outperforms existing methods in extensive experiments....

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Published at 2025-08-06

#ML

The study presents Chain-of-Agents (CoA), a method that lets a single language model solve complex problems end to end by simulating the collaboration of multiple agents, as a multi-agent system would. The authors develop a training framework based on multi-agent distillation and agentic reinforcement learning to teach the model this ability. The resulting Agent Foundation Models (AFMs) outperform existing models on various tasks and are fully open-sourced for future research....

Radiance Fields in XR: A Survey on How Radiance Fields are Envisioned and Addressed for XR Research

Published at 2025-08-06

#ML

This study examines the use of Radiance Fields (RF) in Extended Reality (XR) research by analyzing existing literature. The researchers collected and assessed 365 papers on the subject, focusing on 66 papers that provided detailed insights into RF research for XR. The goal was to understand how RF is applied in XR, identify current implementations, and highlight areas needing further research, ultimately providing a resource for the XR community to navigate RF research developments....

TempFlow-GRPO: When Timing Matters for GRPO in Flow Models

Published at 2025-08-06

#ML

The study presents TempFlow-GRPO, a new framework for text-to-image generation that addresses the issue of inefficient reward-based optimization in existing models. By introducing a trajectory branching mechanism and a noise-aware weighting scheme, TempFlow-GRPO improves the alignment of generated images with human preferences and outperforms existing models in standard text-to-image benchmarks....
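For readers new to GRPO, the sketch below shows the standard group-relative advantage (each image's reward standardized against the other images sampled for the same prompt), followed by a hypothetical per-step weighting. The summary does not give TempFlow-GRPO's actual noise-aware weighting formula, so `noise_aware_step_weights` (weights proportional to an assumed noise level per denoising step) is an illustrative assumption, not the paper's scheme, and trajectory branching is omitted.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO advantages: standardize each sample's reward
    against the other samples generated for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def noise_aware_step_weights(noise_levels):
    """HYPOTHETICAL weighting: give each denoising step credit in
    proportion to its (assumed) noise level, normalized to sum to 1."""
    total = sum(noise_levels)
    return [s / total for s in noise_levels]

def per_step_credit(advantages, weights):
    """Spread each sample's advantage across denoising steps."""
    return [[a * w for w in weights] for a in advantages]

# Toy data: preference scores for 4 images generated from one prompt.
rewards = [0.2, 0.5, 0.8, 0.5]
adv = group_relative_advantages(rewards)
# Assumed noise levels: high noise at early steps, low noise at late steps.
weights = noise_aware_step_weights([1.0, 0.6, 0.3, 0.1])
credit = per_step_credit(adv, weights)
```

Uniform weights would recover the usual "every step gets the same credit" behavior; the point of a timing-aware scheme is that the weights are not uniform.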

MultiRef: Controllable Image Generation with Multiple Visual References

Published at 2025-08-09

#ML

This study presents a new method for creating images using multiple visual references, as opposed to just one, and introduces a large dataset of images made with this approach to help improve future creative tools. The researchers found that even the best current systems struggle with this task, suggesting room for growth in developing more flexible and human-like image generation tools....

Evaluating Podcast Recommendations with Profile-Aware LLM-as-a-Judge

Published at 2025-08-12

#ML

The authors present a new method for evaluating podcast recommendations with Large Language Models that is more efficient and interpretable than traditional evaluation methods. By first building natural-language user profiles from listening history, the method lets the LLM judge better understand user preferences and evaluate recommendations more accurately, as demonstrated in a study with 47 participants....

Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer

Published at 2025-08-12

#ML

The study presents ColorCtrl, a new method for training-free text-guided color editing in images and videos. This approach uses Multi-Modal Diffusion Transformers to accurately manipulate color attributes while preserving the original structure, and it performs better than existing methods in terms of edit quality, consistency, and compatibility with various models....

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Published at 2025-08-13

#ML

The authors propose improving video recommendations by using a multimodal large language model to generate natural-language descriptions of video content, which are then used alongside traditional recommendation methods. This approach outperforms traditional methods on a large-scale video recommendation dataset....

Advances in Speech Separation: Techniques, Challenges, and Future Trends

Published at 2025-08-14

#ML

This survey offers a thorough review of DNN-based speech separation, covering learning paradigms, separation scenarios, and model architectures. It also provides quantitative evaluations and insights into future trends, aiming to give both experts and newcomers a balanced understanding of the field....
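Quantitative comparisons of separation models commonly report the scale-invariant signal-to-distortion ratio (SI-SDR). The summary does not list which metrics the survey uses, so this is only a minimal sketch of that widely used metric: project the estimate onto the reference, then compare the energy of the projection with the energy of the residual.

```python
import math

def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio, in dB."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    # Remove DC offsets so constant shifts do not affect the score.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Optimal scale for the reference (projection of estimate onto reference).
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))

# Toy signals: a clean sine and a version with added interference.
ref = [math.sin(0.05 * i) for i in range(1000)]
noisy = [x + 0.1 * math.cos(1.3 * i) for i, x in enumerate(ref)]
```

Because of the projection step, simply rescaling an estimate (for example, `[3 * y for y in noisy]`) leaves its SI-SDR essentially unchanged, which is why the metric is preferred over plain SDR when output loudness is arbitrary.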

MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Published at 2025-08-14

#ML

The authors present a new benchmark called MM-BrowseComp, which tests AI agents' ability to find information using multiple formats like text, images, and videos, rather than just text. They created 224 challenging questions for this benchmark and found that even the best models struggle with it, showing that there's room for improvement in how AI agents handle and reason with multimodal information....

MedSAMix: A Training-Free Model Merging Approach for Medical Image Segmentation

Published at 2025-08-14

#ML

The authors present MedSAMix, a method that combines general and specialized models for medical image segmentation without training. This approach automatically finds the best combination of layers from different models and offers two modes to balance domain-specific accuracy and generalization, improving performance by up to 6.67% on specialized tasks and 4.37% on multi-task evaluations....
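Training-free merging of this kind is usually expressed as layer-wise interpolation between the weights of two models with the same architecture. The sketch below assumes that form; the layer names, the exhaustive grid search, and `score_fn` (standing in for validation performance) are hypothetical stand-ins, since the summary does not describe how MedSAMix actually searches for the best layer combination.

```python
import itertools

def merge_layerwise(general, specialist, coeffs):
    """Interpolate two models layer by layer.

    general, specialist: dicts mapping layer name -> list of weights (same shapes).
    coeffs: layer name -> coefficient in [0, 1]
            (0 keeps the general layer, 1 takes the specialist layer).
    """
    merged = {}
    for name, w_gen in general.items():
        w_spec = specialist[name]
        lam = coeffs.get(name, 0.5)  # hypothetical default for unsearched layers
        merged[name] = [(1 - lam) * g + lam * s for g, s in zip(w_gen, w_spec)]
    return merged

def search_coeffs(general, specialist, score_fn, grid=(0.0, 0.5, 1.0)):
    """HYPOTHETICAL search: try every per-layer coefficient combination
    (only feasible for toy models) and keep the best-scoring merge."""
    names = list(general)
    best, best_score = None, float("-inf")
    for combo in itertools.product(grid, repeat=len(names)):
        coeffs = dict(zip(names, combo))
        score = score_fn(merge_layerwise(general, specialist, coeffs))
        if score > best_score:
            best, best_score = coeffs, score
    return best, best_score
```

A real search would replace the grid with a sample-efficient optimizer, and the "two modes" mentioned above would correspond to scoring merges on a single domain versus across many tasks.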

Semantic IDs for Joint Generative Search and Recommendation

Published at 2025-08-14

#ML

This study investigates how to construct Semantic IDs that work well for both search and recommendation in a single unified generative model. Comparing several approaches, the researchers find that fine-tuning a bi-encoder on both tasks and then building a unified Semantic ID space from its embeddings offers the best trade-off, yielding strong performance in both areas....
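Semantic IDs are short tuples of discrete codes derived from item embeddings. One common way to derive them, assumed here rather than taken from the summary, is residual quantization: each level picks the nearest codeword and passes the leftover residual to the next level. The tiny hand-picked codebooks are hypothetical; real systems learn them (for example, with k-means per level) from the bi-encoder's embeddings.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def semantic_id(embedding, codebooks):
    """Residual quantization: each level encodes what earlier levels missed."""
    residual = list(embedding)
    code = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        code.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(code)

# Hypothetical 2-level codebooks for 2-D embeddings.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],  # coarse level
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]],   # fine level
]
```

Items with similar embeddings share the same prefix (coarse code), so a single generative model can decode the same ID space for both search queries and recommendation contexts.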

Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends

Published at 2025-08-15

#ML

This survey explores ways to protect large language models from misuse and unauthorized use, focusing on methods like model watermarking and fingerprinting. It explains these concepts in simple terms, compares existing techniques, and suggests future research directions to help protect the intellectual property of these valuable models....
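To make "model watermarking" concrete, the sketch below shows detection for a simplified green-list text watermark in the style of Kirchenbauer et al. (2023), one of many schemes such surveys cover; it is not taken from this survey. A keyed hash marks roughly a `gamma` fraction of tokens as "green" after each predecessor, and a one-proportion z-test checks whether a text contains more green tokens than chance. The key, the toy vocabulary, and the greedy "always pick green" generator are illustrative assumptions.

```python
import hashlib
import math

def is_green(prev_token, token, gamma=0.25, key="demo-key"):
    """Keyed pseudo-random split: about a gamma fraction of tokens is 'green'
    after any predecessor (simplification: hashes the token pair directly)."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def watermark_z_score(tokens, gamma=0.25, key="demo-key"):
    """One-proportion z-test: how far the green count exceeds chance."""
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)
    green = sum(is_green(p, t, gamma, key) for p, t in pairs)
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))

# Toy generator: always prefer a green token when one exists.
# (Real schemes softly bias logits instead of forcing green tokens.)
vocab = [f"w{i}" for i in range(50)]
marked = ["w0"]
for _ in range(100):
    greens = [t for t in vocab if is_green(marked[-1], t)]
    marked.append(greens[0] if greens else vocab[0])
plain = [vocab[i % len(vocab)] for i in range(101)]
```

Without the key, the green/red split looks random, so the owner can prove provenance while others cannot easily detect or forge the signal.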

Retrieval-augmented reasoning with lean language models

Published at 2025-08-15

#ML

The authors present a method for building a lean language model that combines reasoning and retrieval-augmented generation in a single architecture, aimed at resource-constrained or secure environments. The model interprets complex queries with a lightweight backbone, and fine-tuning it on domain-specific data significantly improves answer accuracy and consistency while keeping local deployment feasible....
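The retrieval half of such a pipeline can be sketched in a few lines. The summary does not say which retriever the authors use, so bag-of-words cosine similarity is swapped in here for a dense retriever, and the prompt wording in `build_prompt` is hypothetical; the lean model would receive the resulting prompt and reason over the retrieved passages.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_prompt(query, passages):
    """HYPOTHETICAL prompt format asking the model to reason over numbered passages."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (f"Context:\n{context}\n\nQuestion: {query}\n"
            "Think step by step, cite passages by number, then answer.")

docs = ["the clinic opens at nine", "parking is free on sundays",
        "the clinic closes at five"]
query = "when does the clinic open"
prompt = build_prompt(query, retrieve(query, docs))
```

Everything here runs locally with no network calls, which matches the deployment setting described above.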

Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation

Published at 2025-08-16

#ML

The authors present FineCE, a method for accurate, fine-grained confidence estimation during text generation by large language models. They build a pipeline that generates training data reflecting the model's probabilistic output distribution, then train a model to predict confidence scores for any text sequence. They also propose a strategy that improves confidence estimates by using information from subsequent text, along with strategies for choosing the best positions in a sequence at which to estimate confidence. FineCE o...
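One plausible way to produce training labels that "reflect the model's probabilistic distribution" is Monte Carlo sampling: from each partial answer, sample several continuations and record the fraction that end up correct. This is an assumption about the pipeline, not a description of it; `toy_sampler` and `toy_checker` are hypothetical stand-ins for the LLM and the answer checker.

```python
import random

def confidence_label(prefix, sample_completion, is_correct, n_samples=8):
    """Estimate P(correct | prefix) as the fraction of sampled
    continuations that produce a correct final answer."""
    hits = sum(is_correct(prefix + sample_completion(prefix)) for _ in range(n_samples))
    return hits / n_samples

def build_training_pairs(answer_tokens, sample_completion, is_correct, n_samples=8):
    """One (prefix, confidence) training pair per prefix length."""
    pairs = []
    for t in range(1, len(answer_tokens) + 1):
        prefix = answer_tokens[:t]
        pairs.append((prefix, confidence_label(prefix, sample_completion,
                                               is_correct, n_samples)))
    return pairs

# HYPOTHETICAL toy model: finishes "2 + 2 = 4" but gets the final token wrong half the time.
rng = random.Random(0)
target = ["2", "+", "2", "=", "4"]

def toy_sampler(prefix):
    rest = target[len(prefix):]
    if rest and rng.random() < 0.5:
        rest = rest[:-1] + ["5"]  # simulate a wrong final answer
    return rest

def toy_checker(tokens):
    return tokens == target

pairs = build_training_pairs(target, toy_sampler, toy_checker)
```

A confidence model trained on such pairs can then score any prefix directly, without sampling at inference time.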

CorrSteer: Steering Improves Task Performance and Safety in LLMs through Correlation-based Sparse Autoencoder Feature Selection

Published at 2025-08-17

#ML

The authors present CorrSteer, a method that selects sparse autoencoder (SAE) features of large language models by correlating sample correctness with SAE activations on generated tokens, which automates the steering process and improves task performance. The approach yields significant improvements on various benchmarks, and the selected features reveal semantically meaningful patterns aligned with each task's requirements....
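The selection step described above can be sketched as: pool each SAE feature's activation over a sample's generated tokens, then rank features by Pearson correlation with a 0/1 correctness label. Max pooling and the toy activation values are assumptions for illustration; turning the selected features into steering vectors (typically by adding scaled SAE decoder directions to the residual stream) is omitted.

```python
import math

def max_pool(token_acts):
    """token_acts[t][j] -> pooled[j]: max activation of feature j over generated tokens."""
    return [max(col) for col in zip(*token_acts)]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(activations, correct, k=2):
    """activations[i][j]: pooled activation of SAE feature j on sample i.
    correct[i]: 1 if sample i was answered correctly, else 0.
    Returns the k features most positively correlated with correctness."""
    scores = []
    for j in range(len(activations[0])):
        column = [row[j] for row in activations]
        scores.append((pearson(column, correct), j))
    scores.sort(reverse=True)
    return [(j, r) for r, j in scores[:k]]

# Toy data: feature 0 tracks correctness, feature 1 is anti-correlated, feature 2 is constant.
acts = [[0.9, 0.1, 0.5], [0.8, 0.2, 0.5], [0.1, 0.9, 0.5], [0.2, 0.7, 0.5]]
correct = [1, 1, 0, 0]
selected = select_features(acts, correct)
```

Because selection needs only correctness labels and activations, it can run without manual feature inspection, which is what makes the steering pipeline automatic.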

A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models

Published at 2025-08-18

#ML

The authors propose ProActive Self-Refinement (PASR), a method that lets language models refine their outputs during generation rather than regenerating entire responses as existing methods do. PASR makes refinement decisions based on the model's internal state and evolving context, significantly improving problem-solving accuracy while reducing token consumption across various tasks....

Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

Published at 2025-08-18

#ML

This study presents a new approach called Atomic Thought that breaks down reasoning into smaller parts for large language models, enabling them to better tackle complex tasks by using fine-grained rewards. The proposed Atom-Searcher framework, built on Atomic Thought, significantly improves the models' performance in agentic deep research, making their reasoning more interpretable and human-like....

CAMAR: Continuous Actions Multi-Agent Routing

Published at 2025-08-18

#ML

The authors present CAMAR, a new multi-agent reinforcement learning benchmark for pathfinding in environments with continuous actions, supporting both cooperative and competitive interactions. They also propose a three-tier evaluation protocol and integrate classical planning methods into MARL pipelines, providing a suite of test scenarios and tools for reproducibility and fair comparison....

Leveraging Large Language Models for Predictive Analysis of Human Misery

Published at 2025-08-18

#ML

This research explores using large language models to predict the level of misery associated with real-world situations from textual descriptions of them. The researchers tested different methods and introduced a new game-like evaluation system to better understand the models' reasoning abilities and adaptability....

Motion2Motion: Cross-topology Motion Transfer with Sparse Correspondence

Published at 2025-08-18

#ML

The paper presents a new method called Motion2Motion that transfers animations between characters with different skeletal structures, using only one or a few example motions and a small set of bone correspondences. Motion2Motion is effective in both similar and dissimilar skeleton transfer scenarios, and its practical utility is demonstrated through successful integration in downstream applications and user interfaces....

Rapidly Adapting to New Voice Spoofing: Few-Shot Detection of Synthesized Speech Under Distribution Shifts

Published at 2025-08-18

#ML

This study presents a self-attentive prototypical network that improves the detection of synthesized speech under distribution shifts, such as unseen synthesis methods or languages, by rapidly adapting to new conditions from only a few samples. The method significantly outperforms conventional zero-shot detectors in these challenging conditions, reducing error rates by up to 32% on Japanese-language deepfakes and 20% on the ASVspoof 2021 Deepfake dataset....
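The core of a prototypical network is simple: average the embeddings of the few labeled examples per class into a prototype, then classify a query by its distance to each prototype. The sketch below shows only that step, assuming embeddings already come from some speech encoder; the paper's self-attention component is not reproduced, and the 2-D toy embeddings are illustrative.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def prototypes(support):
    """support: label -> list of embeddings (the few labeled shots)."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}

def classify(query, protos):
    """Assign the label whose prototype is nearest (Euclidean distance)."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))

def class_probabilities(query, protos):
    """Softmax over negative squared distances, as in prototypical networks."""
    logits = {l: -math.dist(query, p) ** 2 for l, p in protos.items()}
    m = max(logits.values())
    exp = {l: math.exp(v - m) for l, v in logits.items()}
    z = sum(exp.values())
    return {l: v / z for l, v in exp.items()}

# Toy 2-D embeddings for a handful of labeled clips.
support = {
    "bonafide": [[0.0, 0.1], [0.1, 0.0]],
    "spoof": [[1.0, 0.9], [0.9, 1.0]],
}
protos = prototypes(support)
```

Adapting to a new spoofing method or language only requires recomputing prototypes from a few new labeled clips, with no gradient updates to the encoder.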

Beyond Human Judgment: A Bayesian Evaluation of LLMs' Moral Values Understanding

Published at 2025-08-19

#ML

This study compares how well large language models and humans understand moral values, accounting for disagreement among human annotators and for model sensitivity. The results show that AI models often rank among the top 25% of human annotators and detect moral content more sensitively, making fewer mistakes than humans....

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

Published at 2025-08-19

#ML

This study proposes using 'pointing' as an intermediate representation that bridges high-level vision-language comprehension and low-level actions, improving generalization in embodied AI. The resulting model, Embodied-R1, outperforms existing models on various benchmarks and shows robust zero-shot generalization on real-world tasks without any task-specific fine-tuning....

LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos

Published at 2025-08-19

#ML

LongSplat is a new method that improves novel view synthesis from long videos with irregular motion and unknown camera positions. It does this by optimizing camera poses and 3D Gaussians together, using learned 3D priors for pose estimation, and efficiently organizing point clouds. This results in better rendering quality, more accurate poses, and faster computation compared to existing methods....

MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence

Published at 2025-08-19

#ML

The authors present MMAU-Pro, a new benchmark for evaluating AI systems' audio intelligence. This benchmark tests 49 unique skills across various complex dimensions using real-world audio data, revealing that even advanced AI models struggle with tasks like long-form audio comprehension and spatial audio reasoning....

OmniTry: Virtual Try-On Anything without Masks

Published at 2025-08-19

#ML

The study introduces OmniTry, a framework that enables mask-free virtual try-on for any wearable object, not just clothes. To address data challenges, the authors propose a two-stage pipeline that first trains on large-scale unpaired images and then fine-tunes on paired images, yielding better object localization and appearance preservation than existing methods....

Prompt Orchestration Markup Language

Published at 2025-08-19

#ML

The study presents POML, a new language designed to improve the prompting of Large Language Models. POML uses a component-based markup for structuring prompts, specialized tags for integrating various data types, and a CSS-like styling system to manage presentation. It also provides templating for dynamic prompts and a developer toolkit for better collaboration and version control. The effectiveness of POML is demonstrated through case studies and a user study....

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.

(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media
