🤗 Daily Paper(2025-08-25)

deep.di...@gmail.com

Aug 25, 2025, 4:06:55 PM

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks

Published at 2025-08-11

#ML

The authors present ODYSSEY, a framework that enables agile quadruped robots with manipulators to perform complex tasks in unstructured environments. It integrates high-level task planning with low-level whole-body control, allowing the robots to understand and execute long-term instructions while navigating challenging terrains, thus advancing the development of generalized robotic assistants....

Selective Contrastive Learning for Weakly Supervised Affordance Grounding

Published at 2025-08-11

#ML

The authors propose a method for weakly supervised affordance grounding, which helps recognize object parts that allow certain actions without needing pixel-level annotations. This method uses selective prototypical and pixel contrastive objectives to adaptively learn affordance-relevant cues at both the part and object levels, improving upon existing techniques that often focus on unrelated patterns....

Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts

Published at 2025-08-14

#ML

The authors present a new method called MDH that uses both large language models and human oversight to accurately detect malicious content in datasets, making it more efficient than previous methods. They also discovered that specific developer messages can improve the success of jailbreaking language models, leading them to propose two new strategies, D-Attack and DH-CoT, to further enhance this process....

EgoTwin: Dreaming Body and View in First Person

Published at 2025-08-18

#ML

The authors propose EgoTwin, a new framework for generating egocentric videos and human motion simultaneously, addressing challenges like viewpoint alignment and causal interplay between video and motion. They use a diffusion transformer architecture, a head-centric motion representation, and a cybernetics-inspired interaction mechanism, evaluating it on a large-scale real-world dataset....

Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

Published at 2025-08-19

#ML

This study explores how increasing the variety of training problems in RLVR (reinforcement learning with verifiable rewards) can improve the reasoning ability of large language models. The authors propose a strategy called SvS, which generates diverse training problems from the model's own correct solutions, leading to significant improvements on various reasoning tasks....

CRISP: Persistent Concept Unlearning via Sparse Autoencoders

Published at 2025-08-19

#ML

The study presents CRISP, a method that uses sparse autoencoders to permanently remove unwanted knowledge from large language models, which is more effective and secure than existing techniques. CRISP outperforms previous approaches in removing harmful knowledge while maintaining the model's overall performance and is capable of precisely suppressing target features....

Distilled-3DGS: Distilled 3D Gaussian Splatting

Published at 2025-08-19

#ML

The researchers present a new method called Distilled-3DGS to improve 3D Gaussian Splatting for creating new images from 3D models. Their approach reduces memory usage and storage requirements by using multiple teacher models to guide a lightweight student model, resulting in better performance compared to existing methods....

Learnable SMPLify: A Neural Solution for Optimization-Free Human Pose Inverse Kinematics

Published at 2025-08-19

#ML

The authors present Learnable SMPLify, a method that uses neural networks to estimate 3D human pose and shape, replacing the iterative optimization of the earlier SMPLify method. The new approach runs up to 200 times faster and generalizes better to unseen poses and motions, making it a practical and simple baseline for 3D human pose and shape estimation....

Sketch3DVE: Sketch-based 3D-Aware Scene Video Editing

Published at 2025-08-19

#ML

This study presents a new method, Sketch3DVE, for editing structural content in 3D videos with significant viewpoint changes. It uses sketching as a tool for precise geometry control, employs image editing methods for generating edited results, and proposes a point cloud editing approach to align new content with the original 3D scene, resulting in realistic edited videos....

CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning

Published at 2025-08-20

#ML

The authors present a new method called CARFT that improves the reasoning skills of large language models by using annotated thinking steps and contrastive learning. This approach ensures better use of available data and stabilizes the training process, resulting in improved performance and efficiency compared to existing methods....

End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning

Published at 2025-08-21

#ML

The authors present Deep-DxSearch, a system that improves medical diagnosis accuracy by training a retrieval-augmented generation model with reinforcement learning, resulting in better performance than existing methods and providing valuable insights for clinicians....

TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill & Decode Inference

Published at 2025-08-21

#ML

The authors present Tensor-Parallel Latent Attention (TPLA), a method that divides the latent representation and input dimensions among devices to enhance efficiency in tensor parallelism, while preserving the benefits of a compressed cache. TPLA is compatible with existing models and reduces the per-device cache, resulting in faster performance without sacrificing accuracy....

AetherCode: Evaluating LLMs' Ability to Win In Premier Programming Competitions

Published at 2025-08-22

#ML

The researchers present AetherCode, a new benchmark for evaluating Large Language Models' (LLMs) coding and reasoning abilities. AetherCode uses problems from top programming competitions and includes rigorous, expert-validated test suites to ensure accurate assessment, addressing the limitations of current benchmarks....

AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Published at 2025-08-22

#ML

The authors present a new method for creating adaptive Large Language Model (LLM) agents that learn from experiences without requiring expensive fine-tuning of the LLMs. This approach, called AgentFly, uses memory-based online reinforcement learning, allowing for efficient, continuous learning in real-time, and outperforms existing methods in various tasks and scenarios....

AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications

Published at 2025-08-22

#ML

AgentScope 1.0 is a new version of a framework for building agentic applications that focuses on improving tool-based interactions, abstracts foundational components, and provides unified interfaces and extensible modules. It enhances agent behaviors with the ReAct paradigm, offers advanced infrastructure for better human-agent and agent-agent interactions, and includes built-in agents for specific scenarios, along with robust engineering support for developers....

Do What? Teaching Vision-Language-Action Models to Reject the Impossible

Published at 2025-08-22

#ML

This research introduces a framework called IVA that helps Vision-Language-Action models understand and respond to impossible requests. IVA detects false premises in language instructions, engages in clarification or correction, and suggests plausible alternatives grounded in perception and action. The approach significantly improves false premise detection accuracy and successful responses in false-premise scenarios....

InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles

Published at 2025-08-22

#ML

The study presents InMind, a framework that uses social deduction games to evaluate whether language models can capture and apply individual reasoning styles in social contexts. The researchers find that most language models rely on word clues and struggle to adapt to changing strategies, while some models show early signs of style-sensitive reasoning, highlighting the need for improvement in this area....

RotaTouille: Rotation Equivariant Deep Learning for Contours

Published at 2025-08-22

#ML

The study presents RotaTouille, a deep learning framework that can learn from contour data while maintaining rotation and cyclic shift equivariance. This is achieved through complex-valued circular convolution, and the model can be used for tasks like shape classification and contour regression....
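As a rough illustration of why complex-valued circular convolution gives both properties, the sketch below represents a contour as a sequence of complex numbers and checks numerically that rotating the contour (multiplying by e^{iθ}) or cyclically shifting its starting point commutes with the convolution. This is not the RotaTouille implementation: the helper `circular_conv`, the random filter `w`, and all parameter values are assumptions chosen only for the demonstration.

```python
import numpy as np

def circular_conv(z, w):
    """Circular convolution of a complex contour z with a complex filter w of the same length."""
    n = len(z)
    return np.array([sum(w[k] * z[(j - k) % n] for k in range(n)) for j in range(n)])

rng = np.random.default_rng(0)
n = 8
# Hypothetical contour: each point (x, y) is stored as the complex number x + iy.
z = rng.normal(size=n) + 1j * rng.normal(size=n)
# Hypothetical filter weights (learned in a real model, random here).
w = rng.normal(size=n) + 1j * rng.normal(size=n)

theta = 0.7                 # rotation angle in radians
rot = np.exp(1j * theta)    # rotating the plane = multiplying by a unit complex number
shift = 3                   # change of starting point along the contour

# Rotation equivariance: conv(rot * z) == rot * conv(z), because convolution is linear.
rotation_ok = np.allclose(circular_conv(rot * z, w), rot * circular_conv(z, w))

# Cyclic shift equivariance: conv(roll(z)) == roll(conv(z)).
shift_ok = np.allclose(
    circular_conv(np.roll(z, shift), w),
    np.roll(circular_conv(z, w), shift),
)

print(rotation_ok, shift_ok)
```

Rotation equivariance here follows from linearity alone; any nonlinearity placed after the convolution would need to preserve it as well, which this sketch does not cover.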

Tags are generated by Google's Gemini Pro API, and the summary and translation are generated by Upstage's SOLAR mini chat model derived from SOLAR-10.7B open LLM.

(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media
