🤗 Daily Paper (2025-09-01)

deep.di...@gmail.com

Sep 1, 2025, 4:06:54 PM

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page

🤗 daily paper

CLIPSym: Delving into Symmetry Detection with CLIP

Published at 2025-08-19

#ML

The authors present CLIPSym, a method that uses the CLIP model and a special decoder to improve symmetry detection in images. They also introduce a new technique called SAPG to better integrate semantic cues, and their approach outperforms current methods on standard symmetry detection datasets....

TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis

Published at 2025-08-19

#ML

The authors present TalkVid, a large and diverse dataset of 1244 hours of video from 7729 speakers, designed to improve the generalization of audio-driven talking head synthesis models across ethnicity, language, and age groups. They also introduce TalkVid-Bench, a balanced evaluation set, and show that models trained on TalkVid perform better and more consistently across subgroups than models trained on previous datasets....

EduRABSA: An Education Review Dataset for Aspect-based Sentiment Analysis Tasks

Published at 2025-08-23

#ML

The authors present EduRABSA, the first public, annotated dataset for Aspect-based Sentiment Analysis in education reviews, covering courses, teaching staff, and universities. They also introduce ASQE-DPT, a tool for creating labeled datasets for comprehensive ABSA tasks, aiming to advance research and support transparency in this under-resourced area....

Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery

Published at 2025-08-24

#ML

The study presents a new model, VIPER-R1, that discovers physical formulas by integrating visual perception, trajectory data, and symbolic reasoning. It outperforms existing models in accuracy and interpretability, using a novel dataset called PhysSymbol....

A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Published at 2025-08-25

#ML

A new benchmark called A.S.E has been created to test the security of AI-generated code at the repository level, using real-world repositories and expert-defined rules. The benchmark found that Claude-3.7-Sonnet performed best overall, the security gap between proprietary and open-source models is small, and simple decoding strategies are better for security patching....

TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training

Published at 2025-08-25

#ML

The study presents TiKMiX, a new method that adjusts the data mixture for language model pre-training based on the model's changing preferences, which is more efficient than static mixing strategies. TiKMiX uses Group Influence, an efficient metric to evaluate the impact of data domains on the model, and offers two approaches, TiKMiX-D and TiKMiX-M, to optimize the data mixing problem. The method outperforms state-of-the-art techniques while using fewer computational resources....

HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data for Mobile Dexterous Manipulation

Published at 2025-08-27

#ML

The study presents HERMES, a framework for teaching robots complex manipulation skills using human motion data from various sources. HERMES transforms human hand motions into robot actions, adapts to different environments, and improves real-world performance by integrating a navigation system with precise localization, enabling robots to perform diverse and intricate tasks....

Quantization Robustness to Input Degradations for Object Detection

Published at 2025-08-27

#ML

This study examines how reducing precision affects the performance of YOLO object detection models in recognizing objects under various real-world image distortions. The researchers propose a new method to improve model robustness under distortions, which works better for larger models and specific distortions but doesn't consistently improve performance across all models and distortions....

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

Published at 2025-08-28

#ML

This study explores how advanced language models are changing scientific research by focusing on the complex data used to develop them. The authors analyze different types of scientific data, review various language models, and discuss the unique challenges and solutions in this field, ultimately proposing a new approach for AI systems to actively participate in scientific discovery....

Deep Residual Echo State Networks: exploring residual orthogonal connections in untrained Recurrent Neural Networks

Published at 2025-08-28

#ML

The authors propose a new type of deep, untrained Recurrent Neural Network called Deep Residual Echo State Networks (DeepResESNs), which improves upon traditional Echo State Networks by incorporating residual connections. This results in better long-term information processing and memory capacity, as demonstrated through mathematical analysis and experiments on various time series tasks....

Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation

Published at 2025-08-28

#ML

This study presents a new method for generating 3D content using videos, which contain spatial and semantic information that can help overcome the limitations of existing 3D data. The researchers introduced a large-scale video dataset and a generative model that can produce consistent and plausible 3D content, demonstrating the potential for extending this approach to larger scenes....

Efficient Code Embeddings from Code Generation Models

Published at 2025-08-28

#ML

The jina-code-embeddings model suite generates efficient code embeddings using an autoregressive backbone pre-trained on text and code, outperforming existing models in tasks like code retrieval and question-answering, all while being smaller in size....

EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Published at 2025-08-28

#ML

Researchers developed EO-Robotics, which includes the EO-1 model and EO-Data1.5M dataset, to improve multimodal reasoning and robot control. EO-1, a unified embodied foundation model, processes various inputs and is trained using a large, high-quality dataset, enabling seamless robot action generation and multimodal reasoning, as demonstrated through various experiments....

Model-Task Alignment Drives Distinct RL Outcomes

Published at 2025-08-28

#ML

This study finds that the success of recent reinforcement learning methods in large language models depends on 'Model-Task Alignment'. These new methods work well only when the model and task are already closely aligned, but fail in more challenging scenarios where traditional RL methods still perform well....

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Published at 2025-08-28

#ML

The study presents R-4B, a new type of AI model that can decide for itself whether to use complex reasoning to solve a problem or not. It does this by training the model to understand when to use its 'thinking' abilities and when to use simpler methods, resulting in more efficient problem-solving and state-of-the-art performance on various challenging tasks....

AHELM: A Holistic Evaluation of Audio-Language Models

Published at 2025-08-29

#ML

The authors present AHELM, a comprehensive benchmark for evaluating audio-language models (ALMs) across ten aspects: audio perception, knowledge, reasoning, emotion detection, bias, fairness, multilinguality, robustness, toxicity, and safety. AHELM includes new synthetic audio-text datasets and standardizes prompts, inference parameters, and evaluation metrics, allowing for equitable comparisons of 14 open-weight and closed-API ALMs and three baseline systems....

Morae: Proactively Pausing UI Agents for User Choices

Published at 2025-08-29

#ML

Morae is a UI agent that helps blind and low-vision users by pausing at decision points during tasks, allowing users to choose options that better match their preferences. It uses large multimodal models to interpret user queries and prompt users for clarification, resulting in more successful task completion compared to baseline agents....

Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

Published at 2025-08-29

#ML

The proposed Think in Games framework uses large language models to improve decision-making in games by combining their reasoning abilities with reinforcement learning, resulting in better performance with less data and more transparency....

UItron: Foundational GUI Agent with Advanced Perception and Planning

Published at 2025-08-29

#ML

The study presents UItron, an open-source model for creating automatic GUI agents that can operate on mobile and PC devices. UItron improves GUI agent development through data engineering strategies, interactive infrastructure, and a curriculum reinforcement learning framework, resulting in superior performance in GUI perception, grounding, and planning tasks, especially in Chinese mobile apps....

Tags are generated by Google's Gemini Pro API, and the summary and translation are generated by Upstage's SOLAR mini chat model derived from SOLAR-10.7B open LLM.

(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media
