🤗 Daily Paper Newsletter

Hope you found some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.
|
|
|
|
CLIPSym: Delving into Symmetry Detection with CLIP
Published on 2025-08-19
#ML

The authors present CLIPSym, a method that pairs CLIP's pre-trained image and language encoders with a rotation-equivariant decoder to detect rotation and reflection symmetries in images. They also introduce Semantic-Aware Prompt Grouping (SAPG), a prompting technique that better integrates semantic cues from CLIP's language encoder, and their approach outperforms current methods on standard symmetry detection datasets.
|
|
TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis
Published on 2025-08-19
#ML

The authors present TalkVid, a large and diverse dataset of 1,244 hours of video from 7,729 speakers, designed to help audio-driven talking head synthesis models generalize across ethnicity, language, and age groups. They also introduce TalkVid-Bench, a balanced evaluation set, and show that models trained on TalkVid perform better and more consistently across subgroups than models trained on previous datasets.
|
|
|
EduRABSA: An Education Review Dataset for Aspect-based Sentiment Analysis Tasks
Published on 2025-08-23
#ML

The authors present EduRABSA, the first public, annotated dataset for Aspect-Based Sentiment Analysis (ABSA) of education reviews, covering courses, teaching staff, and universities. They also introduce ASQE-DPT, a tool for creating labeled datasets for comprehensive ABSA tasks, with the aim of advancing research and supporting transparency in this under-resourced area.
|
|
Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery
Published on 2025-08-24
#ML

The study presents VIPER-R1, a new model that discovers physical formulas by integrating visual perception, trajectory data, and symbolic reasoning. Using PhysSymbol, a new dataset introduced in the paper, the authors show that VIPER-R1 outperforms existing models in both accuracy and interpretability.
|
|
|
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Published on 2025-08-25
#ML

A.S.E is a new benchmark for testing the security of AI-generated code at the repository level, built from real-world repositories and expert-defined rules. Evaluations on it found that Claude-3.7-Sonnet performed best overall, that the security gap between proprietary and open-source models is small, and that simple, fast decoding strategies work better than complex reasoning for security patching.
|
|
TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training
Published on 2025-08-25
#ML

The study presents TiKMiX, a method that dynamically adjusts the data mixture for language model pre-training according to the model's evolving preferences, making it more efficient than static mixing strategies. TiKMiX uses Group Influence, an efficient metric for estimating how much each data domain affects the model, and offers two approaches to the resulting data-mixing optimization problem, TiKMiX-D and TiKMiX-M. The method outperforms state-of-the-art techniques while using fewer computational resources.
|
|
|
HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data for Mobile Dexterous Manipulation
Published on 2025-08-27
#ML

The study presents HERMES, a framework for teaching robots complex manipulation skills from human motion data gathered from multiple sources. HERMES transforms human hand motions into robot actions, adapts them to different environments, and improves real-world performance by integrating a navigation system with precise localization, enabling robots to perform diverse and intricate tasks.
|
|
Quantization Robustness to Input Degradations for Object Detection
Published on 2025-08-27
#ML

This study examines how post-training quantization, which reduces numerical precision, affects the robustness of YOLO object detection models to various real-world image distortions. The researchers propose a calibration strategy that takes distortions into account; it helps mainly for larger models under specific distortions but does not consistently improve robustness across all models and distortion types.
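To make "reducing precision" and "calibration" concrete, here is a minimal pure-Python sketch of symmetric per-tensor INT8 post-training quantization. It assumes a simple max-abs calibration rule, which is a common textbook choice and not necessarily the paper's TensorRT pipeline; the activation values are made up. It shows why the calibration data matters: values outside the calibrated range get clipped.

```python
# Minimal sketch: symmetric per-tensor INT8 post-training quantization with
# max-abs calibration (an assumed, simple scheme; not the paper's exact setup).

def calibrate_scale(calibration_values):
    """Pick the INT8 step size from the largest magnitude seen during calibration."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / 127 if max_abs > 0 else 1.0

def quantize(values, scale):
    """Round each value to the nearest INT8 step and clip to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(q, scale):
    """Map INT8 codes back to floats."""
    return [x * scale for x in q]

def max_abs_error(values, scale):
    """Worst-case error introduced by a quantize/dequantize round trip."""
    restored = dequantize(quantize(values, scale), scale)
    return max(abs(a - b) for a, b in zip(values, restored))

# Hypothetical activations: a distortion pushes one value beyond the clean range.
clean = [0.2, -0.5, 0.9, -1.0, 0.3]
degraded = [0.2, -0.5, 1.8, -1.0, 0.3]

clean_scale = calibrate_scale(clean)             # calibrated on clean data only
mixed_scale = calibrate_scale(clean + degraded)  # calibration set includes distorted data

err_clean_cal = max_abs_error(degraded, clean_scale)  # 1.8 is clipped to about 1.0
err_mixed_cal = max_abs_error(degraded, mixed_scale)  # bounded by half a quantization step
```

Including distorted samples widens the calibrated range, which removes clipping but makes each quantization step coarser; that trade-off is one plausible reason such calibration helps in some settings and not others.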
|
|
|
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
Published on 2025-08-28
#ML

This survey examines how scientific large language models are changing research, with a focus on the complex, heterogeneous data used to build them. The authors analyze different types of scientific data, review a range of models, and discuss the field's distinctive challenges and solutions, ultimately proposing a path toward AI systems that actively participate in scientific discovery as agents.
|
|
Deep Residual Echo State Networks: exploring residual orthogonal connections in untrained Recurrent Neural Networks
Published on 2025-08-28
#ML

The authors propose Deep Residual Echo State Networks (DeepResESNs), a new class of deep, untrained recurrent neural networks that extends traditional Echo State Networks with residual orthogonal connections. Mathematical analysis and experiments on a range of time series tasks show improved memory capacity and long-term information processing.
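The summary does not spell out the update rule. As a rough illustration only, the sketch below assumes a residual reservoir update of the form h_t = alpha * O * h_(t-1) + beta * tanh(W_rec * h_(t-1) + W_in * x_t), where O is an orthogonal matrix and all weights are random and never trained. Here O is a cyclic permutation, chosen because it is orthogonal and trivial to apply; the class name, constants, and weight scaling are hypothetical and not taken from the paper.

```python
import math
import random

# Hypothetical simplification of a deep residual reservoir; not the paper's code.

def random_matrix(rows, cols, scale, rng):
    """Untrained weights drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def cyclic_shift(v):
    """Multiply by a cyclic permutation matrix, which is orthogonal."""
    return v[-1:] + v[:-1]

class ResidualReservoir:
    """One untrained reservoir layer with an assumed residual orthogonal branch."""

    def __init__(self, n_inputs, n_units, alpha, beta, rng, in_scale=1.0, rec_scale=0.1):
        self.w_in = random_matrix(n_units, n_inputs, in_scale, rng)
        self.w_rec = random_matrix(n_units, n_units, rec_scale, rng)
        self.alpha, self.beta = alpha, beta
        self.state = [0.0] * n_units

    def step(self, x):
        # Nonlinear branch: tanh(W_rec h_(t-1) + W_in x_t)
        pre = [a + b for a, b in zip(matvec(self.w_rec, self.state), matvec(self.w_in, x))]
        # Residual branch: O h_(t-1) with O a cyclic permutation
        residual = cyclic_shift(self.state)
        self.state = [self.alpha * r + self.beta * math.tanh(p) for r, p in zip(residual, pre)]
        return self.state

def run_deep(layers, inputs):
    """Feed each time step through the stack; layer l reads layer l-1's new state."""
    outputs = []
    for x in inputs:
        h = x
        for layer in layers:
            h = layer.step(h)
        outputs.append(list(h))
    return outputs

rng = random.Random(0)
layers = [
    ResidualReservoir(1, 8, alpha=0.9, beta=0.1, rng=rng),
    ResidualReservoir(8, 8, alpha=0.9, beta=0.1, rng=rng),
]
signal = [[math.sin(0.3 * t)] for t in range(50)]
states = run_deep(layers, signal)
```

With alpha + beta = 1 and a state that starts at zero, every unit stays within [-1, 1], because the permutation preserves the largest magnitude and tanh is bounded. As in any Echo State Network, only a linear readout on `states` (for example ridge regression) would be trained; the reservoir weights stay fixed.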
|
|
|
Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation
Published on 2025-08-28
#ML

This study presents a method for 3D content generation that draws on videos, whose spatial and semantic information can help overcome the limitations of existing 3D data. The researchers introduce a large-scale video dataset and a generative model that produces consistent, plausible 3D content, and show the approach's potential to extend to larger scenes.
|
|
Efficient Code Embeddings from Code Generation Models
Published on 2025-08-28
#ML

The jina-code-embeddings model suite produces efficient code embeddings using an autoregressive backbone pre-trained on both text and code. Despite their relatively small size, the models outperform existing models on tasks such as code retrieval and question answering.
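Whatever model produces the vectors, embedding-based code retrieval typically ranks snippets by cosine similarity between the query embedding and each code embedding. The sketch below uses made-up 3-dimensional vectors in place of real jina-code-embeddings output and does not call any model API; the `rank` helper is hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank(query_vec, corpus):
    """Return snippet ids sorted from most to least similar to the query."""
    scored = [(cosine(query_vec, vec), snippet_id) for snippet_id, vec in corpus.items()]
    return [snippet_id for _, snippet_id in sorted(scored, reverse=True)]

# Hypothetical embeddings standing in for a real model's output.
corpus = {
    "def add(a, b): return a + b": [0.9, 0.1, 0.0],
    "def read_file(p): return open(p).read()": [0.1, 0.9, 0.2],
    "SELECT * FROM users": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for an embedding of "function that sums two numbers"
results = rank(query, corpus)
```

Real systems store normalized vectors in an index and use approximate nearest-neighbour search, but the scoring idea is the same.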
|
|
|
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control
Published on 2025-08-28
#ML

Researchers developed EO-Robotics, consisting of the EO-1 model and the EO-Data1.5M dataset, to improve multimodal reasoning and robot control. EO-1 is a unified embodied foundation model that processes interleaved vision, text, and action inputs; trained on this large, high-quality dataset, it handles both robot action generation and multimodal reasoning, as the authors demonstrate through a range of experiments.
|
|
Model-Task Alignment Drives Distinct RL Outcomes
Published on 2025-08-28
#ML

This study finds that the success of recent reinforcement learning (RL) methods for large language models depends on Model-Task Alignment, that is, how well the pretrained model's existing capabilities already fit the target task. These methods work well only when model and task are already closely aligned, and fail in more challenging settings where standard RL methods remain effective.
|
|
|
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
Published on 2025-08-28
#ML

The study presents R-4B, a multimodal large language model (MLLM) that decides for itself whether to use complex reasoning on a given problem. It is trained to recognize when to use its 'thinking' mode and when a simpler, direct response suffices, which makes problem solving more efficient and yields state-of-the-art performance on a range of challenging tasks.
|
|
AHELM: A Holistic Evaluation of Audio-Language Models
Published on 2025-08-29
#ML

The authors present AHELM, a comprehensive benchmark for evaluating audio-language models (ALMs) across ten aspects: audio perception, knowledge, reasoning, emotion detection, bias, fairness, multilinguality, robustness, toxicity, and safety. AHELM includes new synthetic audio-text datasets and standardizes prompts, inference parameters, and evaluation metrics, enabling fair comparisons among 14 open-weight and closed-API ALMs and three baseline systems.
|
|
|
Morae: Proactively Pausing UI Agents for User Choices
Published on 2025-08-29
#ML

Morae is a UI agent that helps blind and low-vision users by pausing at decision points during a task so that they can choose the options that best match their preferences. It uses large multimodal models to interpret user queries and ask users for clarification, leading to more successful task completion than baseline agents.
|
|
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
Published on 2025-08-29
#ML

The Think in Games framework combines the reasoning abilities of large language models with reinforcement learning to improve decision-making in games, achieving better performance with less data and greater transparency.
|
|
|
UItron: Foundational GUI Agent with Advanced Perception and Planning
Published on 2025-08-29
#ML

The study presents UItron, an open-source foundational model for building automated GUI agents that operate on mobile and PC devices. UItron advances GUI agent development through data engineering strategies, interactive infrastructure, and a curriculum reinforcement learning framework, achieving superior performance on GUI perception, grounding, and planning tasks, especially in Chinese mobile apps.
|
|
|
|
Tags are generated with Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
|
|
|
|
|