🤗 Daily Paper Newsletter

Hope you found some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.

Dens3R: A Foundation Model for 3D Geometry Prediction
Published at 2025-07-22
#ML

The authors propose Dens3R, a 3D foundation model for joint geometric dense prediction, which overcomes the limitation of existing methods that can only predict a single geometric quantity. Dens3R uses a two-stage training framework and a lightweight shared encoder-decoder backbone to accurately regress multiple geometric quantities, ensuring consistent geometry perception and supporting geometrically consistent multi-view inference...
Read More

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
Published at 2025-07-23
#ML

This study presents InstructVLA, a model that combines strong text reasoning with excellent robotic manipulation performance via a new training method. InstructVLA improves upon existing models in both controlled environments and real-world settings, showcasing its potential for enhancing human-robot interaction...
Read More

AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks
Published at 2025-07-26
#ML

This study explores a new problem: optimizing compute resources for multi-stage complex tasks using large language models, which involves selecting suitable models and allocating budgets per subtask. The proposed AgentTTS framework, inspired by empirical insights, efficiently finds compute-optimal allocations through iterative feedback-driven interactions with the environment, outperforming traditional and other LLM-based baselines in search efficiency and robustness...
Read More

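Stripped to its core, per-subtask budget allocation is a small combinatorial search. The greedy sketch below is a toy illustration only: the utility curves, budget units, and the `allocate` helper are invented for the example, and AgentTTS's feedback-driven search is far richer. Greedy allocation like this is optimal only when the utility curves show diminishing returns.

```python
def allocate(budget, utilities):
    # utilities[i][b] = estimated quality of subtask i given b units of compute
    # (hypothetical lookup tables; a real system would query models to estimate these).
    n = len(utilities)
    alloc = [0] * n
    for _ in range(budget):
        # Give the next unit of compute to the subtask with the largest marginal gain.
        gains = [utilities[i][alloc[i] + 1] - utilities[i][alloc[i]] for i in range(n)]
        best = max(range(n), key=lambda i: gains[i])
        alloc[best] += 1
    return alloc

# Diminishing-returns quality curves for two hypothetical subtasks.
u1 = [0.0, 0.5, 0.7, 0.8, 0.85, 0.88]
u2 = [0.0, 0.3, 0.55, 0.7, 0.8, 0.85]
print(allocate(4, [u1, u2]))  # → [2, 2]
```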
Embedding-Aware Quantum-Classical SVMs for Scalable Quantum Machine Learning
Published at 2025-07-28
#ML

The authors present a new method for Quantum Support Vector Machines that uses a quantum-classical pipeline with class-balanced k-means distillation and pretrained Vision Transformer embeddings. They find that using Vision Transformer embeddings improves accuracy over classical SVMs and reveals a connection between transformer attention and quantum feature spaces, offering a scalable solution for quantum machine learning...
Read More

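The class-balanced k-means distillation step can be sketched with scikit-learn; this is an illustrative reconstruction, not the authors' pipeline. The quantum kernel is stood in for by a classical RBF kernel, and the toy Gaussian data, `per_class` count, and `distill_balanced` helper are invented for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def distill_balanced(X, y, per_class=16, seed=0):
    """Replace each class with an equal number of k-means centroids."""
    Xs, ys = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        k = min(per_class, len(Xc))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Xc)
        Xs.append(km.cluster_centers_)
        ys.append(np.full(k, c))
    return np.vstack(Xs), np.concatenate(ys)

# Toy, well-separated two-class data standing in for ViT embeddings.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 8)), rng.normal(3, 1, (200, 8))])
y = np.array([0] * 200 + [1] * 200)

Xd, yd = distill_balanced(X, y)      # 400 points distilled to 32 prototypes
clf = SVC(kernel="rbf").fit(Xd, yd)  # classical RBF kernel stands in for the quantum kernel
print(Xd.shape, clf.score(X, y))
```

Training the SVM on the distilled prototypes rather than the full set is what makes the (expensive) kernel evaluation scalable.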
Cyber-Zero: Training Cybersecurity Agents without Runtime
Published at 2025-07-29
#ML

The authors present Cyber-Zero, a new framework for training cybersecurity language models without needing actual runtime environments. They use publicly available data and simulations to create realistic scenarios, allowing them to train models that outperform existing ones on various cybersecurity benchmarks, making high-quality cybersecurity agents more accessible...
Read More

Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report
Published at 2025-08-01
#ML

The authors have developed Foundation-Sec-8B-Instruct, a cybersecurity-focused language model designed for chat-style interactions and following instructions, which outperforms Llama 3.1-8B-Instruct in cybersecurity tasks and is competitive with GPT-4o-mini in cyber threat intelligence and instruction-following tasks...
Read More

Personalized Safety Alignment for Text-to-Image Diffusion Models
Published at 2025-08-01
#ML

This study presents a framework called Personalized Safety Alignment (PSA) that enables users to customize safety settings in text-to-image diffusion models based on individual preferences like age, mental health, and personal beliefs. PSA uses a new dataset called Sage and a cross-attention mechanism to adjust the model's behavior, resulting in better suppression of harmful content and higher alignment with user constraints compared to existing methods...
Read More

Platonic Representations for Poverty Mapping: Unified Vision-Language Codes or Agent-Induced Novelty?
Published at 2025-08-01
#ML

The study explores the connection between satellite imagery, web text, and socio-economic indicators like household wealth. It introduces a framework that combines vision and language models to predict wealth, finding that this approach outperforms vision-only methods and suggests a shared representation of material well-being, while also releasing a large-scale multimodal dataset for further research...
Read More

A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models
Published at 2025-08-02
#ML

The research presents a new method called GlimpsePrune, which dynamically compresses visual data for large vision-language models by pruning irrelevant tokens based on scene complexity, improving efficiency and performance without sacrificing accuracy...
Read More

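Score-based visual token pruning can be sketched generically as keep-top-k with a data-dependent keep ratio. Everything below is an invented illustration (the entropy-based complexity proxy, the stand-in relevance scores, and the `prune_visual_tokens` helper), not GlimpsePrune's actual mechanism:

```python
import numpy as np

def dynamic_keep_ratio(scores, lo=0.2, hi=0.8):
    # Crude scene-complexity proxy: flat score distributions (high entropy,
    # i.e. no token dominates) keep more tokens; peaked ones keep fewer.
    p = scores / scores.sum()
    entropy = -(p * np.log(p + 1e-12)).sum() / np.log(len(p))
    return lo + (hi - lo) * entropy

def prune_visual_tokens(tokens, scores):
    ratio = dynamic_keep_ratio(scores)
    k = max(1, int(round(len(tokens) * ratio)))
    keep = np.sort(np.argsort(scores)[-k:])  # keep top-k, preserving spatial order
    return tokens[keep]

rng = np.random.default_rng(0)
tokens = rng.normal(size=(576, 64))  # e.g. a 24x24 patch grid of visual tokens
scores = rng.random(576)             # stand-in per-token relevance scores
pruned = prune_visual_tokens(tokens, scores)
print(pruned.shape)
```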
Artificial Intelligence and Misinformation in Art: Can Vision Language Models Judge the Hand or the Machine Behind the Canvas?
Published at 2025-08-02
#ML

This study investigates the limitations of vision language models in accurately attributing artwork to artists and detecting AI-generated images, highlighting the potential for misinformation as people rely more on AI models for art information...
Read More

Exploitation Is All You Need... for Exploration
Published at 2025-08-02
#ML

This study shows that a meta-reinforcement learning agent can explore new environments without explicit incentives if three conditions are met: the environment has repeatable patterns, the agent has memory, and learning can connect actions to long-term rewards. The agent's exploration behavior arises from maximizing rewards, suggesting that exploration and exploitation may not need to be separate objectives...
Read More

RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems
Published at 2025-08-02
#ML

RoboMemory is a brain-inspired framework that allows physical robots to learn continuously and improve over time. It has four main components that work together to enable long-term planning and learning, and it has been shown to outperform baseline approaches across various tasks...
Read More

SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension
Published at 2025-08-03
#ML

This study proposes a new method for improving text retrieval by considering the context of short chunks, introducing a novel training paradigm and situated embedding models (SitEmb). The new SitEmb models significantly outperform existing models, including large ones, on a new benchmark for situated retrieval capabilities...
Read More

Uncertainty-Based Methods for Automated Process Reward Data Construction and Output Aggregation in Mathematical Reasoning
Published at 2025-08-03
#ML

This study presents a new framework that uses uncertainty to automate the creation of process reward data for improving language models' mathematical reasoning skills. The framework also includes two methods that combine the strengths of majority voting and process reward models (PRMs) to further enhance mathematical reasoning, both shown effective in experiments...
Read More

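A common way to combine majority voting with process reward model (PRM) scores is a reward-weighted vote; the sketch below assumes that simple aggregation rule, with made-up answers and scores, and is not necessarily the paper's exact method:

```python
from collections import defaultdict

def weighted_vote(samples):
    # samples: (final_answer, prm_score) pairs from independently sampled solutions.
    totals = defaultdict(float)
    for answer, score in samples:
        totals[answer] += score  # each vote is weighted by its process reward score
    return max(totals, key=totals.get)

# The PRM-weighted vote agrees with the plain majority here...
print(weighted_vote([("42", 0.9), ("42", 0.8), ("41", 0.95), ("42", 0.7)]))  # → 42
# ...but can overrule a majority of low-confidence solutions.
print(weighted_vote([("7", 0.95), ("8", 0.2), ("8", 0.2)]))                  # → 7
```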
Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe
Published at 2025-08-03
#ML

The authors have created a benchmark called Voxlect to study and model dialects and regional languages across the world using speech foundation models. They tested various speech foundation models' ability to classify dialects in many languages, assessed their performance in noisy conditions, and demonstrated practical applications like improving speech recognition and generating speech...
Read More

Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following
Published at 2025-08-04
#ML

The study presents a self-supervised reinforcement learning framework that enhances instruction following in reasoning models by utilizing their internal signals, eliminating the need for external supervision and overcoming the trade-off between reasoning and instruction-following abilities...
Read More

CellForge: Agentic Design of Virtual Cell Models
Published at 2025-08-04
#ML

CellForge is a system that uses AI to create optimized virtual cell models by analyzing data and literature, designing methods, and executing experiments automatically. It outperforms existing methods in predicting cell responses to various stimuli, demonstrating the benefits of using AI agents with different perspectives...
Read More

Dynaword: From One-shot to Continuously Developed Datasets
Published at 2025-08-04
#ML

The authors present a new method called Dynaword for creating open, large-scale datasets that can be continuously updated by the community, addressing issues of license restrictions, static datasets, and limited quality assurance. They demonstrate this approach with Danish Dynaword, which has four times more tokens than similar datasets, is fully open-licensed, and has received contributions from both industry and research, with light tests to maintain data quality and documentation...
Read More

Fitness aligned structural modeling enables scalable virtual screening with AuroBind
Published at 2025-08-04
#ML

A new method called AuroBind has been developed to improve virtual screening for drug discovery. It can predict how drugs will bind to proteins with high accuracy and speed, allowing for the screening of large compound libraries quickly. The method has shown promising results in identifying potential drugs for previously undruggable targets...
Read More

Qwen-Image Technical Report
Published at 2025-08-04
#ML

The Qwen-Image model is designed to improve complex text rendering and image editing. It uses a large-scale data pipeline and a progressive training strategy for better text rendering, especially in logographic languages like Chinese. For image editing, it employs a dual-encoding mechanism to balance semantic consistency and visual fidelity, achieving state-of-the-art performance in various benchmarks...
Read More

ReMoMask: Retrieval-Augmented Masked Motion Generation
Published at 2025-08-04
#ML

The study presents a new method called ReMoMask to improve the generation of human motion sequences from text descriptions. It addresses problems like limited diversity and inaccuracies in existing models by using innovative techniques such as a momentum text-motion model, spatio-temporal attention, and guidance for better generalization, resulting in more realistic and accurate motion generation...
Read More

SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System
Published at 2025-08-04
#ML

This study presents SHAMI-MT, a two-way machine translation system that links Modern Standard Arabic (MSA) with the Syrian dialect, using advanced models and a large dataset. The system was tested and found to produce high-quality, accurate, and culturally authentic translations, filling a gap in dialectal Arabic translation and benefiting content localization, cultural heritage, and intercultural communication...
Read More

Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction
Published at 2025-08-04
#ML

The study presents a new method called Sparse-dLLM that improves the efficiency of diffusion Large Language Models by selectively removing unimportant data from memory during inference, without requiring any additional training. This approach leads to significant improvements in processing speed and memory usage, making it possible to handle longer contexts and perform better than existing methods...
Read More

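Dynamic cache eviction can be sketched generically as keeping only the cached key/value entries with the highest accumulated attention mass. The saliency criterion, tensor shapes, and budget below are invented for illustration and are not the Sparse-dLLM rule:

```python
import numpy as np

def evict_kv(keys, values, attn_weights, budget):
    # attn_weights: attention mass each head has placed on each cached token.
    saliency = attn_weights.sum(axis=0)            # total mass per cached token
    keep = np.sort(np.argsort(saliency)[-budget:])  # keep top-`budget`, preserve order
    return keys[keep], values[keep]

rng = np.random.default_rng(1)
T, d = 128, 16
keys, values = rng.normal(size=(T, d)), rng.normal(size=(T, d))
attn = rng.random((4, T))  # 4 heads attending over T cached entries
k2, v2 = evict_kv(keys, values, attn, budget=32)
print(k2.shape, v2.shape)
```

Because the criterion is computed from attention statistics already produced during inference, no extra training is needed, which matches the training-free claim above.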
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Published at 2025-08-04
#ML

The authors present a new framework called VeOmni that makes it easier and more efficient to train models that understand and generate information across different formats, such as text, images, and audio. VeOmni achieves this by separating the communication and computation processes and providing a flexible interface for integrating new types of data, resulting in faster training times and the ability to handle larger models and longer contexts...
Read More

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit the Developer's Social Media