🤗 Daily Paper (2025-08-22)

deep.di...@gmail.com

Aug 22, 2025, 4:07:00 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you find some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

Project page
🤗 Daily Papers

INTIMA: A Benchmark for Human-AI Companionship Behavior

Published at 2025-08-04

#ML

The study presents INTIMA, a benchmark for assessing companionship behaviors in language models. By evaluating responses from various AI models, the research reveals differences in how they handle emotionally charged interactions, emphasizing the importance of balanced boundary-setting and emotional support for user well-being....

Read More

Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models

Published at 2025-08-20

#ML

The study presents Fin-PRM, a specialized Process Reward Model designed for financial reasoning tasks, which outperforms general-purpose PRMs in evaluating and guiding reasoning steps in finance. Experiments show significant improvements in supervised learning, reinforcement learning, and test-time performance compared to baseline models, emphasizing the importance of domain-specific reward modeling for financial reasoning....

Read More

Mobile-Agent-v3: Foundamental Agents for GUI Automation

Published at 2025-08-20

#ML

The authors present GUI-Owl, a GUI agent model that outperforms other open-source models in various tasks, and Mobile-Agent-v3, a general-purpose framework that further enhances performance. GUI-Owl's innovations include a large-scale virtual environment, diverse agent capabilities, and a scalable reinforcement learning framework, all of which contribute to its superior performance in GUI automation....

Read More

Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds

Published at 2025-08-20

#ML

This research presents a new method for creating 3D digital humans from just two images, a front and a back view, which significantly simplifies capture for users. The approach predicts consistent point clouds and supplements missing color information to obtain complete, colored human point clouds, which are then converted into 3D Gaussians for high-quality rendering. The method is fast, requiring only 190 ms on a single NVIDIA RTX 4090, and can work with images captured b...
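To make that pipeline concrete, here is a minimal Python sketch of the two-image reconstruction flow described above. Every class and function name (PointCloudPredictor, ColorCompleter, points_to_gaussians, reconstruct) is a placeholder invented for illustration, not the paper's released code or API.

# Hypothetical sketch of a two-image human reconstruction pipeline.
# None of these classes/functions come from the Snap-Snap release;
# they only mirror the steps described in the summary above.
import numpy as np

class PointCloudPredictor:
    """Placeholder: predicts per-pixel 3D points from a single view."""
    def predict(self, image: np.ndarray) -> np.ndarray:
        # A real model would regress an (H*W, 3) point map from the image.
        h, w, _ = image.shape
        return np.zeros((h * w, 3), dtype=np.float32)

class ColorCompleter:
    """Placeholder: fills in colors for points unseen by either camera."""
    def complete(self, points: np.ndarray, front: np.ndarray, back: np.ndarray) -> np.ndarray:
        return np.zeros((points.shape[0], 3), dtype=np.float32)

def points_to_gaussians(points: np.ndarray, colors: np.ndarray) -> dict:
    """Placeholder: lift colored points to 3D Gaussian primitives."""
    n = points.shape[0]
    return {
        "means": points,                  # Gaussian centers
        "colors": colors,                 # per-Gaussian RGB
        "scales": np.full((n, 3), 0.01),  # isotropic initial scales
        "opacities": np.ones((n, 1)),     # fully opaque to start
    }

def reconstruct(front_img: np.ndarray, back_img: np.ndarray) -> dict:
    predictor, completer = PointCloudPredictor(), ColorCompleter()
    # 1. Predict geometrically consistent point clouds from both views.
    points = np.concatenate([predictor.predict(front_img),
                             predictor.predict(back_img)], axis=0)
    # 2. Supplement colors that neither camera observed directly.
    colors = completer.complete(points, front_img, back_img)
    # 3. Convert the colored point cloud into renderable 3D Gaussians.
    return points_to_gaussians(points, colors)

# Usage with dummy 2x2 images:
gaussians = reconstruct(np.zeros((2, 2, 3)), np.zeros((2, 2, 3)))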

Read More

aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists

Published at 2025-08-20

#ML

The authors have created aiXiv, a new platform for AI and human scientists to collaborate on research proposals and papers. This platform has a multi-agent architecture that allows for seamless integration of AI and human scientists, making it easy to submit, review, and improve research content. Through experiments, the authors show that aiXiv enhances the quality of AI-generated research and helps to advance scientific progress by providing a reliable and scalable ecosystem for autonomous scie...

Read More

"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries

Published at 2025-08-21

#ML

This research proposes a new kind of AI agent, called Geo-Visual Agents, that can understand and answer complex visual questions about the world by analyzing large amounts of geographic images and traditional GIS data. These agents could help people make more informed decisions by providing detailed visual information, like whether a cafe entrance looks accessible, beyond what's available in current digital maps....

Read More

A Survey on Large Language Model Benchmarks

Published at 2025-08-21

#ML

This study organizes and analyzes 283 large language model benchmarks into three categories: general capabilities, domain-specific, and target-specific. The research highlights issues with current benchmarks, such as inflated scores, bias, and lack of evaluation in certain areas, and suggests a reference framework for future benchmark development....

Read More

ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling

Published at 2025-08-21

#ML

The study presents ATLAS, a high-quality 3D human body model that can accurately represent diverse body shapes, poses, and expressions. It does this by separating the body's shape and skeleton, allowing for more detailed customization and better fitting of unseen subjects compared to existing methods....

Read More

Deep Think with Confidence

Published at 2025-08-21

#ML

The study presents a new method called Deep Think with Confidence that improves reasoning efficiency and performance in large language models without additional training or tuning. This method uses a model's internal confidence signals to filter out low-quality reasoning traces, reducing generated tokens and increasing accuracy on various reasoning tasks....
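As a rough, hypothetical illustration of that idea (not the paper's released implementation), the sketch below scores each sampled reasoning trace by the geometric mean of its token probabilities, drops the least confident traces, and takes a confidence-weighted vote over the surviving answers. The trace format, scoring function, and keep ratio are all assumptions made for this example.

import math
from collections import defaultdict
from typing import List, Tuple

# Each sampled trace: (final_answer, per-token log-probabilities of its reasoning).
Trace = Tuple[str, List[float]]

def trace_confidence(token_logprobs: List[float]) -> float:
    """Assumed confidence signal: geometric-mean token probability of the trace."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def confident_vote(traces: List[Trace], keep_ratio: float = 0.5) -> str:
    """Drop the least confident traces, then take a confidence-weighted vote.
    A sketch of the idea in the summary above, not the released algorithm."""
    scored = sorted(((trace_confidence(lp), ans) for ans, lp in traces), reverse=True)
    kept = scored[: max(1, int(len(scored) * keep_ratio))]
    votes = defaultdict(float)
    for conf, ans in kept:
        votes[ans] += conf  # each surviving trace votes with its confidence
    return max(votes, key=votes.get)

# Example: three sampled traces; only the most confident survive the filter.
print(confident_vote([
    ("42", [-0.1, -0.2, -0.1]),
    ("42", [-0.3, -0.2]),
    ("7",  [-2.5, -3.0, -2.8]),
]))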

Read More

Intern-S1: A Scientific Multimodal Foundation Model

Published at 2025-08-21

#ML

This study presents Intern-S1, a highly specialized AI model designed to analyze scientific data across multiple formats. By leveraging advanced algorithms, extensive data, and innovative training methods, Intern-S1 outperforms open-source models in scientific tasks and even surpasses closed-source models in professional areas like molecular synthesis and crystal stability prediction....

Read More

LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model

Published at 2025-08-21

#ML

The paper presents LLaSO, an open and transparent framework for large-scale speech-language modeling, addressing the lack of reproducibility in the field. LLaSO provides three essential resources: a speech-text alignment corpus, a multi-task instruction-tuning dataset, and a reproducible benchmark for evaluation, enabling systematic comparisons and community-driven progress in LSLMs....

Read More

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Published at 2025-08-21

#ML

The authors present LiveMCP-101, a benchmark of 101 real-world queries that test AI agents' ability to use multiple MCP tools in coordination, and introduce a novel evaluation approach that reflects real-world environments. Experiments show that even advanced AI models struggle with these tasks, revealing areas for improvement in tool orchestration and efficiency....

Read More

SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass

Published at 2025-08-21

#ML

The authors propose SceneGen, a new framework that creates multiple 3D assets with geometry and texture from a single scene image and object masks in one feedforward pass, without optimization or asset retrieval. SceneGen also handles multi-image inputs and has been shown to be efficient and robust at generating high-quality 3D content, making it well suited to a range of downstream applications....

Read More

Visual Autoregressive Modeling for Instruction-Guided Image Editing

Published at 2025-08-21

#ML

This study presents VAREdit, a new visual autoregressive framework for instruction-guided image editing that improves adherence to editing instructions and efficiency compared to diffusion-based methods. VAREdit reframes image editing as a next-scale prediction problem and introduces a Scale-Aligned Reference module to effectively condition the source image tokens, resulting in a 30%+ higher GPT-Balance score and faster editing speed than leading diffusion-based methods....

Read More

Waver: Wave Your Way to Lifelike Video Generation

Published at 2025-08-21

#ML

Waver is a model that generates realistic videos and images from text or image prompts. It uses a purpose-built architecture to improve performance and a strict data-quality pipeline to ensure high-quality outputs. Waver is among the strongest models at video generation and captures complex motion better than many other models....

Read More

When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding

Published at 2025-08-21

#ML

The authors present Grounded VideoDiT, a Video LLM that improves temporal perception in video understanding by introducing a Diffusion Temporal Latent encoder, object-grounded representations, and a mixed token scheme with discrete temporal tokens, achieving state-of-the-art performance on various video benchmarks....

Read More

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.


(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media

Facebook | X | LinkedIn