🤗 Daily Paper(2025-08-27)

deep.di...@gmail.com

Aug 27, 2025, 4:07:06 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page
🤗 daily paper

ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks

Published at 2025-08-13

#ML

The study presents ReportBench, a benchmark for evaluating the quality of research reports generated by large language models, focusing on the relevance of cited literature and the accuracy of statements. The authors use high-quality published survey papers as references and develop an automated framework to analyze generated reports, finding that commercial Deep Research agents outperform standalone LLMs but still have room for improvement....

Read More

Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels

Published at 2025-08-20

#ML

The authors present a new method called PIXIE that can quickly predict physical properties of 3D scenes from visual information using a neural network trained with supervised learning. PIXIE is faster and more accurate than existing methods and can even generalize to real-world scenes using pretrained visual features like CLIP....

Read More

Select to Know: An Internal-External Knowledge Self-Selection Framework for Domain-Specific Question Answering

Published at 2025-08-20

#ML

The researchers propose a new framework called Select2Know (S2K) to improve domain-specific question answering by internalizing domain knowledge through a cost-effective internal-external knowledge self-selection strategy. S2K outperforms existing methods and matches domain-pretrained language models at lower cost on medical, legal, and financial QA benchmarks....

Read More

CineScale: Free Lunch in High-Resolution Cinematic Visual Generation

Published at 2025-08-21

#ML

The study presents CineScale, a new method that generates high-resolution images and videos without fine-tuning pre-trained models, which current methods typically require. CineScale can create 8K images without any fine-tuning and 4K videos with minimal fine-tuning, improving the quality and detail of visual content compared to existing techniques....

Read More

QueryBandits for Hallucination Mitigation: Exploiting Semantic Features for No-Regret Rewriting

Published at 2025-08-21

#ML

This study presents a new method called QueryBandits that proactively reduces hallucinations in Large Language Models by rewriting input queries based on linguistic features. Experiments show that QueryBandits significantly outperforms other rewriting strategies and a no-rewrite baseline, demonstrating the effectiveness of this approach in mitigating hallucinations....
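The bandit framing can be illustrated with a toy epsilon-greedy learner that picks a rewrite strategy per query and updates from reward feedback. This is a minimal sketch of the general multi-armed-bandit idea, not the paper's QueryBandits algorithm; the strategy names and reward rates below are invented for illustration.

```python
import random

# Hypothetical rewrite strategies (names are illustrative, not from the paper).
STRATEGIES = ["paraphrase", "expand_acronyms", "add_context", "no_rewrite"]

class EpsilonGreedyBandit:
    """Pick a rewrite strategy per query; learn from reward feedback."""
    def __init__(self, arms, epsilon=0.1):
        self.arms = arms
        self.epsilon = epsilon
        self.counts = {a: 0 for a in arms}    # pulls per arm
        self.values = {a: 0.0 for a in arms}  # running mean reward

    def select(self):
        if random.random() < self.epsilon:                       # explore
            return random.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])      # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n      # incremental mean

random.seed(0)
bandit = EpsilonGreedyBandit(STRATEGIES)
# Simulated feedback: pretend "add_context" avoids hallucinations most often.
true_rates = {"paraphrase": 0.4, "expand_acronyms": 0.5,
              "add_context": 0.8, "no_rewrite": 0.3}
for _ in range(2000):
    arm = bandit.select()
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    bandit.update(arm, reward)
print(max(bandit.values, key=bandit.values.get))
```

In this simulation the learner concentrates its pulls on whichever strategy yields the highest observed reward, which mirrors the paper's finding that a learned per-query rewriting policy beats a fixed or no-rewrite baseline.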

Read More

ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation

Published at 2025-08-24

#ML

This study creates a new dataset called ClaimGen-CN for generating legal claims in Chinese, based on real-world legal disputes. The researchers also develop a metric to evaluate the factuality and clarity of generated claims and test it on various language models, revealing areas for improvement in this field....

Read More

Steering When Necessary: Flexible Steering Large Language Models with Backtracking

Published at 2025-08-24

#ML

The researchers present a new method called Flexible Activation Steering with Backtracking (FASB) that improves large language models' performance by dynamically adjusting intervention strength based on both the question and generated content, and correcting deviations through a backtracking mechanism. Experiments show that FASB outperforms existing methods on various datasets....

Read More

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Published at 2025-08-24

#ML

The study presents TreePO, a new algorithm that improves the efficiency of policy optimization and inference by using a tree-structured approach for sequence generation. TreePO reduces computational costs and enhances exploration diversity, resulting in significant savings in GPU hours and compute resources for both trained and existing models, without compromising performance....

Read More

CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics

Published at 2025-08-25

#ML

The authors present CMPhysBench, a benchmark for evaluating large language models in condensed matter physics, which includes over 520 calculation problems focusing on subfields like magnetism and superconductivity. They also introduce the SEED score to provide more accurate assessments of the models' similarity to ground-truth solutions, revealing a significant capability gap in these models for this domain....

Read More

DrugReasoner: Interpretable Drug Approval Prediction with a Reasoning-augmented Language Model

Published at 2025-08-25

#ML

DrugReasoner, a reasoning-based language model, is designed to predict small-molecule drug approval likelihood by integrating molecular descriptors and comparative reasoning. It outperforms traditional and recent AI models in predictive accuracy while providing step-by-step rationales and confidence scores, enhancing transparency in AI-assisted drug discovery....

Read More

ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models

Published at 2025-08-25

#ML

The authors present ObjFiller-3D, a method for creating consistent 3D objects by using a video editing model instead of traditional 2D image inpainting, leading to more accurate and detailed reconstructions with fewer visual artifacts....

Read More

Spacer: Towards Engineered Scientific Inspiration

Published at 2025-08-25

#ML

The authors present Spacer, a system designed to foster scientific discovery by generating original and fact-based concepts. Spacer uses a method called 'deliberate decontextualization' to create connections between keywords from academic publications, resulting in novel scientific ideas that are more similar to leading publications compared to other state-of-the-art language models....

Read More

Training Language Model Agents to Find Vulnerabilities with CTF-Dojo

Published at 2025-08-25

#ML

The authors present CTF-Dojo, a large-scale executable runtime for training language models with verifiable feedback, using 658 functional CTF-style challenges. They also introduce CTF-Forge, an automated pipeline for creating execution environments quickly. The trained language models show significant improvements over baseline models in competitive benchmarks, setting a new state-of-the-art....

Read More

Unraveling the cognitive patterns of Large Language Models through module communities

Published at 2025-08-25

#ML

This study compares the cognitive patterns of large language models (LLMs) to those of avian and small mammalian brains, revealing similarities in distributed yet interconnected cognitive organization. The researchers propose a new framework for analyzing foundation models that emphasizes the importance of dynamic, cross-regional interactions and neural plasticity in skill acquisition, providing new insights into LLM interpretability and fine-tuning strategies....

Read More

Wan-S2V: Audio-Driven Cinematic Video Generation

Published at 2025-08-25

#ML

The authors present a new model called Wan-S2V that improves upon existing methods for audio-driven character animation, particularly in complex cinematic contexts. They show that their model outperforms state-of-the-art models in various experiments and can be applied to long-form video generation and precise video lip-sync editing....

Read More

Autoregressive Universal Video Segmentation Model

Published at 2025-08-26

#ML

The authors propose a new model, AUSM, which can perform both prompted and unprompted video segmentation by predicting masks sequentially, similar to language modeling. This model is efficient, scalable, and outperforms other methods on various video segmentation benchmarks....

Read More

Demystifying Scientific Problem-Solving in LLMs by Probing Knowledge and Reasoning

Published at 2025-08-26

#ML

The authors present SciReas, a collection of benchmarks for evaluating scientific reasoning in LLMs, and KRUX, a framework for studying the roles of reasoning and knowledge in these tasks. They find that retrieving relevant knowledge is crucial for LLMs in scientific reasoning, reasoning models benefit from external knowledge, and improving verbalized reasoning helps surface task-relevant knowledge....

Read More

FastMesh: Efficient Artistic Mesh Generation via Component Decoupling

Published at 2025-08-26

#ML

The authors present a new method for generating artistic meshes that reduces redundancy by separating vertex and face generation, resulting in faster and more efficient mesh creation compared to existing techniques. This approach uses an autoregressive model for vertex generation, a bidirectional transformer to complete the mesh, and additional refinements to improve quality, leading to over 8 times faster generation speed and higher mesh quality than state-of-the-art methods....

Read More

Forecasting Probability Distributions of Financial Returns with Deep Neural Networks

Published at 2025-08-26

#ML

The study uses 1D CNN and LSTM models to predict financial return distributions, outperforming traditional GARCH models in risk assessment and portfolio management, with the LSTM combined with a skewed Student's t distribution performing best....
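A minimal sketch of why the choice of return distribution matters, using SciPy rather than the paper's CNN/LSTM models, and a symmetric Student's t (the standard heavy-tailed baseline) since skewed-t variants vary across libraries. The simulated returns and parameters are invented for illustration.

```python
from scipy import stats

# Simulated fat-tailed "daily returns" (not real market data).
returns = stats.t.rvs(df=4, scale=0.01, size=5000, random_state=42)

# Fit a Student's t by maximum likelihood; heavy tails vs. a normal fit.
df, loc, scale = stats.t.fit(returns)

# 1% Value-at-Risk under each fitted distribution.
var_t = stats.t.ppf(0.01, df, loc, scale)
mu, sigma = stats.norm.fit(returns)
var_norm = stats.norm.ppf(0.01, mu, sigma)

print(f"fitted df ~ {df:.1f}")
print(f"1% VaR (Student's t): {var_t:.4f}")
print(f"1% VaR (normal):      {var_norm:.4f}")
```

Because the fat-tailed fit assigns more mass to extreme losses, its 1% VaR is more pessimistic than the normal fit's, which is the kind of tail-risk difference the distributional forecasts in the paper exploit.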

Read More

MovieCORE: COgnitive REasoning in Movies

Published at 2025-08-26

#ML

The authors present MovieCORE, a new dataset for video question answering that focuses on complex cognitive understanding of movies, using a unique approach to generate and refine questions. They also propose a method to enhance video-language models' reasoning capabilities, improving performance by up to 25%....

Read More

OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation

Published at 2025-08-26

#ML

The OmniHuman-1.5 model is a new framework that creates realistic and emotionally expressive character animations by using multimodal large language models and a specialized architecture. This model can interpret audio, images, and text together to generate motions that match the character, scene, and context, outperforming existing models in various metrics and handling complex scenarios....

Read More

Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

Published at 2025-08-26

#ML

This study examines how sparsity in Mixture-of-Experts (MoE) models affects their performance in memorization and reasoning tasks. The researchers trained various MoE Transformers, adjusting parameters and sparsity levels, and found that reasoning abilities plateau and may even decline with increased sparsity, while memorization improves with more parameters....
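The sparsity knob in question can be illustrated with a toy top-k MoE layer: of `n_experts` expert networks, only `top_k` run per token, so `top_k / n_experts` controls how sparse the layer is. This is a generic sketch of top-k routing, not the paper's architecture; all shapes and values below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 8, 2           # top_k/n_experts is the sparsity knob

W_gate = rng.normal(size=(d, n_experts))            # router weights
experts = rng.normal(size=(n_experts, d, d)) * 0.1  # one weight matrix per expert

def moe_forward(x):
    """Route a token vector x through its top_k experts."""
    logits = x @ W_gate                    # (n_experts,) routing scores
    top = np.argsort(logits)[-top_k:]      # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                   # softmax over the selected experts only
    # Only top_k experts run: compute scales with top_k, capacity with n_experts.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

x = rng.normal(size=d)
y = moe_forward(x)
print(y.shape)  # same width as the input token vector
```

Raising `n_experts` while holding `top_k` fixed adds memorization capacity without extra compute per token; the study's finding is that reasoning performance does not keep improving along that axis.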

Read More

ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Published at 2025-08-26

#ML

The study presents ThinkDial, an open-source framework that allows for controlling reasoning effort in large language models by enabling users to switch between three reasoning modes: High, Medium, and Low, which offer varying levels of computational efficiency and performance. The framework achieves this through a unique training process that integrates budget-mode control and adaptive reward shaping, resulting in significant response length reductions without compromising performance....

Read More

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Published at 2025-08-26

#ML

The authors present UltraMemV2, a memory-layer architecture that matches the performance of 8-expert Mixture of Experts models with fewer memory accesses. They introduce five key improvements and demonstrate superior performance on memory-intensive tasks, validating their approach at scale with models up to 2.5B activated parameters from 120B total parameters....

Read More

VibeVoice Technical Report

Published at 2025-08-26

#ML

The report introduces VibeVoice, a model that creates long speeches with multiple speakers using a new method called next-token diffusion. This method, combined with a new continuous speech tokenizer, greatly improves efficiency and audio quality, allowing VibeVoice to outperform other dialogue models....

Read More

VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Published at 2025-08-26

#ML

The authors present a new method called VoxHammer for editing 3D models without the need for training. This approach ensures precise and coherent edits by using a 3D latent space, and it has been shown to outperform existing methods in maintaining consistency and quality of the edited models....

Read More

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.


(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media
