🤗 Daily Paper(2025-09-18)

deep.di...@gmail.com

Sep 18, 2025, 4:07:24 PM

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page

🤗 daily paper

MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework

Published at 2025-08-20

#ML

The authors create a medical research agent that overcomes challenges in the medical field by using a new data synthesis framework with medical knowledge graphs and integrating a specialized medical retrieval engine. This agent, called MedResearcher-R1, outperforms larger proprietary systems in medical benchmarks while maintaining competitiveness in general tasks....

Hybrid Quantum-Classical Model for Image Classification

Published at 2025-09-14

#ML

The study compares hybrid quantum-classical neural networks with classical models using CNNs on three datasets. Hybrid models showed better accuracy, faster training, and lower resource usage, especially for complex datasets, but had similar vulnerability to adversarial attacks on complex datasets....

Image Tokenizer Needs Post-Training

Published at 2025-09-15

#ML

This study analyzes the gap between image reconstruction and generation in existing models and proposes a new tokenizer training method that improves latent space construction and decoding. The proposed method enhances the tokenizer's robustness, leading to better generation quality and faster convergence, and introduces a new metric to evaluate tokenizer performance....

AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions

Published at 2025-09-16

#ML

The study presents AERIS, a high-resolution weather forecasting model using a new diffusion transformer, and SWiPe, a technique to efficiently distribute the workload. AERIS demonstrates high efficiency and outperforms existing models, showing promise for billion-parameter weather and climate prediction models....

LLM-I: LLMs are Naturally Interleaved Multimodal Creators

Published at 2025-09-16

#ML

The authors present a new framework called LLM-Interleaved (LLM-I) that enhances the capabilities of language models by integrating various visual tools like image search, generation, code execution, and editing. This framework enables the model to perform tasks requiring factual accuracy and precision, outperforming existing methods in benchmark tests and introducing a novel strategy for further performance improvements....

PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era

Published at 2025-09-16

#ML

This talk discusses the growing importance of omnidirectional vision, which uses 360-degree vision for better environmental understanding, in fields like robotics and industrial inspection. The talk introduces PANORAMA, a proposed system architecture for omnidirectional vision in the embodied AI era, and covers recent breakthroughs, trends, and future challenges in this area....

SteeringControl: Holistic Evaluation of Alignment Steering in LLMs

Published at 2025-09-16

#ML

SteeringControl is a benchmark that evaluates methods to control behavior in language models, focusing on bias, harmful content, and false information, while also considering lesser-explored trade-offs. The study reveals that effective behavior control depends on the combination of the steering method, model, and targeted behavior, and that poorly chosen combinations can lead to severe issues....

GenExam: A Multidisciplinary Text-to-Image Exam

Published at 2025-09-17

#ML

The study presents GenExam, a new evaluation tool for testing models' ability to integrate knowledge, reasoning, and generation in creating images. This tool features 1,000 exam-style prompts across 10 subjects, and even advanced models struggle to achieve high scores, indicating the challenge of this new benchmark....

Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale

Published at 2025-09-17

#ML

The authors created a set of Arabic-focused language models called Hala, which can understand and translate Arabic better than existing models. They used a special method to compress a strong Arabic-English translation model and then fine-tuned a smaller language model with the compressed one to generate high-quality Arabic instructions. The Hala models, with varying sizes, achieved top results in Arabic language tasks compared to other models of similar sizes....

Improving Context Fidelity via Native Retrieval-Augmented Reasoning

Published at 2025-09-17

#ML

This study presents a new method called CARE that enhances the accuracy and reliability of large language models by improving their use of provided context. CARE teaches models to better integrate and utilize evidence within their reasoning process, requiring less labeled data and outperforming other methods in various question-answering benchmarks....

MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook

Published at 2025-09-17

#ML

The MARS2 2025 Challenge on Multimodal Reasoning introduces two new datasets, Lens and AdsQA, to evaluate multimodal machine learning and large language models in real-world scenarios and specialized tasks. Researchers and institutions can access the datasets, baselines, and rankings on the MARS2 workshop website and GitHub page....

Quantum Variational Activation Functions Empower Kolmogorov-Arnold Networks

Published at 2025-09-17

#ML

This study combines quantum machine learning and Kolmogorov-Arnold networks by creating quantum variational activation functions, which significantly reduce parameter size while maintaining expressivity. The new methods, DARUAN and QKAN, improve parameter efficiency, generalization, and scalability, as demonstrated through theoretical analysis and experiments in various applications....

SAIL-VL2 Technical Report

Published at 2025-09-17

#ML

The authors present SAIL-VL2, a powerful open-source vision-language model that outperforms others in various image and video tasks. Its strengths come from improved data curation, a new training method, and a more efficient architecture, allowing it to excel in complex reasoning tasks and rank top among open-source models in its parameter range....

Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning

Published at 2025-09-17

#ML

This study explores a method called machine unlearning to remove sensitive information from Code Language Models (CLMs) without fully retraining them. The proposed CodeEraser technique effectively erases sensitive memorized segments in code while preserving the overall integrity and functionality of the model, validated through experiments on three different CLMs....

Synthesizing Behaviorally-Grounded Reasoning Chains: A Data-Generation Framework for Personal Finance LLMs

Published at 2025-09-17

#ML

The study presents a new framework that combines financial context and behavioral finance to generate supervision data for personal finance advisors. This framework is used to create a large dataset and fine-tune a model, resulting in high performance at a lower cost compared to larger models....

THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning

Published at 2025-09-17

#ML

The paper presents THOR, a new approach that uses reinforcement learning to improve mathematical reasoning in large language models by integrating external tools. THOR addresses challenges in constructing tool-integrated reasoning data, performing fine-grained optimization, and enhancing inference, resulting in improved performance on mathematical and coding benchmarks....

Wan-Animate: Unified Character Animation and Replacement with Holistic Replication

Published at 2025-09-17

#ML

The study presents a new method called Wan-Animate, which can create lifelike animations of characters by mimicking their expressions and movements from a reference video, or replace a character in the video with another one while keeping the scene's lighting and colors consistent for a seamless look....

Tags are generated by Google's Gemini Pro API; the summaries and translations are generated by Upstage's SOLAR mini chat model, which is derived from the SOLAR-10.7B open LLM.

(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media
