🤗 Daily Paper Newsletter

Hope you found some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.
mSCoRe: a Multilingual and Scalable Benchmark for Skill-based Commonsense Reasoning
Published at 2025-08-13
#ML

The study presents a new benchmark called mSCoRe to evaluate the reasoning skills of large language models across different languages and cultures. The benchmark analyzes the models' ability to use various reasoning skills and scales in difficulty as models improve, revealing the limitations of current models on complex multilingual and cultural commonsense.
Read More

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Published at 2025-08-15
#ML

This study explores a new way to combine two methods used to improve Large Language Models: Supervised Fine-Tuning and Reinforcement Learning. The proposed method, CHORD, dynamically balances the two objectives so that off-policy expert data does not disrupt on-policy learning or cause overfitting, yielding improved model performance and a more stable learning process.
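As a rough illustration of the dynamic-weighting idea, here is a minimal sketch assuming a single scalar weight annealed over training; the names, schedule, and loss terms are hypothetical, not CHORD's actual formulation.

```python
# Hypothetical sketch of dynamic weighting between an off-policy SFT
# term and an on-policy RL term (illustrative, not the paper's code).
import torch

def blended_loss(sft_loss: torch.Tensor,
                 rl_loss: torch.Tensor,
                 step: int,
                 total_steps: int) -> torch.Tensor:
    # Assumed schedule: start SFT-heavy so expert data guides early
    # training, then anneal toward the RL objective so the expert
    # data does not overwhelm later on-policy updates.
    mu = max(0.0, 1.0 - step / total_steps)  # weight on the SFT term
    return mu * sft_loss + (1.0 - mu) * rl_loss
```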
Read More

FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Published at 2025-08-16
#ML

The authors created FutureX, a large and diverse live benchmark for evaluating large language model (LLM) agents on future-prediction tasks, which involve analyzing, understanding, and making decisions based on real-time data. They tested 25 LLMs, revealing their strengths and weaknesses in adapting to new information and avoiding misinformation, with the aim of bringing LLM performance up to human experts in complex reasoning and predictive thinking.
Read More

FLARE: Fast Low-rank Attention Routing Engine
Published at 2025-08-17
#ML

The authors present FLARE, a method that improves the efficiency of self-attention, enabling it to scale to much larger inputs while maintaining high accuracy. FLARE projects the input sequence onto a fixed-length latent sequence, which yields a low-rank form of attention that is much cheaper to compute. The authors also introduce a new dataset for additive manufacturing research and release their code.
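The fixed-length projection admits a compact sketch. The module below is illustrative only, assuming standard multi-head attention and a learned latent sequence of M tokens; it shows the O(N·M) gather/scatter pattern, not the paper's released implementation.

```python
# Illustrative low-rank routed attention: N tokens attend to and from
# M << N learned latents, so cost is O(N*M) instead of O(N^2).
import torch
import torch.nn as nn

class LowRankRoutedAttention(nn.Module):
    def __init__(self, dim: int = 64, num_latents: int = 16, heads: int = 4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.gather = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.scatter = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, dim); latents: (M, dim) with M << N.
        lat = self.latents.expand(x.shape[0], -1, -1)  # (batch, M, dim)
        lat, _ = self.gather(lat, x, x)     # latents query the tokens
        out, _ = self.scatter(x, lat, lat)  # tokens read back from latents
        return out

attn = LowRankRoutedAttention()
y = attn(torch.randn(2, 1024, 64))  # cost grows linearly in N, not N^2
```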
Read More

From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery
Published at 2025-08-18
#ML

The study surveys the development of AI in scientific discovery, focusing on Agentic Science, where AI systems gain full scientific autonomy through capabilities like hypothesis generation and experimental design. It reviews applications across various domains and proposes a framework to unify different perspectives, while also identifying challenges and future opportunities in AI-driven research.
Read More

From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models
Published at 2025-08-18
#ML

The authors present FinCDM, a new framework for evaluating financial Large Language Models (LLMs) that goes beyond score-level evaluation to assess LLMs' knowledge and skills in a more nuanced way. They also introduce CPA-QKA, a dataset for financial LLM evaluation that covers a broad range of real-world accounting and financial skills, and demonstrate FinCDM's ability to reveal hidden knowledge gaps and behavioral patterns among various LLMs.
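A skill-level (rather than score-level) readout can be sketched simply; the function below is a hypothetical illustration, assuming per-question correctness and question-to-skill tags, and is not FinCDM's actual diagnosis model.

```python
# Toy skill-level evaluation: estimate per-skill mastery from tagged
# questions, so two models with equal overall scores can still show
# different knowledge gaps. Data and names are illustrative only.
from collections import defaultdict

def skill_mastery(results, skill_tags):
    """results: {question_id: bool}; skill_tags: {question_id: [skills]}"""
    hits, totals = defaultdict(int), defaultdict(int)
    for qid, correct in results.items():
        for skill in skill_tags.get(qid, []):
            totals[skill] += 1
            hits[skill] += int(correct)
    return {s: hits[s] / totals[s] for s in totals}

print(skill_mastery({"q1": True, "q2": False},
                    {"q1": ["auditing"], "q2": ["tax-law"]}))
```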
Read More

Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer
Published at 2025-08-19
#ML

This study presents the Deep Equilibrium Canonicalizer (DEC), a method that enhances a model's ability to handle changes in object size within images, a common issue in computer vision. DEC can be easily added to existing models and is shown to improve performance and consistency on the ImageNet benchmark for four popular pre-trained models.
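For intuition, a deep equilibrium module computes its output as a fixed point z* = f(z*, x). The toy module below illustrates only that fixed-point structure; the architecture and how the equilibrium state drives scale canonicalization are assumptions, not the paper's design.

```python
# Toy deep-equilibrium module: iterate a small network to a fixed
# point z* = f(z*, x). In a canonicalizer, such a latent state would
# be used to normalize the input's scale before the frozen backbone.
import torch
import torch.nn as nn

class ToyEquilibriumModule(nn.Module):
    def __init__(self, dim: int = 32, iters: int = 30):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())
        self.iters = iters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(x)
        for _ in range(self.iters):  # naive fixed-point iteration
            z = self.f(torch.cat([z, x], dim=-1))
        return z  # approximate equilibrium state z*

z_star = ToyEquilibriumModule()(torch.randn(4, 32))
```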
Read More

Refining Contrastive Learning and Homography Relations for Multi-Modal Recommendation
Published at 2025-08-19
#ML

The study presents a new framework called REARM to improve multi-modal recommendation systems by addressing limitations in current methods. REARM enhances contrastive learning using meta-network and orthogonal constraint strategies, and integrates user interest and item co-occurrence graphs for better homography relations, resulting in superior performance compared to state-of-the-art baselines.
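As a small illustration of an orthogonal-constraint strategy of the kind mentioned above, the penalty below pushes a projection matrix toward row-orthogonality; it is a generic regularizer, not necessarily REARM's loss.

```python
# Generic orthogonality penalty (illustrative, not REARM's exact loss):
# encourages W @ W.T to stay near the identity so learned projections
# remain decorrelated during contrastive training.
import torch

def orthogonal_penalty(W: torch.Tensor) -> torch.Tensor:
    eye = torch.eye(W.shape[0], device=W.device)
    return ((W @ W.T - eye) ** 2).sum()

reg = orthogonal_penalty(torch.randn(8, 64))  # added to the main loss
```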
Read More

RynnEC: Bringing MLLMs into Embodied World
Published at 2025-08-19
#ML

RynnEC is a new video multimodal model that helps AI systems understand and interact with the physical world by focusing on specific regions in videos. It performs tasks like object segmentation and spatial reasoning better than other models, and the authors introduce a new benchmark to test these abilities, along with a method for generating more training data for embodied-cognition models.
Read More

ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions?
Published at 2025-08-19
#ML

This study tests the performance of vision language models (VLMs) on Vietnamese multimodal exam questions, comparing their results to human test-takers. The researchers find that while some VLMs perform better than the average human, they still lag behind the best human performers, and methods like cross-lingual prompting or human-in-the-loop collaboration only provide partial improvements.
Read More

DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
Published at 2025-08-20
#ML

The authors propose a new framework called DuPO that allows language models to learn and improve without needing human-created labels. DuPO can be applied to various tasks, including language translation and math reasoning, and has shown significant improvements in performance across these tasks.
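The label-free idea can be illustrated with a round-trip: solve the task, then solve its dual and score how well the input is recovered. The sketch below is hypothetical; `model.translate` and the overlap score are assumptions, not the paper's API.

```python
# Hypothetical duality-based self-verification: reward a translation
# by how well the dual task (back-translation) reconstructs the
# original input, with no human labels involved.
def dual_reward(model, src: str, src_lang: str, tgt_lang: str) -> float:
    hyp = model.translate(src, src_lang, tgt_lang)   # primal task
    back = model.translate(hyp, tgt_lang, src_lang)  # dual task
    # Crude token overlap stands in for a learned similarity score.
    a, b = set(src.lower().split()), set(back.lower().split())
    return len(a & b) / max(len(a | b), 1)

# Candidates with higher dual reward become the "preferred" samples
# in preference optimization, replacing human-created labels.
```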
Read More

Leuvenshtein: Efficient FHE-based Edit Distance Computation with Single Bootstrap per Cell
Published at 2025-08-20
#ML

The authors propose an improved method for calculating the Levenshtein distance using Fully Homomorphic Encryption, which significantly reduces computational costs and increases speed compared to existing algorithms, particularly in applications like finance and genomics.
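For reference, the plaintext dynamic program being accelerated looks like the sketch below; the paper's contribution is evaluating this per-cell recurrence under FHE with a single bootstrap per cell, and the FHE machinery itself is not shown here.

```python
# Standard Levenshtein dynamic program (plaintext reference): each
# cell takes the minimum of deletion, insertion, and substitution.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))  # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        cur = [i]                   # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```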
Read More

MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers
Published at 2025-08-20
#ML

The paper presents MCP-Universe, a comprehensive benchmark for evaluating large language models on realistic tasks using real-world Model Context Protocol servers, covering six domains across 11 servers. The benchmark reveals performance limitations in SOTA models, surfaces challenges such as long-context reasoning and handling unknown tools, and provides an open-source evaluation framework for further research and innovation in the MCP ecosystem.
Read More

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
Published at 2025-08-20
#ML

The study presents MeshCoder, a system that converts complex 3D objects from point clouds into editable programs using a large language model. Trained on a large paired dataset, it enables more faithful reconstruction of shapes and easier editing of 3D objects, making it a versatile tool for 3D shape understanding.
Read More

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Published at 2025-08-20
#ML

Researchers developed Nemotron-Nano-9B-v2, a language model that is both accurate and fast on reasoning tasks. The model is smaller than others with similar capabilities, runs inference on a single GPU, and performs up to six times faster than similarly sized models.
Read More

Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
Published at 2025-08-20
#ML

This study explores compressing diffusion LLMs, large language models that require substantial resources and are therefore hard to run on devices like phones or laptops. The researchers identify a problem with these models: some values are very large outliers, making it hard to reduce numeric precision without losing important information. They then apply different post-training quantization methods to address this issue and test how well each works across various tasks and model types, providing useful insights for future work.
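The outlier problem is easy to see with naive absmax quantization: one large value stretches the scale so that ordinary values collapse into a few levels. The snippet below is a generic illustration, not a method from the paper.

```python
# Why outliers hurt quantization: with absmax scaling to int8, a
# single large value dominates the scale and the small values round
# to (nearly) zero, losing the information they carried.
import numpy as np

def absmax_quantize(x: np.ndarray, bits: int = 8) -> np.ndarray:
    qmax = 2 ** (bits - 1) - 1                # 127 for int8
    scale = np.abs(x).max() / qmax
    q = np.round(x / scale).astype(np.int8)
    return q * scale                          # dequantized values

x = np.array([0.01, -0.02, 0.03, 8.0])        # one outlier
print(absmax_quantize(x))                     # small values collapse to ~0
```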
Read More

Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
Published at 2025-08-20
#ML

The authors present Tinker, a method that enables high-quality 3D editing from just one or two images, without optimizing for each specific scene. Tinker uses pre-trained diffusion models to understand the 3D structure of images and produces edits that look consistent from different viewpoints, making it easier to create 3D content.
Read More

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the open SOLAR-10.7B LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit Developer's Social Media