🤗 Daily Paper(2025-09-08)


deep.di...@gmail.com

Sep 8, 2025, 4:06:57 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter brings you a curated list of papers from 🤗 Daily Papers.


Behavioral Fingerprinting of Large Language Models

Published at 2025-09-02

#ML

This study presents a new method to evaluate large language models by analyzing their unique behavioral traits, rather than just performance metrics. The researchers found that while top models have similar core capabilities, their interactive behaviors vary greatly, likely due to different developer alignment strategies....


U-ARM : Ultra low-cost general teleoperation interface for robot manipulation

Published at 2025-09-02

#ML

The authors present U-Arm, a budget-friendly and adaptable teleoperation system for commercial robotic arms. U-Arm uses 3D-printed leader arms with consistent control logic, allowing compatibility with various robotic configurations. It has a low bill of materials cost and improved data collection efficiency compared to other low-cost teleoperation interfaces....


LuxDiT: Lighting Estimation with Video Diffusion Transformer

Published at 2025-09-03

#ML

The authors present a new method called LuxDiT that uses a video diffusion transformer to estimate scene lighting from visual input, improving upon existing techniques by learning from a large synthetic dataset and employing a low-rank adaptation fine-tuning strategy for better alignment between input and output....


MedVista3D: Vision-Language Modeling for Reducing Diagnostic Errors in 3D CT Disease Detection, Understanding and Reporting

Published at 2025-09-03

#ML

MedVista3D is a new system that uses both vision and language to help reduce diagnostic errors in analyzing 3D CT scans. It does this by improving the accuracy of disease detection, making it easier to understand the entire scan, and providing more consistent and clear language in medical reports....


Bootstrapping Task Spaces for Self-Improvement

Published at 2025-09-04

#ML

The authors propose Exploratory Iteration (ExIt), a new method for training language models to improve their performance over time by selecting the most informative steps during an episode and using them to train a self-improvement policy. ExIt can enhance models' abilities in various domains, such as math, tool-use, and machine learning engineering, by enabling them to iterate and improve their performance beyond the training data....


On Robustness and Reliability of Benchmark-Based Evaluation of LLMs

Published at 2025-09-04

#ML

This study tests how well large language models (LLMs) handle benchmark questions that have been reworded. It finds that while the models' rankings stay largely the same, their scores drop significantly, suggesting they struggle with varied language and calling into question the reliability of existing evaluation methods....


Set Block Decoding is a Language Model Inference Accelerator

Published at 2025-09-04

#ML

This study presents a new method called Set Block Decoding (SBD) that speeds up language model inference by letting the model predict multiple future tokens in parallel. It makes no changes to the model's architecture and can be applied by fine-tuning existing next-token-prediction models. The method is reported to reduce the number of forward passes needed for generation by 3-5 times, while maintaining the same level of accuracy as traditional methods....


Why Language Models Hallucinate

Published at 2025-09-04

#ML

Large language models sometimes make up incorrect information, called 'hallucinations', because they are trained to guess instead of admitting uncertainty. This issue is made worse by evaluation methods that reward guessing over acknowledging uncertainty, but it can be mitigated by changing how these models are scored and evaluated....
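
As a rough illustration of the scoring argument in this summary, the sketch below compares the expected score of guessing against abstaining. The function names, the penalty values, and the 20% confidence figure are hypothetical and are not taken from the paper; it assumes a correct answer earns 1 point, a wrong answer loses `wrong_penalty` points, and abstaining earns 0.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score of answering when the answer is right with probability p_correct.

    Assumed (hypothetical) grading: correct = +1, wrong = -wrong_penalty, abstain = 0.
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """Answering beats abstaining only if its expected score is above 0."""
    return expected_score(p_correct, wrong_penalty) > 0.0


# A model that is only 20% sure of its answer (illustrative number).
p = 0.2

# Binary grading: wrong answers cost nothing, so guessing always pays off.
guess_under_binary = should_answer(p, wrong_penalty=0.0)

# Penalized grading: a wrong answer costs 1 point, so abstaining is better here.
guess_under_penalty = should_answer(p, wrong_penalty=1.0)

print(guess_under_binary, guess_under_penalty)
```

Under the binary scheme a low-confidence guess still has positive expected value, which is the incentive the summary describes; once wrong answers carry a cost, abstaining becomes the better choice below some confidence level.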


WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning

Published at 2025-09-04

#ML

The study presents WildScore, a new benchmark to test the reasoning skills of Multimodal Large Language Models in interpreting real-world music scores. WildScore uses actual musical compositions and questions to evaluate the models' understanding of symbolic music, revealing both strengths and areas for improvement in their visual-symbolic reasoning abilities....


LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

Published at 2025-09-05

#ML

LatticeWorld is a new framework that uses lightweight language models and high-quality rendering engines to quickly generate large-scale, interactive 3D worlds based on text and image inputs. It significantly improves efficiency in creating 3D environments compared to traditional methods while maintaining high quality....


Symbolic Graphics Programming with Large Language Models

Published at 2025-09-05

#ML

The study explores using large language models to generate scalable vector graphics (SVGs) from natural language descriptions, introducing a benchmark called SGP-GenBench to evaluate this capability. The research proposes a reinforcement learning approach to improve SVG generation quality and semantics, achieving performance comparable to leading systems and demonstrating finer object decomposition and improved scene coherence....


WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool

Published at 2025-09-05

#ML

The authors propose a new model called WinT3R that can accurately predict camera positions and create high-quality point maps in real-time. Unlike previous methods, WinT3R uses a sliding window technique to improve prediction quality without extra computation and a global camera token pool to enhance pose estimation reliability, all while maintaining efficiency....


Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, which is derived from the SOLAR-10.7B open LLM.


(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
