🤗 Daily Paper Newsletter

Hope you found some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.

Behavioral Fingerprinting of Large Language Models
Published at 2025-09-02
#ML

This study presents a new method to evaluate large language models by analyzing their unique behavioral traits, rather than just performance metrics. The researchers found that while top models have similar core capabilities, their interactive behaviors vary greatly, likely due to different developer alignment strategies...
Read More

U-ARM: Ultra low-cost general teleoperation interface for robot manipulation
Published at 2025-09-02
#ML

The authors present U-Arm, a budget-friendly and adaptable teleoperation system for commercial robotic arms. U-Arm uses 3D-printed leader arms with consistent control logic, allowing compatibility with various robotic configurations. It has a low bill-of-materials cost and improved data collection efficiency compared to other low-cost teleoperation interfaces...
Read More

LuxDiT: Lighting Estimation with Video Diffusion Transformer
Published at 2025-09-03
#ML

The authors present LuxDiT, a new method that uses a video diffusion transformer to estimate scene lighting from visual input. It improves on existing techniques by learning from a large synthetic dataset and by employing a low-rank adaptation (LoRA) fine-tuning strategy for better alignment between input and output...
Read More

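The low-rank adaptation idea mentioned above can be sketched in a few lines. This is an illustrative toy, not LuxDiT's actual code; the layer size and rank are hypothetical. Instead of updating a full weight matrix W, LoRA trains a small low-rank correction B @ A on top of the frozen weights:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, rank = 512, 512, 8          # hypothetical layer size and LoRA rank
W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((rank, d_in))    # trainable low-rank factor
B = np.zeros((d_out, rank))              # B starts at zero, so training starts at W

x = rng.standard_normal(d_in)
y = W @ x + B @ (A @ x)                  # adapted forward pass: (W + B A) x

full_params = W.size                     # 262,144 parameters in the full matrix
lora_params = A.size + B.size            # 8,192 trainable parameters (~3% of full)
print(full_params, lora_params)
```

Only A and B are updated during fine-tuning, which is why this strategy is attractive when adapting a large pretrained diffusion backbone to a new output domain.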
MedVista3D: Vision-Language Modeling for Reducing Diagnostic Errors in 3D CT Disease Detection, Understanding and Reporting
Published at 2025-09-03
#ML

MedVista3D is a new system that uses both vision and language to help reduce diagnostic errors in analyzing 3D CT scans. It does this by improving the accuracy of disease detection, making it easier to understand the entire scan, and providing more consistent and clear language in medical reports...
Read More

Bootstrapping Task Spaces for Self-Improvement
Published at 2025-09-04
#ML

The authors propose Exploratory Iteration (ExIt), a new method for training language models to improve their performance over time by selecting the most informative steps during an episode and using them to train a self-improvement policy. ExIt enhances models' abilities in domains such as math, tool use, and machine-learning engineering by enabling them to keep iterating and improving beyond what was seen during training...
Read More

On Robustness and Reliability of Benchmark-Based Evaluation of LLMs
Published at 2025-09-04
#ML

This study tests how well large language models (LLMs) handle questions that are worded differently. It finds that while the models' relative rankings stayed the same, their scores dropped significantly, suggesting they struggle with varied phrasing and calling into question the reliability of existing evaluation methods...
Read More

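The finding can be illustrated with a toy calculation (the model names and scores are hypothetical, not the paper's data): rankings can stay identical even while every score drops noticeably.

```python
# Hypothetical accuracies for four models on original vs. paraphrased questions.
original    = {"model_a": 0.82, "model_b": 0.76, "model_c": 0.69, "model_d": 0.55}
paraphrased = {"model_a": 0.71, "model_b": 0.64, "model_c": 0.58, "model_d": 0.41}

def ranking(scores):
    """Model names ordered best-to-worst by score."""
    return sorted(scores, key=scores.get, reverse=True)

same_order = ranking(original) == ranking(paraphrased)
avg_drop = sum(original[m] - paraphrased[m] for m in original) / len(original)
print(same_order, round(avg_drop, 3))  # rankings preserved, scores down ~12 points
```

A leaderboard built only on relative order would hide the absolute drop, which is the study's point about reliability.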
Set Block Decoding is a Language Model Inference Accelerator
Published at 2025-09-04
#ML

This study presents a new method called Set Block Decoding (SBD) that improves the speed of language model inference by allowing the model to predict multiple future tokens in parallel, without making any changes to the model's architecture or requiring additional training. The method has been shown to reduce the number of forward passes needed for generation by a factor of 3-5, while maintaining the same level of accuracy as traditional methods...
Read More

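A back-of-the-envelope sketch of why emitting multiple tokens per forward pass cuts the pass count (this is only the accounting behind the 3-5x claim, not the SBD algorithm itself; the block size of 4 is hypothetical):

```python
import math

def forward_passes(n_tokens: int, block_size: int) -> int:
    """Passes needed when each forward pass emits `block_size` tokens."""
    return math.ceil(n_tokens / block_size)

n = 1000                                  # tokens to generate
baseline = forward_passes(n, 1)           # standard one-token-at-a-time decoding
accelerated = forward_passes(n, 4)        # hypothetical 4-token blocks
print(baseline, accelerated, baseline / accelerated)
```

Since each forward pass dominates decoding latency, a 4-token block translates almost directly into a ~4x speedup, matching the 3-5x range the summary reports.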
Why Language Models Hallucinate
Published at 2025-09-04
#ML

Large language models sometimes make up incorrect information, called 'hallucinations', because they are trained to guess instead of admitting uncertainty. The problem is made worse by evaluation methods that reward confident guessing over expressions of uncertainty, but it can be mitigated by changing how these models are scored and evaluated...
Read More

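The scoring argument can be made concrete with a bit of arithmetic (illustrative numbers, not taken from the paper): under accuracy-only grading, guessing always beats abstaining, while a penalty for wrong answers makes abstention the rational choice when the model is unsure.

```python
def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score of answering when a correct answer earns 1 point and
    a wrong answer costs `wrong_penalty`; abstaining always scores 0."""
    return p_correct * 1.0 - (1 - p_correct) * wrong_penalty

# Accuracy-only grading: even a 10%-confident guess beats abstaining (0.1 > 0),
# so a model trained against this metric learns to guess.
print(expected_score(0.1, wrong_penalty=0.0))

# Penalized grading (e.g. -1 per wrong answer): the same guess now expects
# 0.1 - 0.9 = -0.8, so admitting uncertainty becomes the better policy.
print(expected_score(0.1, wrong_penalty=1.0))
```

This is the sense in which changing how models are scored can reduce hallucinations: the incentive to guess disappears once wrong answers cost more than saying "I don't know".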
WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning
Published at 2025-09-04
#ML

The study presents WildScore, a new benchmark to test the reasoning skills of Multimodal Large Language Models in interpreting real-world music scores. WildScore uses actual musical compositions and questions to evaluate the models' understanding of symbolic music, revealing both strengths and areas for improvement in their visual-symbolic reasoning abilities...
Read More

LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
Published at 2025-09-05
#ML

LatticeWorld is a new framework that uses lightweight language models and high-quality rendering engines to quickly generate large-scale, interactive 3D worlds based on text and image inputs. It significantly improves efficiency in creating 3D environments compared to traditional methods while maintaining high quality...
Read More

Symbolic Graphics Programming with Large Language Models
Published at 2025-09-05
#ML

The study explores using large language models to generate scalable vector graphics (SVGs) from natural language descriptions, introducing a benchmark called SGP-GenBench to evaluate this capability. The research proposes a reinforcement learning approach to improve SVG generation quality and semantics, achieving performance comparable to leading systems and demonstrating finer object decomposition and improved scene coherence...
Read More

WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
Published at 2025-09-05
#ML

The authors propose a new model called WinT3R that can accurately predict camera positions and create high-quality point maps in real time. Unlike previous methods, WinT3R uses a sliding-window technique to improve prediction quality without extra computation and a global camera token pool to enhance pose-estimation reliability, all while maintaining efficiency...
Read More


Tags are generated by Google's Gemini Pro API; summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit the developer's social media