🤗 Daily Paper Newsletter

Hope you found some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.

LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost
Published at 2025-11-11

#ML

The researchers present ChaosEater, a system that uses large language models to automate the entire Chaos Engineering process, making it easier and cheaper for anyone to build resilient software systems on Kubernetes. Tested on both small- and large-scale Kubernetes systems, ChaosEater completes reasonable Chaos Engineering cycles at significantly lower time and monetary cost, with its cycles validated by both human engineers and LLMs....
Read More

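For intuition, here is a minimal sketch of an LLM-driven Chaos Engineering cycle in the spirit of the paper; the function names, prompts, and commands are illustrative assumptions, not ChaosEater's actual API:

```python
# Hypothetical sketch of one LLM-planned Chaos Engineering cycle.
from dataclasses import dataclass

@dataclass
class Experiment:
    hypothesis: str      # steady state to preserve, e.g. "p99 latency < 300 ms"
    fault: str           # fault to inject, e.g. "kill one pod of service A"
    validation_cmd: str  # command that checks the steady state

def llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., an OpenAI/Anthropic client)."""
    return "kill one pod of the checkout deployment"

def run_cycle(system_description: str) -> None:
    # 1. Let the LLM plan an experiment from the system description.
    fault = llm(f"Propose a single fault to inject into: {system_description}")
    exp = Experiment(
        hypothesis="p99 latency stays below 300 ms",
        fault=fault,
        validation_cmd="kubectl get pods -l app=checkout",
    )
    # 2. Inject the fault and validate the hypothesis (here we only print;
    #    a real system would call chaos tooling and monitoring).
    print(f"Injecting: {exp.fault}")
    print(f"Validating hypothesis: {exp.hypothesis}")

run_cycle("a Kubernetes cluster running a demo e-commerce app")
```
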
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
Published at 2025-11-11

#ML

This study presents Think-at-Hard, a method that enhances the reasoning of large language models without adding parameters. It spends extra latent iterations only on hard tokens during decoding, improving performance across various benchmarks while maintaining efficiency....
Read More

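The core idea of iterating only where decoding is hard can be sketched as a confidence gate; the threshold, refiner, and toy matrices below are assumptions, not the paper's architecture:

```python
# Minimal sketch of selective latent iteration gated by token confidence.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decode_step(hidden, refine, threshold=0.5, max_iters=3):
    """Run extra latent iterations only while the token looks 'hard'."""
    for _ in range(max_iters):
        logits = hidden @ W_out
        conf = softmax(logits).max()   # confidence of the argmax token
        if conf >= threshold:          # easy token: stop iterating
            break
        hidden = refine(hidden)        # hard token: one more latent pass
    return int(np.argmax(hidden @ W_out))

rng = np.random.default_rng(0)
W_out = rng.normal(size=(16, 100))                      # toy unembedding matrix
refine = lambda h: h + 0.1 * rng.normal(size=h.shape)   # stand-in refiner
print(decode_step(rng.normal(size=16), refine))
```
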
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
Published at 2025-11-13

#ML

The authors present CoTyle, an open-source method for code-to-style image generation: a single numerical style code is enough to produce images with a novel, consistent visual style. This simplifies and diversifies style creation, a significant improvement over existing methods that rely on complex prompts or reference images....
Read More

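To illustrate a discrete style space, here is a toy sketch in which an integer code indexes a learned codebook that conditions a generator; the module sizes and structure are assumptions, not CoTyle's actual design:

```python
# Toy code-to-style module: integer code -> codebook embedding -> image.
import torch
import torch.nn as nn

class CodeToStyle(nn.Module):
    def __init__(self, num_codes=10_000, style_dim=128):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, style_dim)  # discrete style space
        self.generator = nn.Sequential(                     # stand-in generator
            nn.Linear(style_dim, 256), nn.ReLU(), nn.Linear(256, 3 * 8 * 8)
        )

    def forward(self, style_code: torch.Tensor) -> torch.Tensor:
        style = self.codebook(style_code)   # code -> style embedding
        img = self.generator(style)         # style embedding -> image
        return img.view(-1, 3, 8, 8)

model = CodeToStyle()
print(model(torch.tensor([42])).shape)  # torch.Size([1, 3, 8, 8])
```
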
Proactive Hearing Assistants that Isolate Egocentric Conversations
Published at 2025-11-14

#ML

The authors present a hearing-assistant system that identifies and isolates the wearer's conversation partners in real time, without needing any prompts. The system uses the wearer's own speech and conversational turn-taking patterns to detect who they are talking to, and it runs on-device at low latency, providing a more personalized and adaptive hearing experience....
Read More

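A toy sketch of prompt-free partner detection from turn-taking: a candidate speaker who alternates with, rather than overlaps, the wearer's own voice activity scores as a likely partner. This scoring rule is purely illustrative, not the paper's method:

```python
# Score candidate speakers by turn-taking against the wearer's voice activity.
import numpy as np

def partner_score(wearer_vad: np.ndarray, speaker_vad: np.ndarray) -> float:
    """Both inputs are binary voice-activity vectors over time frames."""
    overlap = np.mean(wearer_vad * speaker_vad)                       # talking over each other
    alternation = np.mean(np.abs(np.diff(wearer_vad - speaker_vad)))  # turn swaps
    return alternation - overlap   # high alternation, low overlap = likely partner

wearer    = np.array([1, 1, 0, 0, 1, 1, 0, 0])
partner   = np.array([0, 0, 1, 1, 0, 0, 1, 1])  # alternates with the wearer
bystander = np.array([1, 1, 1, 1, 1, 1, 1, 1])  # talks continuously, overlaps
print(partner_score(wearer, partner), partner_score(wearer, bystander))
```
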
TopoPerception: A Shortcut-Free Evaluation of Global Visual Perception in Large Vision-Language Models
Published at 2025-11-14

#ML

The research presents TopoPerception, a new benchmark that uses topological properties to test the global visual perception of large vision-language models without relying on local shortcuts. The results show that even the most advanced models struggle with global visual perception, suggesting that new training methods or architectures are needed in this area....
Read More

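To see why topology resists local shortcuts, consider counting connected components: no single patch of the image reveals the answer. The probe below is illustrative of the idea, not one of TopoPerception's actual tasks:

```python
# A global, shortcut-free property: the number of connected components.
import numpy as np
from scipy import ndimage

img = np.zeros((32, 32), dtype=int)
img[4:10, 4:10] = 1      # blob 1
img[20:26, 20:26] = 1    # blob 2
_, num_components = ndimage.label(img)
print(num_components)    # 2 -- answering this requires global perception
```
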
Φeat: Physically-Grounded Feature Representation
Published at 2025-11-14

#ML

The study presents Φeat, a new visual backbone that focuses on physical factors such as material identity, reflectance, and geometric structure for improved feature representation in vision tasks. Using a self-supervised training strategy, Φeat learns robust features that are invariant to external physical factors, demonstrating the potential of unsupervised physical feature learning for physics-aware perception in vision and graphics....
Read More

Agent READMEs: An Empirical Study of Context Files for Agentic Coding
Published at 2025-11-16

#ML

The study analyzes 2,303 agent context files from 1,925 repositories and finds that they are complex, evolving artifacts that developers use mainly for functional context. Non-functional requirements such as security and performance are rarely addressed, suggesting a need for better tools and practices....
Read More

A Brain Wave Encodes a Thousand Tokens: Modeling Inter-Cortical Neural Interactions for Effective EEG-based Emotion Recognition
Published at 2025-11-17

#ML

The study presents RBTransformer, a Transformer-based neural network that analyzes brain activity from EEG signals for emotion recognition. The model captures the dynamic interactions between different brain regions and outperforms previous methods in experiments across multiple datasets....
Read More

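One way to model inter-cortical interactions with attention is to treat each electrode as a token, so attention weights can span brain regions. The sketch below shows that pattern; the layer sizes and pooling are illustrative, not RBTransformer's actual configuration:

```python
# Electrode-as-token transformer: attention across EEG channels.
import torch
import torch.nn as nn

class ElectrodeTransformer(nn.Module):
    def __init__(self, n_electrodes=32, n_features=64, n_classes=3):
        super().__init__()
        self.proj = nn.Linear(n_features, 128)  # per-electrode embedding
        layer = nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):                  # x: (batch, electrodes, features)
        tokens = self.proj(x)
        ctx = self.encoder(tokens)         # attention across electrodes
        return self.head(ctx.mean(dim=1))  # pooled emotion logits

model = ElectrodeTransformer()
print(model(torch.randn(2, 32, 64)).shape)  # torch.Size([2, 3])
```
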
Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark
Published at 2025-11-17

#ML

The research proposes Gen-ViRe, a framework that tests whether video models can reason like a person across six cognitive dimensions. By revealing discrepancies between visual quality and actual reasoning depth, the framework gives a clearer picture of the reasoning capabilities of video models....
Read More

Error-Driven Scene Editing for 3D Grounding in Large Language Models
Published at 2025-11-17

#ML

This study presents a novel method for improving 3D grounding in language models by creating precise visual counterfactuals through scene editing. The proposed DEER-3D framework identifies and corrects specific errors in the model, enhancing its spatial understanding and grounding accuracy without requiring extensive 3D data collection or scene reconstruction....
Read More

Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework
Published at 2025-11-17

#ML

This research explores how larger decoder-only models and visual information can improve the efficiency and performance of extreme multi-label classification (XMC). The proposed Vision-enhanced eXtreme Multi-label Learning framework (ViXML) integrates foundation vision models to unlock multi-modal capabilities while limiting computational growth, outperforming text-only decoder models in most cases....
Read More

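For a feel of multi-modal XMC at scale, here is a toy sketch that fuses a text and an image embedding and retrieves the top-k of a large label set by dot product; the fusion rule and embedding sources are assumptions, not ViXML's design:

```python
# Toy vision-enhanced extreme multi-label retrieval over a large label set.
import numpy as np

rng = np.random.default_rng(0)
n_labels, dim = 100_000, 64
label_emb = rng.normal(size=(n_labels, dim)).astype(np.float32)

def top_k_labels(text_emb, image_emb, k=5):
    query = (text_emb + image_emb) / 2      # simple late fusion
    scores = label_emb @ query              # one score per label
    return np.argpartition(-scores, k)[:k]  # top-k without a full sort

text_emb = rng.normal(size=dim).astype(np.float32)
image_emb = rng.normal(size=dim).astype(np.float32)
print(top_k_labels(text_emb, image_emb))
```
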
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
Published at 2025-11-17

#ML

The study presents REVISOR, a new framework that enables multimodal large language models (MLLMs) to reflect and reason across both text and video, improving their understanding of long videos. The framework requires no extra training or external tools and has been shown to work well on several video understanding benchmarks....
Read More

ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning
Published at 2025-11-18

#ML

The authors present ATLAS, a new large-scale, high-difficulty, cross-disciplinary evaluation suite designed to test the advanced scientific reasoning capabilities of large language models. Spanning seven scientific fields, ATLAS features original, contamination-resistant problems, prioritizes complex open-ended answers, and undergoes rigorous quality control to ensure scientific value and correctness....
Read More

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
Published at 2025-11-18

#ML

The study focuses on improving the training of large language model agents with end-to-end reinforcement learning. It extends the Markov Decision Process (MDP) framework to define the key components of these agents and introduces Agent-R1, a flexible and user-friendly training framework, validated through experiments on multi-hop question answering benchmarks....
Read More

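The agent-as-MDP view can be sketched as a loop in which states are dialogue histories, actions may invoke tools, and reward arrives at the end of the episode; every name below is illustrative, not Agent-R1's API:

```python
# Toy agent episode under an MDP framing: tool calls are state transitions.
import random

def tool_search(query: str) -> str:
    return f"result for {query!r}"   # stand-in retrieval tool

def policy(state: str) -> str:
    # The real policy is the LLM being trained; here we pick a canned action.
    return random.choice(["SEARCH", "ANSWER"])

state = "question: who wrote X?"
trajectory = []
for _ in range(8):                            # cap the episode length
    action = policy(state)
    trajectory.append((state, action))
    if action == "ANSWER":                    # final answer ends the episode
        break
    state += " | obs: " + tool_search(state)  # tool call = state transition

reward = 1.0                      # terminal reward, e.g. answer F1
print(len(trajectory), reward)    # the trajectory feeds the RL update
```
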
AraLingBench: A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models
Published at 2025-11-18

#ML

AraLingBench is a new human-annotated benchmark for measuring the Arabic language skills of large language models. It reveals that while models can do well on surface-level tasks, they struggle with deeper linguistic understanding, suggesting that they often rely on memorization rather than true comprehension....
Read More

MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs
Published at 2025-11-18

#ML

Researchers have created MVI-Bench, a new benchmark for testing how well large vision-language models handle misleading visual inputs, an important step toward making these models more reliable and responsible. MVI-Bench is built from a taxonomy of misleading input categories with curated questions, together with a new metric for measuring robustness. Evaluating various top models on MVI-Bench shows that they often struggle with misleading visual inputs, providing insights for improving them....
Read More

Mitigating Label Length Bias in Large Language Models
Published at 2025-11-18

#ML

This study addresses label length bias in large language models, where multi-token class labels of different lengths are treated inconsistently. The researchers propose normalized contextual calibration, which normalizes and calibrates predictions at the full-label level, yielding significant improvements over prior approaches and reducing sensitivity to few-shot example selection....
Read More

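A minimal sketch of the two ingredients, assuming the paper's normalized contextual calibration works in this spirit: average per-token log-probabilities over the full label (length normalization), then subtract the same score under a content-free input to cancel the model's label prior (calibration). The numbers below are toy values:

```python
# Length-normalized, contextually calibrated label scoring (illustrative).
import numpy as np

def label_score(token_logprobs: list[float]) -> float:
    return float(np.mean(token_logprobs))   # length normalization

def calibrated(score_with_input: float, score_content_free: float) -> float:
    return score_with_input - score_content_free   # contextual calibration

# Toy numbers: a 1-token label vs. a 3-token label.
short = calibrated(label_score([-0.2]), label_score([-0.9]))
long_ = calibrated(label_score([-0.5, -0.4, -0.3]), label_score([-1.0, -0.8, -0.9]))
print(short, long_)   # scores are comparable despite different label lengths
```
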
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
Published at 2025-11-18

#ML

The OmniZip framework tackles the computational cost of processing audio-video data in omnimodal large language models by selectively compressing the token stream while retaining important audio and video tokens, delivering faster processing and reduced memory usage without compromising performance....
Read More

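A toy sketch of audio-guided dynamic compression: per-window audio saliency decides how many video tokens to keep in that window. The keep ratios and the importance score are assumptions, not OmniZip's exact rule:

```python
# Audio-guided video-token pruning: salient audio -> keep more tokens.
import numpy as np

rng = np.random.default_rng(0)
windows, tokens_per_window, dim = 4, 16, 8
video = rng.normal(size=(windows, tokens_per_window, dim))
audio_saliency = np.array([0.9, 0.1, 0.5, 0.2])   # e.g., speech activity per window

def compress(video, saliency, base_keep=4, bonus_keep=8):
    kept = []
    for w in range(video.shape[0]):
        k = base_keep + int(saliency[w] * bonus_keep)  # budget scales with audio
        norms = np.linalg.norm(video[w], axis=-1)      # toy importance score
        kept.append(video[w, np.argsort(-norms)[:k]])
    return kept

for w, toks in enumerate(compress(video, audio_saliency)):
    print(f"window {w}: kept {len(toks)}/{tokens_per_window} tokens")
```
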
Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution
Published at 2025-11-18

#ML

Orion is a new visual agent framework that can handle any type of input and output and is designed for advanced visual AI tasks. It orchestrates a variety of specialized computer vision tools to perform complex visual workflows and has achieved competitive performance on several benchmarks, making it suitable for production-grade visual intelligence....
Read More

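An illustrative tool-orchestration skeleton for a visual agent: a planner picks from registered vision tools and chains their outputs. The tool names and plan are made up, not Orion's actual components:

```python
# Minimal tool registry and plan executor for a visual agent (illustrative).
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "detect":  lambda img: "boxes: [person@(10,20)]",
    "caption": lambda img: "a person crossing a street",
    "ocr":     lambda img: "text: 'WALK'",
}

def run(plan: list[str], image: str) -> list[str]:
    observations = []
    for tool in plan:   # execute the planned workflow step by step
        observations.append(f"{tool}: {TOOLS[tool](image)}")
    return observations

# A planner (in Orion's case, a model) would produce this plan from the query.
print(run(["detect", "ocr", "caption"], "street.jpg"))
```
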
Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full papers are translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit Developer's Social Media