🤗 Daily Paper (2025-11-19)


deep.di...@gmail.com

Nov 19, 2025, 3:07:18 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

We hope you find some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

Project page
🤗 Daily Papers

LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost

Published at 2025-11-11

#ML

The researchers present ChaosEater, a system that uses Large Language Models to automate the entire process of Chaos Engineering, making it easier and cheaper for anyone to build resilient software systems, specifically those built on Kubernetes. They tested ChaosEater on small and large-scale Kubernetes systems and found that it completes reasonable Chaos Engineering cycles with significantly lower time and cost, and its cycles are validated by both human engineers and LLMs....

Read More

Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Published at 2025-11-11

#ML

This study presents a new method called Think-at-Hard to enhance the reasoning skills of large language models without increasing parameters. It focuses on hard tokens during the model's thinking process, improving performance across various benchmarks while maintaining efficiency....
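The summary does not spell out the paper's latent-iteration mechanism, so here is only a rough conceptual sketch: flag "hard" tokens by low model confidence and spend extra refinement passes on those alone. The functions `refine` and `think_at_hard`, the confidence threshold, and the squaring-based sharpening are all illustrative stand-ins, not the paper's method.

```python
def refine(probs, iters):
    # Hypothetical refinement step: sharpen the distribution over a few
    # extra iterations (a stand-in for the paper's latent recurrence).
    for _ in range(iters):
        total = sum(p * p for p in probs)
        probs = [p * p / total for p in probs]
    return probs

def think_at_hard(token_probs, threshold=0.5, extra_iters=2):
    # Spend extra compute only where the model is uncertain:
    # a token is "hard" if its max probability falls below the threshold.
    out = []
    for probs in token_probs:
        if max(probs) < threshold:
            probs = refine(probs, extra_iters)
        out.append(probs)
    return out
```

The point of the sketch is the selectivity: confident (easy) tokens pass through untouched, so average cost stays close to a single forward pass.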

Read More

A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space

Published at 2025-11-13

#ML

The authors present CoTyle, an open-source method for generating images with novel and consistent visual styles using only a numerical style code. This approach simplifies and diversifies style creation, offering a significant improvement over existing methods that rely on complex prompts or reference images....

Read More

Proactive Hearing Assistants that Isolate Egocentric Conversations

Published at 2025-11-14

#ML

The authors present a new system of hearing assistants that can identify and separate the wearer's conversation partners in real-time, without needing any prompts. This system uses the wearer's own speech and conversation patterns to detect who they are talking to, and it runs on a device with low latency, providing a more personalized and adaptive hearing experience....

Read More

TopoPerception: A Shortcut-Free Evaluation of Global Visual Perception in Large Vision-Language Models

Published at 2025-11-14

#ML

The research presents TopoPerception, a new benchmark that uses topological properties to test the global visual perception of large vision-language models without relying on local shortcuts. The results show that even the most advanced models struggle with global visual perception, suggesting a need for new training methods or architectures to improve in this area....
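Topological properties are inherently global: no single local patch determines, say, how many separate blobs an image contains. As a small illustration of the kind of shortcut-free property such a benchmark can probe (this code is not from the paper), here is a 4-connected component count over a binary grid:

```python
from collections import deque

def connected_components(grid):
    # Count 4-connected foreground components via BFS — a topological
    # quantity that a purely local (patch-level) shortcut cannot recover.
    rows, cols = len(grid), len(grid[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and (r, c) not in seen:
                count += 1
                queue = deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
    return count
```

Two images can match in every local patch statistic yet differ in component count, which is why such properties make useful probes of global perception.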

Read More

Φeat: Physically-Grounded Feature Representation

Published at 2025-11-14

#ML

The study presents a new visual backbone called Φeat that focuses on physical factors like material identity, reflectance, and geometric structure for improved feature representation in vision tasks. By using a self-supervised training strategy, Φeat learns to create robust features that are invariant to external physical factors, demonstrating the potential of unsupervised physical feature learning for physics-aware perception in vision and graphics....

Read More

Agent READMEs: An Empirical Study of Context Files for Agentic Coding

Published at 2025-11-16

#ML

The study analyzes 2,303 agent context files from 1,925 repositories and finds that they are complex, evolving artifacts that developers mainly use for functional context. However, non-functional requirements like security and performance are rarely addressed, suggesting a need for better tools and practices....

Read More

A Brain Wave Encodes a Thousand Tokens: Modeling Inter-Cortical Neural Interactions for Effective EEG-based Emotion Recognition

Published at 2025-11-17

#ML

The study presents a new model, RBTransformer, that uses Transformer-based neural networks to analyze brain activity through EEG signals for emotion recognition. This model considers the dynamic interactions between different brain regions, outperforming previous methods in various experiments across multiple datasets....

Read More

Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

Published at 2025-11-17

#ML

The research proposes a new framework called Gen-ViRe that tests video models' ability to reason like a person across six cognitive dimensions. By revealing discrepancies between models' visual quality and their actual reasoning depth, the framework gives a clearer picture of their reasoning capabilities....

Read More

Error-Driven Scene Editing for 3D Grounding in Large Language Models

Published at 2025-11-17

#ML

This study presents a novel method for improving 3D grounding in language models by creating precise visual counterfactuals through scene editing. The proposed DEER-3D framework identifies and corrects specific errors in the model, enhancing its spatial understanding and grounding accuracy without requiring extensive 3D data collection or scene reconstruction....

Read More

Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework

Published at 2025-11-17

#ML

This research explores how to use larger decoder-only models and visual information to improve the efficiency and performance of extreme multi-label classification (XMC). The proposed Vision-enhanced eXtreme Multi-label Learning framework (ViXML) integrates foundation vision models to unlock multi-modal capabilities while limiting computational growth, outperforming text-only decoder models in most cases....

Read More

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

Published at 2025-11-17

#ML

The study presents REVISOR, a new framework that allows MLLMs to think and reason across both text and video, improving their understanding of long videos. This framework works without needing extra training or external tools, and it's been tested and shown to work well on several video understanding benchmarks....

Read More

ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

Published at 2025-11-18

#ML

The authors present ATLAS, a new large-scale, high-difficulty, and cross-disciplinary evaluation suite designed to test the advanced scientific reasoning capabilities of large language models. ATLAS, which spans seven scientific fields, features original and contamination-resistant problems, prioritizes complex open-ended answers, and undergoes rigorous quality control to ensure scientific value and correctness....

Read More

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Published at 2025-11-18

#ML

The study focuses on improving the training of large language model agents using reinforcement learning. It extends the Markov Decision Process framework to define key components of these agents and introduces Agent-R1, a flexible and user-friendly training framework, which was validated through experiments on Multihop QA benchmark tasks....
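The summary mentions extending the MDP framing to LLM agents without giving details. As a generic, hypothetical sketch of that framing (not Agent-R1's actual API): the state is the accumulated dialogue/tool history, actions are either tool calls or a final answer, and tool results are the environment's transitions — with reward typically assigned at episode end for RL training.

```python
def run_episode(policy, tools, question, max_steps=5):
    # Hypothetical agent-as-MDP rollout. `policy` maps the history (state)
    # to an action: either ("answer", text) or a tool name plus argument.
    history = [("question", question)]
    for _ in range(max_steps):
        action, arg = policy(history)
        if action == "answer":
            return arg, history          # terminal action ends the episode
        result = tools[action](arg)      # environment transition via tool call
        history.append((action, result))
    return None, history                 # step budget exhausted
```

A trained policy would be the LLM itself; collected `(history, action)` trajectories plus an end-of-episode reward are what an end-to-end RL loop would optimize.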

Read More

AraLingBench: A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models

Published at 2025-11-18

#ML

AraLingBench is a new test for measuring the Arabic language skills of large language models. The test reveals that while models can do well on surface-level tasks, they struggle with deeper language understanding, suggesting that they often rely on memorization rather than true comprehension....

Read More

MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs

Published at 2025-11-18

#ML

Researchers have created MVI-Bench, a new tool to test how well large vision-language models handle misleading images, which is important for making these models more reliable and responsible. They built MVI-Bench using a system of categories and curated questions, and introduced a new way to measure a model's performance in this area. Testing various top models on MVI-Bench showed that they often struggle with misleading visual inputs, providing insights for improving these models....

Read More

Mitigating Label Length Bias in Large Language Models

Published at 2025-11-18

#ML

This study addresses a problem called label length bias in large language models, where multi-token class labels cause inconsistent treatment of labels of different lengths. The researchers propose a solution called normalized contextual calibration, which normalizes and calibrates predictions at the full-label level, resulting in significant improvement over prior approaches and reducing sensitivity to few-shot example selection....
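To see why multi-token labels cause trouble: summing token log-probs mechanically penalizes longer labels. A toy sketch of full-label normalization plus content-free calibration follows — the function names, the averaging scheme, and the toy numbers are assumptions for illustration, not the paper's implementation.

```python
def label_score(token_logprobs):
    # Length-normalize: average per-token log-prob, so a three-token label
    # is not automatically penalized relative to a one-token label.
    return sum(token_logprobs) / len(token_logprobs)

def calibrated_pred(label_logprobs, null_logprobs):
    # Calibrate at the full-label level: subtract each label's score under
    # a content-free ("null") input to remove the model's label prior.
    scores = {
        label: label_score(lp) - label_score(null_logprobs[label])
        for label, lp in label_logprobs.items()
    }
    return max(scores, key=scores.get)
```

Usage with toy numbers: given `{"positive": [-0.5], "very negative": [-0.4, -0.3, -0.2]}` for the input and `{"positive": [-0.6], "very negative": [-0.2, -0.2, -0.2]}` for the null prompt, the calibrated scores are 0.1 vs. −0.1, so "positive" wins even though its raw averaged score (−0.5 vs. −0.3) would have lost.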

Read More

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

Published at 2025-11-18

#ML

The OmniZip framework efficiently handles the computational challenge of processing audio-video data in large language models by selectively compressing and retaining important audio and video tokens, resulting in faster processing and reduced memory usage without compromising performance....
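As an illustrative sketch only (not OmniZip's actual algorithm), audio-guided compression can be pictured as ranking video tokens by an audio-derived importance score and keeping the top fraction in their original temporal order:

```python
def compress_tokens(video_tokens, audio_scores, keep_ratio=0.25):
    # Keep the top `keep_ratio` fraction of video tokens, ranked by a
    # per-token importance score (assumed here to come from the audio
    # stream), while preserving temporal order for the language model.
    k = max(1, int(len(video_tokens) * keep_ratio))
    ranked = sorted(range(len(video_tokens)),
                    key=lambda i: audio_scores[i], reverse=True)
    keep = sorted(ranked[:k])  # restore temporal order
    return [video_tokens[i] for i in keep]
```

Dropping three quarters of the visual tokens before they reach the LLM is where the speed and memory savings come from; the bet is that the audio signal identifies which moments matter.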

Read More

Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution

Published at 2025-11-18

#ML

Orion is a new visual agent framework that can handle any type of input and output, and is designed for advanced visual AI tasks. It uses a variety of specialized computer vision tools to perform complex visual workflows, and has achieved competitive performance on several benchmarks, making it suitable for production-grade visual intelligence....

Read More


Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.


(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit the developer's social media: Fb · X · In