🤗 Daily Paper Newsletter

Hope you found some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.

CAT: Causal Attention Tuning For Injecting Fine-grained Causal Knowledge into Large Language Models
Published at 2025-09-01
#ML

The authors present Causal Attention Tuning (CAT), a method that injects fine-grained causal knowledge into the attention mechanism of large language models. Steering attention toward true causal relationships rather than spurious correlations improves the models' accuracy and robustness across a range of scenarios.
Read More
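
As a rough illustration of the idea above, here is a minimal sketch of one way causal knowledge could be injected into attention: an additive score bonus on tokens flagged as causally relevant. The mask, the additive-bias formulation, and the toy dimensions are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (an assumption, not the paper's method): bias attention
# scores toward tokens flagged as causally relevant.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causally_biased_attention(q, k, v, causal_token_mask, bias_strength=2.0):
    """Scaled dot-product attention with an additive bonus on causal tokens.

    causal_token_mask: shape (seq_len,), 1.0 where a token carries annotated
    causal knowledge (e.g. a cause/effect span), 0.0 elsewhere.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (seq_len, seq_len)
    scores = scores + bias_strength * causal_token_mask[None, :]
    weights = softmax(scores, axis=-1)                 # attention now favors causal tokens
    return weights @ v

# Toy usage: 5 tokens, 8-dim vectors, tokens 1 and 3 marked as causal.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(5, 8)) for _ in range(3))
mask = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
print(causally_biased_attention(q, k, v, mask).shape)  # (5, 8)
```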

Context Engineering for Trustworthiness: Rescorla Wagner Steering Under Mixed and Inappropriate Contexts
Published at 2025-09-01
#ML

This study explores how large language models (LLMs) handle mixed contexts that contain both relevant and inappropriate content. Using a purpose-built testbed and the Rescorla-Wagner model from neuroscience, the authors show that LLMs tend to incorporate the less prevalent information in a context, which makes them vulnerable to even small amounts of inappropriate content. To address this, they introduce RW-Steering, a method that improves LLM safety and response quality in real-world settings without requiring extensive supervision.
Read More
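
For reference, the Rescorla-Wagner rule named in the title is the classic associative-learning update sketched below. How the paper maps context segments onto cues and model behaviour onto the learned outcome is not spelled out in this summary, so the mapping in the toy example is an illustrative assumption.

```python
# Classic Rescorla-Wagner update: Delta-V = alpha * beta * (lambda - V_total),
# applied to every cue present on a trial. The cue/outcome mapping below is an
# illustrative assumption, not the paper's construction.
def rescorla_wagner(trials, alpha=0.3, beta=1.0, lam=1.0):
    """trials: list of dicts mapping cue name -> present (True/False).
    Returns the associative strength V learned for each cue."""
    V = {}
    for cues in trials:
        v_total = sum(V.get(c, 0.0) for c, present in cues.items() if present)
        error = lam - v_total                  # prediction error shared by all present cues
        for c, present in cues.items():
            if present:
                V[c] = V.get(c, 0.0) + alpha * beta * error
    return V

# Toy usage: a cue present on only a few trials (think: a small amount of
# inappropriate context) still acquires noticeable associative strength.
trials = [{"relevant": True, "inappropriate": i < 3} for i in range(10)]
print(rescorla_wagner(trials))
```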

FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies
Published at 2025-09-05
#ML

The researchers present FLOWER, an efficient Vision-Language-Action flow policy that requires less compute and fewer resources than current methods. FLOWER performs well across a variety of tasks and robotic embodiments, even setting a new state of the art on one benchmark, and its code and pretrained weights are publicly available.
Read More

HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering
Published at 2025-09-08
#ML

The paper presents HANRAG, a new framework that improves the Retrieval-Augmented Generation approach for answering complex questions. HANRAG efficiently handles multi-hop queries by routing, decomposing, and filtering noise from retrieved documents, outperforming other methods in both single-hop and multi-hop question-answering tasks.
Read More
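
The summary mentions three stages (routing, decomposition, and noise filtering), so here is a bare-bones skeleton of how such a multi-hop RAG pipeline can fit together. Every function name and the toy example are hypothetical placeholders, not HANRAG's actual components.

```python
# Skeleton of a route -> decompose -> retrieve -> filter -> generate pipeline.
# All components are hypothetical stand-ins, not HANRAG's implementation.
from typing import Callable, List

def answer_multihop(
    question: str,
    route: Callable[[str], str],                # "single-hop" or "multi-hop"
    decompose: Callable[[str], List[str]],      # split into sub-questions
    retrieve: Callable[[str], List[str]],       # fetch candidate documents
    is_relevant: Callable[[str, str], bool],    # per-document noise filter
    generate: Callable[[str, List[str]], str],  # answer from kept evidence
) -> str:
    hops = [question] if route(question) == "single-hop" else decompose(question)
    evidence: List[str] = []
    for hop in hops:
        docs = retrieve(hop)
        evidence.extend(d for d in docs if is_relevant(hop, d))  # drop noisy docs
    return generate(question, evidence)

# Toy usage with trivial stand-ins:
print(answer_multihop(
    "Who directed the film that won Best Picture in 1998?",
    route=lambda q: "multi-hop",
    decompose=lambda q: ["Which film won Best Picture in 1998?",
                         "Who directed that film?"],
    retrieve=lambda q: ["Titanic won Best Picture at the 1998 ceremony.",
                        "James Cameron directed Titanic.",
                        "Unrelated sports result."],
    is_relevant=lambda q, d: "Titanic" in d,
    generate=lambda q, ev: " ".join(ev),
))
```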

IntrEx: A Dataset for Modeling Engagement in Educational Conversations
Published at 2025-09-08
#ML

The researchers present IntrEx, a new dataset focused on understanding engagement in educational conversations between teachers and students. They used this dataset to train large language models, which outperformed more advanced proprietary models in predicting engagement, highlighting the importance of specialized datasets in modeling educational interactions.
Read More

VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions
Published at 2025-09-09
#ML

The researchers present VStyle, a benchmark for evaluating the ability of spoken language models to change their speaking style based on spoken instructions, and introduce a new task called Voice Style Adaptation. They also develop an evaluation framework and find that current models struggle with this task, which can help advance human-centered spoken interaction.
Read More

Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images
Published at 2025-09-09
#ML

The authors created a new dataset called Visual-TableQA, which is designed to test and improve visual reasoning skills of AI models using complex table data. This dataset has over 2,500 tables and 6,000 questions, and it was generated using a cost-effective method involving multiple AI models working together. The dataset helps AI models perform better on various tasks, even though the data is synthetic.
Read More

Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation
Published at 2025-09-10
#ML

The study examines the risks of using large language models (LLMs) for text annotation in social science research, revealing that researcher choices can introduce biases and errors, leading to incorrect conclusions. The study also finds that intentional manipulation of LLMs to produce statistically significant results is easy to do.
Read More

MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools
Published at 2025-09-10
#ML

The authors present MCP-AgentBench, a new benchmark for evaluating language agents in real-world scenarios using the Model Context Protocol (MCP). This benchmark includes a testbed with various tools, 600 queries of different complexities, and a new evaluation methodology to measure agent performance in completing real-world tasks effectively.
Read More

World Modeling with Probabilistic Structure Integration
Published at 2025-09-10
#ML

The authors describe a system called PSI that learns flexible world models from data through a three-step process. This system can predict probabilistic graphs, extract low-dimensional properties, and integrate these structures back into the model, enhancing its capabilities and creating new control handles for tasks like video prediction and understanding.
Read More

X-Part: high fidelity and structure coherent shape decomposition
Published at 2025-09-10
#ML

The study presents a new method, X-Part, which allows for the creation of detailed and structured 3D objects by decomposing them into meaningful parts. This approach improves controllability and semantic accuracy in 3D shape generation, enabling easy editing and high-quality 3D asset production.
Read More

DeMeVa at LeWiDi-2025: Modeling Perspectives with In-Context Learning and Label Distribution Learning
Published at 2025-09-11
#ML

This study explores two methods for predicting annotator-specific annotations: in-context learning with large language models and label distribution learning with RoBERTa. In-context learning proves effective at generating perspective-based annotations, and aggregating these predictions improves performance; label distribution learning also shows promise for soft-label prediction and, the authors argue, merits further investigation by the perspectivist community.
Read More
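
To illustrate the label-distribution-learning half of the summary: instead of collapsing annotators into a majority vote, the model is trained against the full distribution of their labels. The soft cross-entropy below is one common choice of objective, assumed here for illustration rather than taken from the paper.

```python
# Hedged sketch of label distribution learning: build a soft target from
# annotator votes and score predictions against it. The loss choice is a
# common assumption, not necessarily the one used by DeMeVa.
import numpy as np

def annotator_distribution(labels, num_classes):
    """Turn per-annotator labels for one item into a soft target."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts / counts.sum()

def soft_cross_entropy(target_dist, predicted_logits):
    """Cross-entropy between a soft target and the model's softmax output."""
    logits = predicted_logits - predicted_logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -(target_dist * log_probs).sum()

# Toy usage: 5 annotators disagree on a 3-class item.
target = annotator_distribution(np.array([0, 0, 1, 2, 2]), num_classes=3)
print(target)                                    # [0.4 0.2 0.4]
print(soft_cross_entropy(target, np.array([2.0, 0.5, 1.0])))
```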

LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios
Published at 2025-09-11
#ML

The study presents LoFT, a new framework for efficient fine-tuning in long-tailed semi-supervised learning, which improves upon existing methods by using foundation models to generate more reliable pseudo-labels. Additionally, the researchers introduce LoFT-OW to handle out-of-distribution samples in open-world scenarios, achieving better performance than previous approaches with only 1% of the unlabeled data.
Read More
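
One generic way to obtain "more reliable pseudo-labels" is to keep only predictions whose confidence clears a threshold, possibly a lower one for rare (tail) classes. The sketch below shows that mechanism under those assumptions; it is not LoFT's exact procedure.

```python
# Hedged sketch: confidence-thresholded pseudo-labelling with per-class
# thresholds. The thresholds and the tail-class relaxation are illustrative
# assumptions, not LoFT's recipe.
import numpy as np

def select_pseudo_labels(probs, class_thresholds):
    """probs: (num_unlabeled, num_classes) softmax outputs from a foundation
    model. Returns (indices, labels) for samples confident enough to keep."""
    labels = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    keep = confidence >= class_thresholds[labels]   # threshold depends on the predicted class
    return np.where(keep)[0], labels[keep]

# Toy usage: lower the bar for the tail class (index 2) so it is not starved.
probs = np.array([[0.9, 0.05, 0.05],
                  [0.4, 0.35, 0.25],
                  [0.2, 0.15, 0.65]])
thresholds = np.array([0.85, 0.85, 0.6])
print(select_pseudo_labels(probs, thresholds))      # keeps samples 0 and 2
```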

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Published at 2025-09-11
#ML

This study investigates why larger language models sometimes struggle with longer tasks despite high single-step accuracy. They find that self-conditioning, where models make more mistakes when their errors are in the context, is a major issue. To address this, they propose a new benchmark for testing models' ability to execute long-horizon tasks, which could help improve LLMs' performance on complex problems.
Read More
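
A quick back-of-the-envelope calculation shows why long horizons punish even small per-step error rates. The independence assumption below is purely illustrative and ignores the self-conditioning effect the authors identify, which makes the decay steeper still.

```python
# Illustrative only: if every step must succeed and steps were independent,
# overall success decays geometrically with the horizon length.
def task_success_rate(step_accuracy: float, horizon: int) -> float:
    return step_accuracy ** horizon

for p in (0.99, 0.999):
    print(p, [round(task_success_rate(p, h), 3) for h in (10, 100, 1000)])
# 0.99  -> [0.904, 0.366, 0.0]
# 0.999 -> [0.99, 0.905, 0.368]
```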

CMHG: A Dataset and Benchmark for Headline Generation of Minority Languages in China
Published at 2025-09-12
#ML

The study presents a new dataset called CMHG for generating headlines in minority languages of China, such as Tibetan, Uyghur, and Mongolian. This dataset contains 100,000 entries for Tibetan and 50,000 each for Uyghur and Mongolian, and it includes a high-quality test set annotated by native speakers to evaluate future research in this area.
Read More

Color Me Correctly: Bridging Perceptual Color Spaces and Text Embeddings for Improved Diffusion Generation
Published at 2025-09-12
#ML

The study presents a new method that uses a large language model to clarify ambiguous color terms in text prompts and refines text embeddings based on color relationships in the CIELAB space, improving color accuracy in text-to-image generation without needing extra training or reference images.
Read More
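
For context on the CIELAB component: the conversion and the Delta-E 1976 distance below are standard colour-science formulas, shown here only to make "perceptual colour relationships" concrete. How the paper folds such distances into text-embedding refinement is not reproduced here, and the example colours are made up.

```python
# Standard sRGB -> CIELAB conversion (D65) and Delta-E 1976 distance.
# Shown for background only; the paper's embedding refinement is not sketched.
def srgb_to_lab(rgb):
    """rgb: (r, g, b) in 0..255. Returns CIELAB (L*, a*, b*) under D65."""
    def lin(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (lin(c) for c in rgb)
    # linear sRGB -> XYZ (D65)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def delta_e76(lab1, lab2):
    return sum((p - q) ** 2 for p, q in zip(lab1, lab2)) ** 0.5

# Toy usage: "teal" sits perceptually closer to cyan than to pure green.
teal, cyan, green = (0, 128, 128), (0, 255, 255), (0, 128, 0)
print(delta_e76(srgb_to_lab(teal), srgb_to_lab(cyan)))
print(delta_e76(srgb_to_lab(teal), srgb_to_lab(green)))
```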

InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Published at 2025-09-12
#ML

The study presents InfGen, a new method for generating images at any resolution from a fixed-sized latent, which significantly reduces computation time for high-resolution images. By replacing the VAE decoder with a new generator, InfGen allows for faster and more efficient image synthesis without the need for retraining diffusion models.
Read More

Inpainting-Guided Policy Optimization for Diffusion Large Language Models
Published at 2025-09-12
#ML

This study presents a new reinforcement learning framework, IGPO, for diffusion large language models that uses inpainting to guide exploration and improve efficiency. The proposed method inserts partial ground-truth reasoning traces during online sampling, which helps the model discover correct solutions faster and reduces sample waste, leading to significant performance improvements across various mathematical benchmarks.
Read More
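
A minimal sketch of the "insert partial ground-truth reasoning traces during sampling" idea: reveal a prefix of a known-good trace and let the policy complete the rest, so rollouts stay near a correct solution path. The prefix-based hinting, the fraction, and all names here are assumptions, not IGPO's algorithm.

```python
# Hedged sketch: anchor a rollout on a partial ground-truth trace ("inpainting"
# a prefix) and let the policy generate the remainder. Illustrative only.
from typing import Callable, List

def inpainting_guided_sample(
    prompt: str,
    gold_trace: List[str],                            # ground-truth reasoning steps
    sample_completion: Callable[[str], List[str]],    # the policy's own rollout
    hint_fraction: float = 0.5,
) -> List[str]:
    n_hint = int(len(gold_trace) * hint_fraction)     # how much of the trace to reveal
    revealed = gold_trace[:n_hint]
    context = prompt + "\n" + "\n".join(revealed)
    return revealed + sample_completion(context)      # policy completes the rest

# Toy usage with a stand-in policy that returns a canned continuation.
trace = ["Let x be the unknown.", "Then 2x + 3 = 11.", "So x = 4."]
print(inpainting_guided_sample(
    "Solve: 2x + 3 = 11.",
    gold_trace=trace,
    sample_completion=lambda ctx: ["Subtract 3: 2x = 8.", "So x = 4."],
))
```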

QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading
Published at 2025-09-12
#ML

The study presents QuantAgent, a multi-agent LLM framework tailored to high-frequency trading. By using specialized agents that each analyze a different aspect of the market, it outperforms other models at predicting market trends and earning returns over short periods.
Read More

Virtual Agent Economies
Published at 2025-09-12
#ML

The abstract discusses the emergence of a new economic layer driven by autonomous AI agents, proposing the 'sandbox economy' framework to analyze it. The authors highlight opportunities for coordination and challenges like economic risk and inequality, suggesting design choices for safe and fair AI agent markets, including auction mechanisms and 'mission economies'.
Read More

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit Developer's Social Media