🤗 Daily Paper Newsletter

Hope you found some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.

CAT: Causal Attention Tuning For Injecting Fine-grained Causal Knowledge into Large Language Models
Published at 2025-09-01
#ML

The authors present Causal Attention Tuning (CAT), a method that injects fine-grained causal knowledge into the attention mechanism of large language models. Steering attention toward true causal relationships rather than spurious correlations improves the models' accuracy and robustness across a range of scenarios.
Read More
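
As a rough illustration of the idea above, here is a minimal sketch of one way causal knowledge could be injected into attention: an additive score bonus on tokens flagged as causally relevant. The mask, the additive-bias formulation, and the toy dimensions are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (an assumption, not the paper's method): bias attention
# scores toward tokens flagged as causally relevant.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causally_biased_attention(q, k, v, causal_token_mask, bias_strength=2.0):
    """Scaled dot-product attention with an additive bonus on causal tokens.

    causal_token_mask: shape (seq_len,), 1.0 where a token carries annotated
    causal knowledge (e.g. a cause/effect span), 0.0 elsewhere.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (seq_len, seq_len)
    scores = scores + bias_strength * causal_token_mask[None, :]
    weights = softmax(scores, axis=-1)                 # attention now favors causal tokens
    return weights @ v

# Toy usage: 5 tokens, 8-dim vectors, tokens 1 and 3 marked as causal.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(5, 8)) for _ in range(3))
mask = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
print(causally_biased_attention(q, k, v, mask).shape)  # (5, 8)
```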

Context Engineering for Trustworthiness: Rescorla Wagner Steering Under Mixed and Inappropriate Contexts
Published at 2025-09-01
#ML

This study explores how large language models (LLMs) handle mixed contexts that contain both relevant and inappropriate content. Using a purpose-built testbed and the Rescorla-Wagner model from neuroscience, the authors show that LLMs tend to incorporate the less prevalent information in a context, which makes them vulnerable to even small amounts of inappropriate content. To address this, they introduce RW-Steering, a method that improves LLM safety and response quality in real-world settings without requiring extensive supervision.
Read More
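
For reference, the Rescorla-Wagner rule named in the title is the classic associative-learning update sketched below. How the paper maps context segments onto cues and model behaviour onto the learned outcome is not spelled out in this summary, so the mapping in the toy example is an illustrative assumption.

```python
# Classic Rescorla-Wagner update: Delta-V = alpha * beta * (lambda - V_total),
# applied to every cue present on a trial. The cue/outcome mapping below is an
# illustrative assumption, not the paper's construction.
def rescorla_wagner(trials, alpha=0.3, beta=1.0, lam=1.0):
    """trials: list of dicts mapping cue name -> present (True/False).
    Returns the associative strength V learned for each cue."""
    V = {}
    for cues in trials:
        v_total = sum(V.get(c, 0.0) for c, present in cues.items() if present)
        error = lam - v_total                  # prediction error shared by all present cues
        for c, present in cues.items():
            if present:
                V[c] = V.get(c, 0.0) + alpha * beta * error
    return V

# Toy usage: a cue present on only a few trials (think: a small amount of
# inappropriate context) still acquires noticeable associative strength.
trials = [{"relevant": True, "inappropriate": i < 3} for i in range(10)]
print(rescorla_wagner(trials))
```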

FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies
Published at 2025-09-05
#ML

The researchers present FLOWER, an efficient Vision-Language-Action flow policy that requires less compute and fewer resources than current methods. FLOWER performs well across a variety of tasks and robotic embodiments, even setting a new state of the art on one benchmark, and its code and pretrained weights are publicly available.
Read More

HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering
Published at 2025-09-08
#ML

The paper presents HANRAG, a new framework that improves the Retrieval-Augmented Generation approach for answering complex questions. HANRAG efficiently handles multi-hop queries by routing, decomposing, and filtering noise from retrieved documents, outperforming other methods in both single-hop and multi-hop question-answering tasks.
Read More
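
The summary mentions three stages (routing, decomposition, and noise filtering), so here is a bare-bones skeleton of how such a multi-hop RAG pipeline can fit together. Every function name and the toy example are hypothetical placeholders, not HANRAG's actual components.

```python
# Skeleton of a route -> decompose -> retrieve -> filter -> generate pipeline.
# All components are hypothetical stand-ins, not HANRAG's implementation.
from typing import Callable, List

def answer_multihop(
    question: str,
    route: Callable[[str], str],                # "single-hop" or "multi-hop"
    decompose: Callable[[str], List[str]],      # split into sub-questions
    retrieve: Callable[[str], List[str]],       # fetch candidate documents
    is_relevant: Callable[[str, str], bool],    # per-document noise filter
    generate: Callable[[str, List[str]], str],  # answer from kept evidence
) -> str:
    hops = [question] if route(question) == "single-hop" else decompose(question)
    evidence: List[str] = []
    for hop in hops:
        docs = retrieve(hop)
        evidence.extend(d for d in docs if is_relevant(hop, d))  # drop noisy docs
    return generate(question, evidence)

# Toy usage with trivial stand-ins:
print(answer_multihop(
    "Who directed the film that won Best Picture in 1998?",
    route=lambda q: "multi-hop",
    decompose=lambda q: ["Which film won Best Picture in 1998?",
                         "Who directed that film?"],
    retrieve=lambda q: ["Titanic won Best Picture at the 1998 ceremony.",
                        "James Cameron directed Titanic.",
                        "Unrelated sports result."],
    is_relevant=lambda q, d: "Titanic" in d,
    generate=lambda q, ev: " ".join(ev),
))
```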

IntrEx: A Dataset for Modeling Engagement in Educational Conversations
Published at 2025-09-08
#ML

The researchers present IntrEx, a new dataset focused on understanding engagement in educational conversations between teachers and students. They used this dataset to train large language models, which outperformed more advanced proprietary models in predicting engagement, highlighting the importance of specialized datasets in modeling educational interactions.
Read More

VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions
Published at 2025-09-09
#ML

The researchers present VStyle, a benchmark for evaluating the ability of spoken language models to change their speaking style based on spoken instructions, and introduce a new task called Voice Style Adaptation. They also develop an evaluation framework and find that current models struggle with this task, which can help advance human-centered spoken interaction.
Read More

Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images
Published at 2025-09-09
#ML

The authors created a new dataset called Visual-TableQA, which is designed to test and improve visual reasoning skills of AI models using complex table data. This dataset has over 2,500 tables and 6,000 questions, and it was generated using a cost-effective method involving multiple AI models working together. The dataset helps AI models perform better on various tasks, even though the data is synthetic.
Read More

Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation
Published at 2025-09-10
#ML

The study examines the risks of using large language models (LLMs) for text annotation in social science research, revealing that researcher choices can introduce biases and errors, leading to incorrect conclusions. The study also finds that intentional manipulation of LLMs to produce statistically significant results is easy to do.
Read More

MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools
Published at 2025-09-10
#ML

The authors present MCP-AgentBench, a new benchmark for evaluating language agents in real-world scenarios using the Model Context Protocol (MCP). This benchmark includes a testbed with various tools, 600 queries of different complexities, and a new evaluation methodology to measure agent performance in completing real-world tasks effectively.
Read More

World Modeling with Probabilistic Structure Integration
Published at 2025-09-10
#ML

The authors describe a system called PSI that learns flexible world models from data through a three-step process. This system can predict probabilistic graphs, extract low-dimensional properties, and integrate these structures back into the model, enhancing its capabilities and creating new control handles for tasks like video prediction and understanding.
Read More

X-Part: high fidelity and structure coherent shape decomposition
Published at 2025-09-10
#ML

The study presents a new method, X-Part, which allows for the creation of detailed and structured 3D objects by decomposing them into meaningful parts. This approach improves controllability and semantic accuracy in 3D shape generation, enabling easy editing and high-quality 3D asset production.
Read More

DeMeVa at LeWiDi-2025: Modeling Perspectives with In-Context Learning and Label Distribution Learning
Published at 2025-09-11
#ML

This study explores two methods for predicting annotator-specific annotations: in-context learning with large language models and label distribution learning with RoBERTa. In-context learning proves effective at generating perspective-based annotations, and aggregating these predictions improves performance; label distribution learning also shows promise for soft-label prediction and, the authors argue, merits further investigation by the perspectivist community.
Read More
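
To illustrate the label-distribution-learning half of the summary: instead of collapsing annotators into a majority vote, the model is trained against the full distribution of their labels. The soft cross-entropy below is one common choice of objective, assumed here for illustration rather than taken from the paper.

```python
# Hedged sketch of label distribution learning: build a soft target from
# annotator votes and score predictions against it. The loss choice is a
# common assumption, not necessarily the one used by DeMeVa.
import numpy as np

def annotator_distribution(labels, num_classes):
    """Turn per-annotator labels for one item into a soft target."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts / counts.sum()

def soft_cross_entropy(target_dist, predicted_logits):
    """Cross-entropy between a soft target and the model's softmax output."""
    logits = predicted_logits - predicted_logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -(target_dist * log_probs).sum()

# Toy usage: 5 annotators disagree on a 3-class item.
target = annotator_distribution(np.array([0, 0, 1, 2, 2]), num_classes=3)
print(target)                                    # [0.4 0.2 0.4]
print(soft_cross_entropy(target, np.array([2.0, 0.5, 1.0])))
```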

LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios
Published at 2025-09-11
#ML

The study presents LoFT, a new framework for efficient fine-tuning in long-tailed semi-supervised learning, which improves upon existing methods by using foundation models to generate more reliable pseudo-labels. Additionally, the researchers introduce LoFT-OW to handle out-of-distribution samples in open-world scenarios, achieving better performance than previous approaches with only 1% of the unlabeled data.
Read More
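
One generic way to obtain "more reliable pseudo-labels" is to keep only predictions whose confidence clears a threshold, possibly a lower one for rare (tail) classes. The sketch below shows that mechanism under those assumptions; it is not LoFT's exact procedure.

```python
# Hedged sketch: confidence-thresholded pseudo-labelling with per-class
# thresholds. The thresholds and the tail-class relaxation are illustrative
# assumptions, not LoFT's recipe.
import numpy as np

def select_pseudo_labels(probs, class_thresholds):
    """probs: (num_unlabeled, num_classes) softmax outputs from a foundation
    model. Returns (indices, labels) for samples confident enough to keep."""
    labels = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    keep = confidence >= class_thresholds[labels]   # threshold depends on the predicted class
    return np.where(keep)[0], labels[keep]

# Toy usage: lower the bar for the tail class (index 2) so it is not starved.
probs = np.array([[0.9, 0.05, 0.05],
                  [0.4, 0.35, 0.25],
                  [0.2, 0.15, 0.65]])
thresholds = np.array([0.85, 0.85, 0.6])
print(select_pseudo_labels(probs, thresholds))      # keeps samples 0 and 2
```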

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Published at 2025-09-11
#ML

This study investigates why larger language models sometimes struggle with longer tasks despite high single-step accuracy. They find that self-conditioning, where models make more mistakes when their errors are in the context, is a major issue. To address this, they propose a new benchmark for testing models' ability to execute long-horizon tasks, which could help improve LLMs' performance on complex problems.
Read More
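
A quick back-of-the-envelope calculation shows why long horizons punish even small per-step error rates. The independence assumption below is purely illustrative and ignores the self-conditioning effect the authors identify, which makes the decay steeper still.

```python
# Illustrative only: if every step must succeed and steps were independent,
# overall success decays geometrically with the horizon length.
def task_success_rate(step_accuracy: float, horizon: int) -> float:
    return step_accuracy ** horizon

for p in (0.99, 0.999):
    print(p, [round(task_success_rate(p, h), 3) for h in (10, 100, 1000)])
# 0.99  -> [0.904, 0.366, 0.0]
# 0.999 -> [0.99, 0.905, 0.368]
```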

CMHG: A Dataset and Benchmark for Headline Generation of Minority Languages in China
Published at 2025-09-12
#ML

The study presents a new dataset called CMHG for generating headlines in minority languages of China, such as Tibetan, Uyghur, and Mongolian. This dataset contains 100,000 entries for Tibetan and 50,000 each for Uyghur and Mongolian, and it includes a high-quality test set annotated by native speakers to evaluate future research in this area.
Read More

Color Me Correctly: Bridging Perceptual Color Spaces and Text Embeddings for Improved Diffusion Generation
Published at 2025-09-12
#ML

The study presents a new method that uses a large language model to clarify ambiguous color terms in text prompts and refines text embeddings based on color relationships in the CIELAB space, improving color accuracy in text-to-image generation without needing extra training or reference images.
Read More
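
For context on the CIELAB component: the conversion and the Delta-E 1976 distance below are standard colour-science formulas, shown here only to make "perceptual colour relationships" concrete. How the paper folds such distances into text-embedding refinement is not reproduced here, and the example colours are made up.

```python
# Standard sRGB -> CIELAB conversion (D65) and Delta-E 1976 distance.
# Shown for background only; the paper's embedding refinement is not sketched.
def srgb_to_lab(rgb):
    """rgb: (r, g, b) in 0..255. Returns CIELAB (L*, a*, b*) under D65."""
    def lin(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (lin(c) for c in rgb)
    # linear sRGB -> XYZ (D65)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def delta_e76(lab1, lab2):
    return sum((p - q) ** 2 for p, q in zip(lab1, lab2)) ** 0.5

# Toy usage: "teal" sits perceptually closer to cyan than to pure green.
teal, cyan, green = (0, 128, 128), (0, 255, 255), (0, 128, 0)
print(delta_e76(srgb_to_lab(teal), srgb_to_lab(cyan)))
print(delta_e76(srgb_to_lab(teal), srgb_to_lab(green)))
```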

InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Published at 2025-09-12
#ML

The study presents InfGen, a new method for generating images at any resolution from a fixed-sized latent, which significantly reduces computation time for high-resolution images. By replacing the VAE decoder with a new generator, InfGen allows for faster and more efficient image synthesis without the need for retraining diffusion models.
Read More

Inpainting-Guided Policy Optimization for Diffusion Large Language Models
Published at 2025-09-12
#ML

This study presents a new reinforcement learning framework, IGPO, for diffusion large language models that uses inpainting to guide exploration and improve efficiency. The proposed method inserts partial ground-truth reasoning traces during online sampling, which helps the model discover correct solutions faster and reduces sample waste, leading to significant performance improvements across various mathematical benchmarks.
Read More
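
A minimal sketch of the "insert partial ground-truth reasoning traces during sampling" idea: reveal a prefix of a known-good trace and let the policy complete the rest, so rollouts stay near a correct solution path. The prefix-based hinting, the fraction, and all names here are assumptions, not IGPO's algorithm.

```python
# Hedged sketch: anchor a rollout on a partial ground-truth trace ("inpainting"
# a prefix) and let the policy generate the remainder. Illustrative only.
from typing import Callable, List

def inpainting_guided_sample(
    prompt: str,
    gold_trace: List[str],                            # ground-truth reasoning steps
    sample_completion: Callable[[str], List[str]],    # the policy's own rollout
    hint_fraction: float = 0.5,
) -> List[str]:
    n_hint = int(len(gold_trace) * hint_fraction)     # how much of the trace to reveal
    revealed = gold_trace[:n_hint]
    context = prompt + "\n" + "\n".join(revealed)
    return revealed + sample_completion(context)      # policy completes the rest

# Toy usage with a stand-in policy that returns a canned continuation.
trace = ["Let x be the unknown.", "Then 2x + 3 = 11.", "So x = 4."]
print(inpainting_guided_sample(
    "Solve: 2x + 3 = 11.",
    gold_trace=trace,
    sample_completion=lambda ctx: ["Subtract 3: 2x = 8.", "So x = 4."],
))
```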

QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading
Published at 2025-09-12
#ML

The study presents QuantAgent, a multi-agent LLM framework tailored to high-frequency trading. By using specialized agents that each analyze a different aspect of the market, it outperforms other models at predicting market trends and earning returns over short periods.
Read More

Virtual Agent Economies
Published at 2025-09-12
#ML

The abstract discusses the emergence of a new economic layer driven by autonomous AI agents, proposing the 'sandbox economy' framework to analyze it. The authors highlight opportunities for coordination and challenges like economic risk and inequality, suggesting design choices for safe and fair AI agent markets, including auction mechanisms and 'mission economies'.
Read More

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit Developer's Social Media