🤗 Daily Paper (2025-11-10)


deep.di...@gmail.com

Nov 10, 2025, 3:07:15 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you find some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page
🤗 daily paper

CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?

Published at 2025-10-28

#ML

This study explores using natural language critiques to improve the accuracy of Large Language Models' (LLMs) confidence assessments. The proposed methods, Self-Critique and CritiCal, outperform other baselines and even the teacher model, GPT-4o, in complex reasoning tasks, enhancing LLMs' reliability in various scenarios....

Read More
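
The self-critique loop the paper studies can be sketched in a few lines. Below is a minimal illustration assuming a generic text-in/text-out LLM client; the prompts and the `ask` callable are hypothetical stand-ins, not the paper's actual interface:

```python
# Hypothetical sketch of Self-Critique confidence calibration: answer, critique
# the answer in natural language, then restate confidence in light of the critique.
from typing import Callable

def self_critique_confidence(ask: Callable[[str], str], question: str) -> tuple[str, float]:
    answer = ask(f"Question: {question}\nGive a concise answer.")
    critique = ask(
        f"Question: {question}\nProposed answer: {answer}\n"
        "Critique this answer: list possible errors or missing evidence."
    )
    confidence = ask(
        f"Question: {question}\nAnswer: {answer}\nCritique: {critique}\n"
        "Considering the critique, output only your confidence as a number in [0, 1]."
    )
    return answer, float(confidence.strip())
```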

HAFixAgent: History-Aware Automated Program Repair Agent

Published at 2025-11-02

#ML

The paper proposes HAFixAgent, a history-aware bug-fixing agent that improves automated program repair by utilizing repository history, specifically for complex multi-hunk bugs. HAFixAgent significantly enhances effectiveness compared to state-of-the-art baselines, maintains efficiency, and offers a practical approach for integrating historical heuristics in agentic APR....

Read More
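
One simple reading of "historical heuristics" is giving the agent the recent change history of the buggy file. Here is a minimal sketch using plain git via subprocess; the function and its role are assumptions, not HAFixAgent's actual interface:

```python
# Fetch the last n commits (with patches) that touched a buggy file, so an APR
# agent can condition its repair on how the code evolved. Illustrative only.
import subprocess

def file_history(repo: str, path: str, n: int = 5) -> str:
    """Return the patches of the last n commits touching `path` in `repo`."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"-{n}", "-p", "--follow", "--", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```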

Jailbreaking in the Haystack

Published at 2025-11-04

#ML

The study presents NINJA, a method that exploits long-context language models by adding harmless, model-generated content to malicious user goals. NINJA significantly improves attack success rates on various models, including LLaMA, Qwen, Mistral, and Gemini, and is more efficient and harder to detect compared to previous methods....

Read More

Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

Published at 2025-11-06

#ML

The study presents a new benchmark to test Large Language Models' ability to role-play fictional characters with varying moral alignments. The results show that these models struggle to portray villainous characters authentically, especially those with traits like deceitfulness and manipulativeness, due to their safety alignment, which poses a challenge for creative generation....

Read More

VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks

Published at 2025-11-06

#ML

The paper presents VeriCoT, a method that validates the multi-step reasoning of large language models (LLMs) by formalizing each chain-of-thought step in first-order logic and checking it for logical consistency. VeriCoT improves reasoning validity and accuracy by identifying flawed reasoning steps, serves as a predictor of final-answer correctness, and enables inference-time self-reflection, supervised fine-tuning, and preference fine-tuning....

Read More
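
The underlying entailment check (a reasoning step is supported only if the formalized premises entail it) can be illustrated with an off-the-shelf solver. This is a toy propositional sketch using the z3-solver package; the paper's actual autoformalization into first-order logic is more involved:

```python
# Toy entailment check in the spirit of VeriCoT: a CoT step is supported iff
# premises AND NOT(step) is unsatisfiable. Propositional here for brevity.
from z3 import Bools, Implies, And, Not, Solver, unsat

rains, ground_wet = Bools("rains ground_wet")
premises = And(Implies(rains, ground_wet), rains)  # formalized context
step = ground_wet                                  # claim made by the CoT step

solver = Solver()
solver.add(premises, Not(step))
print("step entailed" if solver.check() == unsat else "step unsupported")
```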

DeepEyesV2: Toward Agentic Multimodal Model

Published at 2025-11-07

#ML

The study presents DeepEyesV2, an agentic multimodal model that understands text and images and uses external tools like code execution and web search. The model is trained in two stages: first, to establish tool-use patterns, and second, to refine tool invocation. The researchers also introduce RealX-Bench, a benchmark to evaluate real-world multimodal reasoning, and demonstrate DeepEyesV2's effectiveness in various tasks, including perception, reasoning, and search....

Read More
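
The tool-use behavior described above reduces to a standard agent loop: the model either calls a tool and observes the result, or returns a final answer. A generic sketch follows; the action format and tool set are assumptions, not DeepEyesV2's actual protocol:

```python
# Generic agentic tool loop: the policy inspects the history, then either
# invokes a named tool (e.g. code execution, web search) or answers.
from typing import Callable

def run_agent(decide: Callable[[list], dict], tools: dict[str, Callable[[str], str]],
              task: str, max_steps: int = 8) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = decide(history)  # e.g. {"tool": "search", "arg": "..."} or {"answer": "..."}
        if "answer" in action:
            return action["answer"]
        observation = tools[action["tool"]](action["arg"])
        history.append({"role": "tool", "content": observation})
    return "no answer within step budget"
```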

Dense Motion Captioning

Published at 2025-11-07

#ML

The researchers introduce CompMo, a new large-scale dataset of 3D human motion sequences with detailed temporal annotations and complex motions. They also develop DEMO, a model that generates detailed, temporally grounded captions for these motion sequences and outperforms existing methods on this and other benchmarks....

Read More

Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings

Published at 2025-11-07

#ML

This study finds that large vision-language models are biased toward language priors, which causes "hallucinations", i.e., statements not grounded in the image. To fix this, the authors propose a method that integrates visual and textual embeddings more effectively, reducing hallucinations and improving accuracy on benchmark tests. (ELI5: imagine a machine that misunderstands pictures because it focuses too much on words; this research shows a way to balance both, leading to fewer mistakes.)...

Read More

Visual Spatial Tuning

Published at 2025-11-07

#ML

The study presents a new method to improve spatial understanding in Vision-Language Models without degrading their general abilities. The authors create two large datasets, VST-P and VST-R, to enhance spatial perception and reasoning, and train models through a progressive pipeline, achieving state-of-the-art performance on various spatial benchmarks....

Read More


Tags are generated by Google's Gemini Pro API; summaries and translations are generated by Upstage's SOLAR mini chat model, which is derived from the open SOLAR-10.7B LLM.


(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media
