🤗 Daily Paper Newsletter

This newsletter delivers a curated list of papers from 🤗 Daily Papers. Hope you find some gems!
|
CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?
Published on 2025-10-28

#ML

This study explores using natural-language critiques to improve the calibration of Large Language Models' (LLMs) confidence estimates. The proposed methods, Self-Critique and CritiCal, outperform other baselines and even the teacher model, GPT-4o, on complex reasoning tasks, enhancing LLMs' reliability in various scenarios (a toy sketch of the critique loop follows below).
Read More
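The core loop is easy to picture. Below is a minimal sketch of critique-then-recalibrate, assuming a hypothetical `ask_llm` chat-completion helper; the prompts are illustrative, not the paper's.

```python
# Toy sketch of critique-based confidence calibration in the spirit of
# Self-Critique. `ask_llm` is a hypothetical stand-in for any
# chat-completion client; the prompts are illustrative, not the paper's.

def ask_llm(prompt: str) -> str:
    """Placeholder: wire up your model client here."""
    raise NotImplementedError

def self_critique_confidence(question: str) -> tuple[str, float]:
    answer = ask_llm(f"Question: {question}\nAnswer concisely.")
    critique = ask_llm(
        f"Q: {question}\nA: {answer}\n"
        "Write a short natural-language critique of this answer, "
        "listing possible errors or gaps."
    )
    revised = ask_llm(
        f"Q: {question}\nA: {answer}\nCritique: {critique}\n"
        "Given the critique, state your confidence in the answer "
        "as a single number in [0, 1]."
    )
    return answer, float(revised)
```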
|
|
|
HAFixAgent: History-Aware Automated Program Repair Agent
Published on 2025-11-02

#ML

The paper proposes HAFixAgent, a history-aware bug-fixing agent that improves automated program repair (APR) by mining repository history, particularly for complex multi-hunk bugs. HAFixAgent is significantly more effective than state-of-the-art baselines while remaining efficient, offering a practical way to integrate historical heuristics into agentic APR (a minimal sketch of the history signal follows below).
Read More
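As a rough illustration of the "repository history" signal, the sketch below pulls blame information for a buggy hunk with plain `git blame`; the paper's heuristics are richer, and the repo, file path, and line range are hypothetical inputs.

```python
# Rough sketch of a history signal an APR agent could consume: the
# commits that last touched a buggy hunk, retrieved via `git blame`.
# HAFixAgent's actual heuristics are more involved than this.
import subprocess

def hunk_blame(repo: str, path: str, start: int, end: int) -> str:
    """Return blame lines (commit, author, content) for path:start-end."""
    result = subprocess.run(
        ["git", "-C", repo, "blame", "-L", f"{start},{end}", "--", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Example: history context for a repair prompt about lines 10-20 of a file.
# print(hunk_blame(".", "src/app.py", 10, 20))
```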
|
|
|
|
Jailbreaking in the Haystack
Published on 2025-11-04

#ML

The study presents NINJA, a method that attacks long-context language models by appending harmless, model-generated content to malicious user goals. NINJA significantly improves attack success rates on various models, including LLaMA, Qwen, Mistral, and Gemini, and is more efficient and harder to detect than previous methods.
Read More
|
|
|
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
Published on 2025-11-06

#ML

The study presents a new benchmark that tests Large Language Models' ability to role-play fictional characters of varying moral alignments. The results show that these models struggle to portray villainous characters authentically, especially those defined by deceitfulness and manipulativeness, because their safety alignment resists such traits, which poses a challenge for creative generation.
Read More
|
|
|
|
VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks
Published on 2025-11-06

#ML

The paper presents VeriCoT, a method that validates the multi-step reasoning of large language models (LLMs) by formalizing each step into first-order logic and checking it for logical consistency. By flagging flawed steps, VeriCoT improves reasoning validity and accuracy, serves as a predictor of final-answer correctness, and enables inference-time self-reflection, supervised fine-tuning, and preference fine-tuning (a toy entailment check follows below).
Read More
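To make the consistency check concrete, here is a toy version of the core test using Z3 as one possible solver. The paper autoformalizes steps into first-order logic with an LLM; the formulas below are hand-written and propositional for brevity.

```python
# Toy version of a VeriCoT-style check: a reasoning step is valid if its
# premises logically entail its conclusion, i.e. premises AND NOT(conclusion)
# is unsatisfiable. Z3 is one solver choice; the autoformalization step
# that turns prose into formulas is omitted here.
from z3 import And, Bool, Implies, Not, Solver, unsat

def entails(premises, conclusion) -> bool:
    """True iff the premises entail the conclusion."""
    s = Solver()
    s.add(And(*premises), Not(conclusion))
    return s.check() == unsat

# Step: "Socrates is a man; all men are mortal; therefore Socrates is mortal."
man = Bool("socrates_is_man")
mortal = Bool("socrates_is_mortal")
print(entails([man, Implies(man, mortal)], mortal))  # True -> step checks out
```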
|
|
|
DeepEyesV2: Toward Agentic Multimodal Model
Published on 2025-11-07

#ML

The study presents DeepEyesV2, an agentic multimodal model that understands text and images and invokes external tools such as code execution and web search. The model is trained in two stages: the first establishes tool-use patterns, and the second refines tool invocation. The researchers also introduce RealX-Bench, a benchmark for evaluating real-world multimodal reasoning, and demonstrate DeepEyesV2's effectiveness across perception, reasoning, and search tasks (a schematic tool loop follows below).
Read More
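The agentic part boils down to a loop in which the model either calls a tool or answers. Here is a schematic sketch; `model_step`, `run_code`, and `web_search` are hypothetical stand-ins, not the paper's actual interfaces.

```python
# Schematic agentic tool loop of the kind the DeepEyesV2 summary describes:
# the model interleaves tool calls (code execution, web search) with
# reasoning until it emits a final answer. All callables are stand-ins.

def model_step(history: list[dict]) -> dict:
    """Placeholder: returns {'type': 'code'|'search'|'answer', 'body': str}."""
    raise NotImplementedError

def run_code(src: str) -> str:
    raise NotImplementedError

def web_search(query: str) -> str:
    raise NotImplementedError

def agent(question: str, image_path: str, max_turns: int = 8) -> str:
    history = [{"role": "user", "question": question, "image": image_path}]
    for _ in range(max_turns):
        action = model_step(history)
        if action["type"] == "answer":
            return action["body"]
        # Execute the requested tool and feed the result back to the model.
        tool = run_code if action["type"] == "code" else web_search
        history.append({"role": "tool", "result": tool(action["body"])})
    return "no final answer within the turn budget"
```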
|
|
|
|
Dense Motion Captioning
Published on 2025-11-07

#ML

The researchers introduce CompMo, a new large-scale dataset of 3D human motion sequences with detailed temporal annotations and complex motions. They also develop DEMO, a model that generates detailed, time-stamped captions for these motion sequences and outperforms existing methods on CompMo and other benchmarks.
Read More
|
|
|
Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings
Published on 2025-11-07

#ML

This study finds that large vision-language models are biased toward language, which causes "hallucinations", i.e., false information. To fix this, the authors propose a method that fuses visual and textual information more effectively, reducing hallucinations and improving accuracy on benchmark tests. (ELI5: imagine a machine that misreads pictures because it focuses too much on words; this research balances both, leading to fewer mistakes.) A hedged sketch of one such fusion mechanism follows below.
Read More
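The summary leaves the mechanism open; one common way to "refine" text embeddings with visual context is cross-attention from text tokens to image features, sketched below in PyTorch. The dimensions and module name are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: refine text token embeddings with visual context via
# cross-attention, one plausible reading of the summary. The paper's
# actual mechanism may differ; dimensions here are illustrative.
import torch
import torch.nn as nn

class VisualRefiner(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        # text: (B, T, D) token embeddings; vis: (B, V, D) image features.
        refined, _ = self.attn(query=text, key=vis, value=vis)
        return self.norm(text + refined)  # residual keeps textual semantics

tokens = VisualRefiner()(torch.randn(2, 16, 768), torch.randn(2, 49, 768))
print(tokens.shape)  # torch.Size([2, 16, 768])
```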
|
|
|
|
Visual Spatial Tuning
Published on 2025-11-07

#ML

The study presents a new method that improves spatial understanding in Vision-Language Models without harming their general abilities. The authors build two large datasets, VST-P and VST-R, to strengthen spatial perception and reasoning, and train models through a progressive pipeline, achieving state-of-the-art performance on various spatial benchmarks.
Read More
|
|
|
|
|
Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the open SOLAR-10.7B LLM.
(Experimental) The full papers are translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit the Developer's Social Media
|
|
|
|
|
|