🤗 Daily Paper(2025-12-23)


deep.di...@gmail.com


Dec 23, 2025, 3:06:59 PM

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.


Understanding Syllogistic Reasoning in LLMs from Formal and Natural Language Perspectives

Published at 2025-12-14

#ML

The study examines syllogistic reasoning in large language models (LLMs) from both logical and natural language viewpoints, using 14 LLMs to explore their reasoning capabilities. It finds that although not every LLM exhibits this reasoning ability, some models perform perfectly in symbolic reasoning, raising the question of whether LLMs are evolving into formal reasoning mechanisms....

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Published at 2025-12-18

#ML

The DataFlow framework is introduced to improve data preparation for Large Language Models (LLMs). It offers a unified, scalable, and reproducible way to handle data transformation and workflow automation, resulting in better LLM performance across various tasks and use cases....

Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation

Published at 2025-12-18

#ML

The study presents InfCam, a new method for generating high-quality, camera-controlled videos without relying on depth estimation. It uses infinite homography warping to encode camera movements directly into the video's latent space, ensuring accurate camera poses and high visual fidelity, and it improves upon existing methods by using a data augmentation pipeline to create diverse training datasets....
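For readers unfamiliar with the term, the infinite homography is a standard construct from multi-view geometry. The block below gives the textbook definition only, not necessarily the exact formulation InfCam uses; the symbols $K_s$, $K_t$ (source and target camera intrinsics), $R$ (rotation from the source to the target camera frame), and $\tilde{x}_s$, $\tilde{x}_t$ (homogeneous pixel coordinates) are notation assumed here for illustration.

```latex
H_\infty = K_t \, R \, K_s^{-1},
\qquad
\tilde{x}_t \sim H_\infty \, \tilde{x}_s
```

The mapping is exact for points at infinity (or when the camera translation is zero), because the translation term vanishes as depth grows. That is why a warp of this kind needs no depth estimate.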

LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding

Published at 2025-12-18

#ML

This study presents LoPA, a new algorithm that speeds up inference in diffusion large language models (dLLMs) by optimizing the order in which tokens are filled. The method significantly increases the number of tokens decoded per forward pass, enabling faster generation....

Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs

Published at 2025-12-18

#ML

The authors present a new method called Reasoning Palette that enhances the reasoning abilities of large language models by introducing a stochastic latent variable, which helps generate diverse and high-level reasoning paths. This approach allows for more efficient exploration and improved performance in both inference-time tasks and reinforcement learning, as demonstrated by experiments on various reasoning benchmarks....

Name That Part: 3D Part Segmentation and Naming

Published at 2025-12-19

#ML

The authors present ALIGN-Parts, a novel method for 3D part segmentation and naming that aligns partlets with part descriptions. This approach combines geometric, appearance, and semantic cues, and can match to arbitrary descriptions, creating a unified ontology for 3D parts and introducing a new dataset....

Region-Constraint In-Context Generation for Instructional Video Editing

Published at 2025-12-19

#ML

This study presents ReCo, a new approach for instructional video editing that improves the accuracy of editing regions and reduces interference between edited and non-edited areas. ReCo uses constraint modeling, latent and attention regularization, and a large-scale video editing dataset to enhance the quality and efficiency of video editing tasks....

UCoder: Unsupervised Code Generation by Internal Probing of Large Language Models

Published at 2025-12-19

#ML

The authors propose a new method called IPC that uses the internal knowledge of large language models to generate code without relying on external datasets. They demonstrate that this unsupervised approach can perform as well as supervised methods while using fewer resources....

MatSpray: Fusing 2D Material World Knowledge on 3D Geometry

Published at 2025-12-20

#ML

This study presents a method to combine 2D material data with 3D geometry, enhancing the accuracy and realism of 3D renderings. By using a combination of learning-based and projection-based approaches, the framework improves upon existing techniques, enabling more efficient and photorealistic asset creation workflows in content production pipelines....

Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction

Published at 2025-12-21

#ML

This study examines whether large language models can accurately estimate how difficult questions or tasks are for human learners. It finds that larger models are not necessarily more accurate and that models struggle to simulate the limitations of students, suggesting that general problem-solving skill does not guarantee an understanding of human cognitive struggles....

Does It Tie Out? Towards Autonomous Legal Agents in Venture Capital

Published at 2025-12-21

#ML

This study focuses on automating the 'capitalization tie-out' process, a crucial step in venture capital financing rounds, which involves verifying complex legal documents. The researchers compare existing systems and propose a new architecture to improve the reliability of multi-document reasoning and evidence traceability for this task....

Brain-Grounded Axes for Reading and Steering LLM States

Published at 2025-12-22

#ML

The authors propose using human brain activity as a coordinate system to read and steer the states of large language models (LLMs), as an alternative to textual supervision. They construct a word-level brain atlas and extract latent axes to map LLM hidden states, demonstrating that this approach yields interpretable and controllable handles for LLM behavior, with promising results in various models and sanity checks....

CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion

Published at 2025-12-22

#ML

The paper presents CASA, a new method that enhances vision-language models by enabling local text-to-text interaction in cross-attention layers, improving performance on tasks with fine-grained visual details while maintaining efficiency for long-context multimodal tasks....

GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

Published at 2025-12-22

#ML

The study presents GenEnv, a framework that creates a dynamic and co-evolving relationship between Large Language Model agents and environment simulators. This approach, guided by a special reward system, generates tailored tasks for the agents, improving their performance by up to 40.3% compared to baseline models and using less data than traditional methods....

LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry

Published at 2025-12-22

#ML

LoGoPlanner is a new navigation framework for mobile robots that improves upon traditional modular pipelines and previous end-to-end methods. It uses a visual-geometry backbone to provide implicit state estimation for accurate localization, reconstructs scene geometry for reliable obstacle avoidance, and conditions the policy on implicit geometry to reduce error propagation. This results in a 27.3% improvement over oracle-localization baselines and strong generalization across embodiments and envi...

MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments

Published at 2025-12-22

#ML

MobileWorld is a new benchmark for testing mobile agents that is more challenging and realistic than the current leading benchmark, AndroidWorld. It has longer tasks, more cross-application interactions, and new task categories such as user interaction and MCP-augmented tasks; even the best agents achieved only a 51.7% success rate on it....

QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation

Published at 2025-12-22

#ML

The authors present a new method called QuCo-RAG that reduces hallucinations in large language models by using objective statistics from pre-training data instead of relying on the model's internal signals. This approach identifies knowledge gaps and verifies entity co-occurrence, triggering retrieval when uncertainty is high, and has shown significant improvements in various benchmarks and models....
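The retrieval trigger described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names `build_cooccurrence` and `should_retrieve`, the `min_count` threshold, and the tiny in-memory corpus are all hypothetical, and a real system would query statistics over an actual pre-training corpus.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(corpus_entities):
    """Count how often each unordered entity pair appears in the same document."""
    counts = Counter()
    for entities in corpus_entities:
        # Deduplicate and sort so each pair has one canonical key per document.
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


def should_retrieve(entity_pair, counts, min_count=1):
    """Trigger retrieval when the pair co-occurs fewer than min_count times.

    min_count is a hypothetical threshold chosen for illustration.
    """
    key = tuple(sorted(entity_pair))
    return counts[key] < min_count


# Toy corpus: each item lists the entities mentioned in one document.
corpus = [
    ["Marie Curie", "Warsaw"],
    ["Marie Curie", "Paris", "Warsaw"],
    ["Alan Turing", "London"],
]
counts = build_cooccurrence(corpus)

print(should_retrieve(("Marie Curie", "Warsaw"), counts))  # False: seen together
print(should_retrieve(("Alan Turing", "Warsaw"), counts))  # True: never seen together
```

The point of the sketch is that the uncertainty signal comes from corpus counts rather than from the model's own confidence, which is the contrast the summary draws.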

Real2Edit2Real: Generating Robotic Demonstrations via a 3D Control Interface

Published at 2025-12-22

#ML

The authors present a framework called Real2Edit2Real that generates new robotic demonstrations by combining 3D edits with 2D visual data through a 3D control interface. This method improves data efficiency for training robotic policies by 10 to 50 times, demonstrating its potential as a unified data generation framework for robot learning....

StoryMem: Multi-shot Long Video Storytelling with Memory

Published at 2025-12-22

#ML

The research presents a new method called StoryMem for creating long, multi-shot videos with cinematic quality and consistency, inspired by human memory. This is done by using a design that keeps track of important frames from previously generated shots and incorporating them into a pre-trained video model, resulting in better coherence and visual appeal in the final video story....

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Published at 2025-12-22

#ML

This study explores the connection between a neural network encoder's feature spectrum and its role, finding that semantic encoders focus on low-frequency, abstract components, while pixel encoders also capture high-frequency, detailed information. The authors introduce a new model called Unified Autoencoding that combines these semantic and pixel representations, demonstrating improved performance on popular image datasets....

WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion

Published at 2025-12-22

#ML

The authors present a method called WorldWarp that combines a 3D geometric cache and a 2D generative model to create long video clips with consistent and detailed visuals. By using a technique called Spatio-Temporal Diffusion, WorldWarp can fill in missing or occluded parts of the video while maintaining the overall 3D structure, resulting in high-quality video content....


Tags are generated by Google's Gemini Pro API, and the summary and translation are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.

(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

