🤗 Daily Paper Newsletter

Hope you found some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.

What does it mean to understand language?
Published at 2025-11-24
#ML

Understanding language goes beyond simple interpretation and requires creating detailed mental images of the situation described. This process involves transferring information from the brain's language system to other areas responsible for perception, action, and memory, which allows for a deeper comprehension of the language....

ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
Published at 2025-11-25
#ML

This study proposes a benchmark called ENACT to assess embodied cognition in modern vision-language models, which are typically trained without considering physical interaction. ENACT evaluates models' abilities in tasks like affordance recognition and action-effect reasoning through a visual question answering format, revealing a performance gap between models and humans, especially in long-horizon tasks....

Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
Published at 2025-11-26
#ML

The authors present ViLoMem, a dual-stream memory framework that helps MLLMs learn from both visual and logical errors by storing multimodal semantic knowledge. This framework allows MLLMs to accumulate and update their knowledge without forgetting previous information, leading to improved performance and fewer repeated mistakes in multimodal tasks....

Canvas-to-Image: Compositional Image Generation with Multimodal Controls
Published at 2025-11-26
#ML

The authors present a new framework called Canvas-to-Image that allows users to generate images with high-fidelity compositional and multimodal control through a single canvas interface. This framework can interpret various control signals, such as text prompts and spatial arrangements, and has been shown to outperform existing methods in preserving identity and adhering to user control....

MIRA: Multimodal Iterative Reasoning Agent for Image Editing
Published at 2025-11-26
#ML

The paper presents MIRA, an agent that edits images using natural language instructions through an iterative process, making decisions based on visual feedback. MIRA is trained using a large dataset and a specific training pipeline, and when used with certain open-source image editing models, it significantly enhances the accuracy and quality of the edits....

Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following
Published at 2025-11-26
#ML

The study introduces Multi-Crit, a benchmark to evaluate large multimodal models' ability to follow diverse, detailed evaluation criteria, revealing that proprietary models struggle with consistent adherence, especially in open-ended tasks, while open-source models lag further behind. The benchmark also assesses models' flexibility in switching between criteria and recognizing preference conflicts, providing insights into the current limits of multimodal AI evaluation....

Video Generation Models Are Good Latent Reward Models
Published at 2025-11-26
#ML

The study presents a new method called PRFL that uses pre-trained video generation models for reward modeling in noisy latent spaces, which is more efficient and effective than existing methods that operate on pixel-space inputs. PRFL enables optimization in latent space, reducing memory consumption and training time while improving alignment with human preferences for video generation....

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.