🤗 Daily Paper(2025-09-16)

deep.di...@gmail.com

Sep 16, 2025, 4:07:37 PM

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page

🤗 daily paper

LongEmotion: Measuring Emotional Intelligence of Large Language Models in Long-Context Interaction

Published at 2025-09-09

#ML

The researchers created a new benchmark called LongEmotion to test emotional intelligence in large language models during lengthy, diverse, and noisy interactions. They improved performance by using Retrieval-Augmented Generation and Collaborative Emotional Modeling methods, which outperformed standard prompt-based methods in most long-context tasks, making language models more practical for real-world emotional intelligence applications....

Locality in Image Diffusion Models Emerges from Data Statistics

Published at 2025-09-11

#ML

This study shows that locality in deep diffusion models comes from the statistical properties of image datasets, not from the inductive bias of convolutional neural networks. The authors support this by demonstrating that a linear denoiser exhibits locality properties similar to those of deep neural denoisers, and that these properties arise from pixel correlations in natural image datasets....
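As a rough illustration of that claim, the sketch below builds the optimal linear (Wiener) denoiser for a toy zero-mean 1-D signal whose pixel correlations decay with distance, then measures how much of one output pixel's weight sits on nearby inputs. The covariance model, dimension, noise level, and all function names are assumptions chosen for illustration; this is not the paper's experimental setup.

```python
import numpy as np


def toy_covariance(dim: int, rho: float) -> np.ndarray:
    """Hypothetical data statistics: correlation rho**|i - j| between pixels i and j."""
    idx = np.arange(dim)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def wiener_denoiser(cov: np.ndarray, sigma: float) -> np.ndarray:
    """Linear MMSE denoiser W for y = x + noise, so that x_hat = W @ y.

    For zero-mean x with covariance C and white noise of variance sigma**2,
    W = C (C + sigma^2 I)^{-1}. C and (C + sigma^2 I) commute, so solving
    (C + sigma^2 I) W = C yields the same matrix.
    """
    dim = cov.shape[0]
    return np.linalg.solve(cov + sigma**2 * np.eye(dim), cov)


def locality_fraction(weights: np.ndarray, row: int, radius: int) -> float:
    """Share of a row's absolute weight that lies within `radius` of the diagonal."""
    w = np.abs(weights[row])
    lo, hi = max(0, row - radius), min(len(w), row + radius + 1)
    return float(w[lo:hi].sum() / w.sum())


cov = toy_covariance(dim=32, rho=0.9)
W = wiener_denoiser(cov, sigma=0.5)
frac = locality_fraction(W, row=16, radius=3)
print(f"weight within 3 pixels of the output pixel: {frac:.2f}")
```

No convolutional structure is built into this denoiser; any locality in `W` comes only from the assumed pixel correlations.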

Measuring Epistemic Humility in Multimodal Large Language Models

Published at 2025-09-11

#ML

The study presents HumbleBench, a new benchmark to measure the ability of multimodal large language models to recognize incorrect answers, promoting epistemic humility. This benchmark addresses the limitation of existing evaluation methods by incorporating a 'None of the above' option, thus providing a more realistic assessment of model reliability in safety-critical applications....

SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation

Published at 2025-09-12

#ML

The authors present SearchInstruct, a novel method to build high-quality instruction datasets for fine-tuning large language models, particularly in specialized domains. This approach starts with a small set of domain-specific questions, expands them using a language model, and retrieves relevant resources to generate accurate answers, resulting in improved model performance and efficient updates....

GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings

Published at 2025-09-13

#ML

The study presents GAPrune, a pruning framework for domain-specific embedding models that balances domain importance and general linguistic foundation. By using Fisher Information and gradient alignment, GAPrune maintains performance close to dense models and outperforms baselines in experiments on FinMTEB and ChemTEB benchmarks....

InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts

Published at 2025-09-13

#ML

Researchers have created InternScenes, a large-scale, realistic indoor scene dataset with 40,000 diverse scenes, 1.96M objects, and 15 common scene types. The dataset includes small items, resolves object collisions, and enables more complex scene layout generation and point-goal navigation for Embodied AI research....

Nav-R1: Reasoning and Navigation in Embodied Scenes

Published at 2025-09-13

#ML

The authors present Nav-R1, a model that improves embodied navigation by integrating perception, reasoning, and action in complex 3D environments. It uses a large-scale dataset for structured reasoning, a reinforcement learning framework with multiple rewards, and a Fast-in-Slow reasoning paradigm to balance long-term planning and real-time control, resulting in better performance compared to existing methods....

CognitiveSky: Scalable Sentiment and Narrative Analysis for Decentralized Social Media

Published at 2025-09-14

#ML

The researchers developed CognitiveSky, a free and scalable tool that uses advanced models to analyze user-generated content on decentralized social media platforms like Bluesky. This tool can track topics like mental health discourse, disinformation, and civic sentiment, and presents the data in an easy-to-understand dashboard....

Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting

Published at 2025-09-14

#ML

This study presents two new methods, hypervolume-guided weight adaptation and gradient-based weight optimization, to improve multi-objective reinforcement learning. These methods adaptively adjust reward weights during training, allowing for better exploration of Pareto fronts and better solutions than traditional fixed-weight interpolation approaches....
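For readers unfamiliar with the hypervolume indicator that the first method is named after, here is a minimal sketch for two objectives that are both maximized. It computes the area dominated by a set of solutions relative to a reference point. It does not reproduce the paper's weight-update rule, and the function names and sample points are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points when both objectives are maximized."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def hypervolume_2d(points: List[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by `points` and bounded below by `ref`.

    Assumes every point is at least as good as `ref` in both objectives.
    """
    hv = 0.0
    prev_y = ref[1]
    # Sweep from the largest first objective down; on a Pareto front the
    # second objective then increases, so each step adds one new strip.
    for x, y in sorted(pareto_front(points), reverse=True):
        hv += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return hv


solutions = [(1, 3), (2, 2), (3, 1), (1, 1)]  # (1, 1) is dominated
print(hypervolume_2d(solutions))
```

A larger hypervolume means the solution set covers more of the trade-off space, which is why it is a natural signal for steering reward weights.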

PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits

Published at 2025-09-14

#ML

PersonaX is a new collection of multimodal datasets that includes information on public figures and athletes, their behavioral traits inferred by AI, facial imagery, and biographical details. The datasets are used to analyze the relationships between these traits and other data types, and a new method is introduced to learn causal relationships between them, which is shown to be effective through experiments....

UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning

Published at 2025-09-14

#ML

The authors propose a new method called Semi-online Reinforcement Learning to improve GUI automation, combining the strengths of both offline and online RL to handle complex user interface interactions more effectively. This approach outperforms existing methods in various benchmarks, demonstrating better efficiency and multi-step task execution....

Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding

Published at 2025-09-15

#ML

The authors present Dr.V, a framework that helps detect and diagnose hallucinations in large video models by using a benchmark dataset and a satellite video agent. Dr.V-Bench, the dataset, contains 10,000 instances with detailed spatial-temporal annotations, while Dr.V-Agent identifies hallucinations through a step-by-step process that mimics human video comprehension, improving interpretability and reliability in real-world scenarios....

EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI

Published at 2025-09-15

#ML

This study presents EthicsMH, a new dataset of 125 scenarios designed to test how AI systems handle ethical issues in mental health care, focusing on confidentiality, autonomy, and bias. The dataset's structure allows for evaluation of both decision accuracy and explanation quality, aiming to foster the development of responsible AI systems in mental health....

LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence

Published at 2025-09-15

#ML

This study presents a new drag-based image editing method called LazyDrag, which enhances the stability and capabilities of Multi-Modal Diffusion Transformers by creating an explicit correspondence map from user inputs, eliminating the need for costly test-time optimization and enabling complex edits like inpainting and text-guided creation....

Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models

Published at 2025-09-15

#ML

The study presents a new visual reasoning model, Reflection-V, which improves upon existing models by enhancing their ability to reflect on visual information. This is achieved by creating reasoning data that focuses on vision and using a reward system during training to encourage the model to use visual information, resulting in better performance on visual reasoning tasks....

Lost in Embeddings: Information Loss in Vision-Language Models

Published at 2025-09-15

#ML

The study examines how vision-language models process visual inputs and the potential information loss during this process. The authors use two methods to analyze and quantify this loss: evaluating semantic information preservation and directly measuring information loss through reconstruction. Both reveal substantial distortion in visual representations that impacts model performance....

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

Published at 2025-09-15

#ML

The paper presents OmniWorld, a large-scale, multi-domain, and multi-modal dataset designed for 4D world modeling. This dataset, which includes a new OmniWorld-Game dataset and curated public datasets, offers richer modality coverage and more realistic dynamic interactions compared to existing synthetic datasets. It aims to advance the development of general-purpose 4D world models by providing a challenging benchmark and significantly improving performance in 4D reconstruction and video generat...

Tags are generated by Google's Gemini Pro API, and the summary and translation are generated by Upstage's SOLAR mini chat model derived from SOLAR-10.7B open LLM.

(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media
