🤗 Daily Paper Newsletter
Hope you found some gems!
This newsletter delivers a curated list of papers from 🤗 Daily Papers.

LongEmotion: Measuring Emotional Intelligence of Large Language Models in Long-Context Interaction
Published at 2025-09-09
#ML
The researchers introduce LongEmotion, a benchmark that tests the emotional intelligence of large language models in lengthy, diverse, and noisy interactions. They also show that Retrieval-Augmented Generation and Collaborative Emotional Modeling outperform standard prompt-based methods on most long-context tasks, making language models more practical for real-world emotional intelligence applications.

Locality in Image Diffusion Models Emerges from Data Statistics
Published at 2025-09-11
#ML
This study shows that locality in deep diffusion models comes from the statistical properties of image datasets, not from the inductive bias of convolutional neural networks. The authors demonstrate that a linear denoiser has locality properties similar to those of deep neural denoisers, and that these properties arise from pixel correlations in natural image datasets.

Measuring Epistemic Humility in Multimodal Large Language Models
Published at 2025-09-11
#ML
The study presents HumbleBench, a new benchmark that measures whether multimodal large language models can recognize incorrect answers, with the aim of promoting epistemic humility. It addresses a limitation of existing evaluation methods by adding a 'None of the above' option, giving a more realistic assessment of model reliability in safety-critical applications.
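To make the role of the 'None of the above' option concrete, here is a minimal Python sketch of how a multiple-choice item might be scored when every listed candidate is wrong. The `Item` layout, the `NOTA` label, and the `score` helper are hypothetical illustrations, not HumbleBench's actual data format or evaluation code.

```python
from dataclasses import dataclass

NOTA = "None of the above"  # hypothetical label for the rejection option


@dataclass
class Item:
    question: str
    options: list[str]  # candidates shown to the model, with NOTA appended last
    answer: str         # gold answer; equals NOTA when every candidate is wrong


def score(items: list[Item], predictions: list[str]) -> dict[str, float]:
    """Return overall accuracy and accuracy on the items whose gold answer is NOTA."""
    correct = [pred == item.answer for item, pred in zip(items, predictions)]
    nota_hits = [ok for item, ok in zip(items, correct) if item.answer == NOTA]
    return {
        "accuracy": sum(correct) / len(items),
        # How often the model rejects all candidates when it should.
        "nota_accuracy": sum(nota_hits) / len(nota_hits) if nota_hits else 0.0,
    }


items = [
    Item("What color is the car?", ["red", "blue", NOTA], "red"),
    Item("What is the dog holding?", ["a ball", "a stick", NOTA], NOTA),
]
# A model that always picks one of the listed candidates fails the second item.
print(score(items, ["red", "a ball"]))
```

Reporting the NOTA-only accuracy separately is one way to see whether a model that looks accurate overall is still unable to reject a set of plausible but wrong candidates.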

SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation
Published at 2025-09-12
#ML
The authors present SearchInstruct, a method for building high-quality instruction datasets for fine-tuning large language models, particularly in specialized domains. The approach starts with a small set of domain-specific questions, expands them using a language model, and retrieves relevant resources to generate accurate answers, which improves model performance and supports efficient model updates.

GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings
Published at 2025-09-13
#ML
The study presents GAPrune, a pruning framework for domain-specific embedding models that weighs domain importance against preserving the general linguistic foundation. Using Fisher Information and gradient alignment, GAPrune keeps performance close to that of dense models and outperforms baselines on the FinMTEB and ChemTEB benchmarks.

InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
Published at 2025-09-13
#ML
Researchers have created InternScenes, a large-scale, realistic indoor scene dataset with 40,000 diverse scenes, 1.96M objects, and 15 common scene types. The dataset includes small items, resolves object collisions, and supports more complex scene layout generation and point-goal navigation for Embodied AI research.

Nav-R1: Reasoning and Navigation in Embodied Scenes
Published at 2025-09-13
#ML
The authors present Nav-R1, a model that improves embodied navigation by integrating perception, reasoning, and action in complex 3D environments. It uses a large-scale dataset for structured reasoning, a reinforcement learning framework with multiple rewards, and a Fast-in-Slow reasoning paradigm that balances long-term planning with real-time control, and it outperforms existing methods.

CognitiveSky: Scalable Sentiment and Narrative Analysis for Decentralized Social Media
Published at 2025-09-14
#ML
The researchers developed CognitiveSky, a free and scalable tool that uses advanced models to analyze user-generated content on decentralized social media platforms like Bluesky. The tool can track topics such as mental health discourse, disinformation, and civic sentiment, and presents the results in an easy-to-understand dashboard.

Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting
Published at 2025-09-14
#ML
This study presents two new methods, hypervolume-guided weight adaptation and gradient-based weight optimization, to improve multi-objective reinforcement learning. Both adjust reward weights adaptively during training, which allows better exploration of Pareto fronts and yields better solutions than traditional fixed-weight interpolation approaches.
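As a rough illustration of the contrast between fixed and adaptive reward weights, the Python sketch below combines per-objective rewards with a weighted sum and then applies a toy reweighting rule. The `scalarize` and `reweight` helpers and the `step` parameter are hypothetical. The rule shown is a simple stand-in and is not the paper's hypervolume-guided or gradient-based method.

```python
def scalarize(rewards, weights):
    """Combine per-objective rewards into one scalar with a weighted sum."""
    return sum(w * r for w, r in zip(weights, rewards))


def reweight(weights, avg_rewards, step=0.1):
    """Toy adaptive rule: shift weight toward objectives that lag behind."""
    best = max(avg_rewards)
    # Raise each weight in proportion to how far its objective trails the best one.
    raw = [w + step * (best - r) for w, r in zip(weights, avg_rewards)]
    total = sum(raw)
    return [x / total for x in raw]  # renormalize so the weights sum to 1


# Fixed-weight interpolation keeps these weights for the whole run.
weights = [0.5, 0.5]
print(scalarize([1.0, 0.0], weights))

# An adaptive scheme would update the weights between training rounds instead.
weights = reweight(weights, avg_rewards=[0.8, 0.4])
print(weights)
```

In this toy setup the second objective has the lower average reward, so it receives the larger weight after the update, while a fixed-weight baseline would leave both weights at 0.5.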

PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
Published at 2025-09-14
#ML
PersonaX is a new collection of multimodal datasets covering public figures and athletes, with AI-inferred behavioral traits, facial imagery, and biographical details. The authors analyze the relationships between these traits and the other data types, and introduce a new method for learning causal relationships between them, which experiments show to be effective.

UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
Published at 2025-09-14
#ML
The authors propose Semi-online Reinforcement Learning, a new method for GUI automation that combines the strengths of offline and online RL to handle complex user interface interactions more effectively. The approach outperforms existing methods on various benchmarks, showing better efficiency and multi-step task execution.

Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding
Published at 2025-09-15
#ML
The authors present Dr.V, a framework that detects and diagnoses hallucinations in large video models using a benchmark dataset and a satellite video agent. The dataset, Dr.V-Bench, contains 10,000 instances with detailed spatial-temporal annotations, while Dr.V-Agent identifies hallucinations through a step-by-step process that mimics human video comprehension, improving interpretability and reliability in real-world scenarios.

EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI
Published at 2025-09-15
#ML
This study presents EthicsMH, a new dataset of 125 scenarios designed to test how AI systems handle ethical issues in mental health care, focusing on confidentiality, autonomy, and bias. The dataset's structure allows evaluation of both decision accuracy and explanation quality, with the aim of fostering responsible AI systems in mental health.

LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
Published at 2025-09-15
#ML
This study presents LazyDrag, a new drag-based image editing method for Multi-Modal Diffusion Transformers. It builds an explicit correspondence map from user inputs, which improves stability, eliminates the need for costly test-time optimization, and enables complex edits like inpainting and text-guided creation.

Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models
Published at 2025-09-15
#ML
The study presents Reflection-V, a new visual reasoning model that improves on existing models by strengthening their ability to reflect on visual information. The authors construct vision-centered reasoning data and use a reward during training that encourages the model to rely on visual information, resulting in better performance on visual reasoning tasks.

Lost in Embeddings: Information Loss in Vision-Language Models
Published at 2025-09-15
#ML
The study examines how vision-language models process visual inputs and how much information is lost along the way. The authors quantify this loss in two ways, by evaluating how well semantic information is preserved and by directly measuring information loss through reconstruction, and find substantial distortion in visual representations that affects model performance.

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Published at 2025-09-15
#ML
The paper presents OmniWorld, a large-scale, multi-domain, and multi-modal dataset designed for 4D world modeling. This dataset, which includes a new OmniWorld-Game dataset and curated public datasets, offers richer modality coverage and more realistic dynamic interactions compared to existing synthetic datasets. It aims to advance the development of general-purpose 4D world models by providing a challenging benchmark and significantly improving performance in 4D reconstruction and video generat...

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.