🤗 Daily Paper (2025-08-18)

deep.di...@gmail.com

Aug 18, 2025, 4:07:14 PM

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page

🤗 daily paper

SPARSE Data, Rich Results: Few-Shot Semi-Supervised Learning via Class-Conditioned Image Translation

Published at 2025-08-08

#ML

The researchers developed a new method that uses three neural networks to improve medical image classification with limited labeled data. This method outperforms existing techniques, especially when there are very few labeled images, and can be useful in medical imaging where annotating data is expensive....

DINOv3

Published at 2025-08-13

#ML

DINOv3 is a self-supervised model that learns visual representations from diverse sources using a single algorithm. It is a versatile vision foundation model that outperforms specialized state-of-the-art models across various settings and tasks without fine-tuning, thanks to a new method called Gram anchoring and post-hoc strategies for flexibility....

BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining

Published at 2025-08-14

#ML

The authors present BeyondWeb, a framework for generating high-quality synthetic data for pretraining large language models, which outperforms existing synthetic datasets and traditional web-scale data. They provide insights into factors affecting synthetic data quality, showing that optimizing many factors is necessary for the best outcomes....

MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data

Published at 2025-08-14

#ML

The study evaluates and compares various fusion strategies and normalization schemes for multimodal, multitemporal, and multispectral Earth observation data. The researchers then propose MAESTRO, a modified version of the Masked Autoencoder, which incorporates optimized fusion strategies and a new normalization scheme that uses a spectral prior as a self-supervisory signal, achieving state-of-the-art results on several Earth observation datasets....

PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing

Published at 2025-08-14

#ML

The authors present a new method called PaperRegister that improves paper search by using a hierarchical indexing system, allowing for more detailed and flexible queries compared to traditional abstract-based indexing. Experiments show that PaperRegister outperforms existing systems, especially in fine-grained scenarios, making it a promising solution for real-world paper search applications....

SSRL: Self-Search Reinforcement Learning

Published at 2025-08-14

#ML

This study explores using large language models (LLMs) as efficient simulators for agentic search tasks in reinforcement learning, reducing the need for costly external search engine interactions. The researchers developed Self-Search RL (SSRL), which enhances LLMs' search capability through rewards, enabling models to refine their knowledge internally and providing a cost-effective environment for search-driven RL training, with findings showing reduced hallucination and seamless integration wi...

TexVerse: A Universe of 3D Objects with High-Resolution Textures

Published at 2025-08-14

#ML

Researchers have created TexVerse, a large collection of over 858K high-resolution 3D models that includes subsets for skeletons, animations, and detailed annotations. This dataset can be used for developing high-quality textures, animations, and other 3D graphics tasks....

X-Node: Self-Explanation is All We Need

Published at 2025-08-14

#ML

The authors present a new framework called X-Node that enables each node in a graph neural network to generate its own explanation during prediction. This approach offers detailed, local reasoning for individual nodes, improving interpretability in clinical applications, and maintains competitive accuracy on various graph datasets....

XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

Published at 2025-08-14

#ML

The study presents XQuant, a method that reduces memory consumption for language model inference by quantizing and caching input activations instead of using standard KV caching. This results in significant memory savings and improved accuracy compared to existing techniques, with up to 7.7 times memory savings and less than 0.1 perplexity degradation. Additionally, XQuant-CL is introduced for extreme compression, achieving up to 12.5 times memory savings with only 0.1 perplexity degradation....
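
To make the idea concrete, here is a minimal toy sketch of caching quantized input activations X and recomputing keys and values from them on demand. It is not the authors' implementation: the `XCache` class, the per-token uniform quantizer, the helper names, and the tiny random weights are all assumptions made for illustration, and it uses plain Python lists rather than a tensor library.

```python
import random

def quantize(row, bits=4):
    """Uniform per-token quantization: integer codes plus (scale, offset)."""
    lo, hi = min(row), max(row)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in row]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Map integer codes back to approximate float values."""
    return [c * scale + lo for c in codes]

def project(x, w):
    """Row vector x (length d) times weight matrix w (d x k)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

class XCache:
    """Hypothetical cache that stores quantized X instead of K and V."""
    def __init__(self, w_k, w_v, bits=4):
        self.w_k, self.w_v, self.bits = w_k, w_v, bits
        self.entries = []

    def append(self, x):
        # Only the quantized input activation is kept for each token.
        self.entries.append(quantize(x, self.bits))

    def rematerialize(self):
        # Recompute K and V from the cached X when attention needs them.
        keys, values = [], []
        for codes, scale, lo in self.entries:
            x_hat = dequantize(codes, scale, lo)
            keys.append(project(x_hat, self.w_k))
            values.append(project(x_hat, self.w_v))
        return keys, values

random.seed(0)
d = 8
w_k = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
w_v = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
tokens = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(3)]

cache = XCache(w_k, w_v, bits=8)
for x in tokens:
    cache.append(x)
keys, values = cache.rematerialize()

# Compare rematerialized keys against keys computed from unquantized X.
max_err = max(
    abs(a - b)
    for x, k in zip(tokens, keys)
    for a, b in zip(project(x, w_k), k)
)
print(f"max key error at 8 bits: {max_err:.4f}")
```

In this toy setting the cache holds one quantized vector per token instead of separate key and value vectors, and pays for it with extra projections at read time. The bit-widths and quantization scheme used in the paper may differ.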

Controlling Multimodal LLMs via Reward-guided Decoding

Published at 2025-08-15

#ML

This study presents a new method for controlling Multimodal Large Language Models (MLLMs) by guiding their decoding process with rewards, specifically improving their visual grounding. The method involves creating reward models to control the accuracy and completeness of object descriptions in the model's output, providing users with on-the-fly control over the balance between object precision and recall during image captioning tasks....
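
As a rough illustration of trading off object precision and recall at decoding time, the sketch below scores candidate captions with a weighted mix of two rewards and keeps the best one. Everything here is a hypothetical stand-in: the paper uses learned reward models, whereas `precision_reward`, `recall_reward`, the weight `w`, and the hand-written object lists are assumptions chosen only to show the selection step.

```python
def precision_reward(mentioned, present):
    """Fraction of mentioned objects that are actually in the image."""
    if not mentioned:
        return 0.0
    return sum(1 for obj in mentioned if obj in present) / len(mentioned)

def recall_reward(mentioned, present):
    """Fraction of objects in the image that the caption mentions."""
    if not present:
        return 0.0
    return sum(1 for obj in present if obj in mentioned) / len(present)

def pick_candidate(candidates, present, w):
    """Choose the candidate maximizing w * precision + (1 - w) * recall."""
    def score(mentioned):
        return (w * precision_reward(mentioned, present)
                + (1.0 - w) * recall_reward(mentioned, present))
    return max(candidates, key=score)

# Toy example: objects assumed present in the image, and two candidate
# captions represented by the objects they mention.
present = {"dog", "ball", "tree"}
cautious = ["dog"]                          # precise but incomplete
thorough = ["dog", "ball", "tree", "cat"]   # complete but mentions a wrong object
candidates = [cautious, thorough]

precise_choice = pick_candidate(candidates, present, w=1.0)
complete_choice = pick_candidate(candidates, present, w=0.0)
print(precise_choice, complete_choice)
```

Setting `w` close to 1 favors captions that avoid mentioning absent objects, while setting it close to 0 favors captions that cover more of the scene, which mirrors the on-the-fly control described in the summary.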

FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation

Published at 2025-08-15

#ML

The authors present Talking-Critic, a model for measuring how well generated videos meet human expectations across multiple dimensions, and Talking-NSQ, a large dataset of human preferences. They also introduce TLPO, a new framework for aligning audio-driven portrait animation models with fine-grained human preferences, which significantly improves lip-sync accuracy, motion naturalness, and visual quality compared to existing methods....

StyleMM: Stylized 3D Morphable Face Model via Text-Driven Aligned Image Translation

Published at 2025-08-15

#ML

The researchers developed a framework called StyleMM that creates stylized 3D face models based on user-provided text descriptions. It uses a pre-trained network and a texture generator, and fine-tunes them with stylized facial images to preserve facial attributes while changing the style. The result is a model that can generate stylized face meshes with control over shape, expression, and texture, outperforming existing methods....

Thyme: Think Beyond Images

Published at 2025-08-15

#ML

The authors present Thyme, a new approach for MLLMs to go beyond existing image-based reasoning by autonomously generating and executing image processing and computational operations via code. Thyme involves a two-stage training process and an algorithm that balances reasoning exploration with code execution precision, resulting in significant performance gains in perception and reasoning tasks....

Tags are generated by Google's Gemini Pro API, and the summary and translation are generated by Upstage's SOLAR mini chat model derived from SOLAR-10.7B open LLM.

(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media
