🤗 Daily Paper Newsletter |
 |
Hope you found some gems! |
This newsletter delivers you the curated list of papers by 🤗 Daily Papers. |
|
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning |
Published at 2025-07-18 |
#ML
|
The paper presents CUDA-L1, a reinforcement learning framework that significantly improves CUDA optimization, achieving high speedups across various GPU architectures without human expertise. This automated approach discovers optimization techniques, uncovers fundamental principles, and identifies performance bottlenecks, demonstrating the potential for increased GPU efficiency.... |
|
AnimalClue: Recognizing Animals by their Traces |
Published at 2025-07-27 |
#ML
|
The researchers have created a new large-scale dataset, AnimalClue, designed to help identify animal species from indirect evidence such as footprints and feces. The dataset includes over 150,000 annotated images of various types of animal traces and aims to advance wildlife monitoring by addressing the challenge of recognizing subtle visual features in these images.... |
|
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge |
Published at 2025-07-27 |
#ML
|
The authors present a new framework called MaPPO that incorporates prior reward knowledge into preference optimization for large language models, improving alignment with human preferences. This method extends existing techniques by integrating prior reward estimates into a Maximum a Posteriori objective, enhancing performance without adding extra hyperparameters and supporting both offline and online settings.... |
|
Evaluating Deep Learning Models for African Wildlife Image Classification: From DenseNet to Vision Transformers |
Published at 2025-07-28 |
#ML
|
This study compares different deep learning models for classifying African wildlife images, focusing on transfer learning with frozen feature extractors. The results show that while a Vision Transformer model has the highest accuracy, a DenseNet model offers a better balance of accuracy and resource requirements for real-world conservation work.... |
|
ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge |
Published at 2025-07-29 |
#ML
|
The study presents a new language model, ChemDFM-R, designed specifically for chemistry. It is trained using a vast amount of detailed chemical information and a unique learning method that combines general knowledge with chemical reasoning, resulting in top performance on various chemical tests and providing clear, understandable answers.... |
|
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels |
Published at 2025-07-29 |
#ML
|
The authors present HunyuanWorld 1.0, a new framework that combines video-based and 3D-based methods to create immersive, explorable, and interactive 3D scenes from text and images. The framework offers 360-degree experiences, mesh export, and disentangled object representations, and it outperforms existing methods in generating coherent 3D worlds for various applications.... |
|
MOVE: Motion-Guided Few-Shot Video Object Segmentation |
Published at 2025-07-29 |
#ML
|
The authors present a new large-scale dataset called MOVE, which focuses on motion-guided few-shot video object segmentation. They evaluate various methods on this dataset and introduce a baseline approach called Decoupled Motion Appearance Network, which outperforms existing techniques in few-shot motion understanding.... |
|
MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions |
Published at 2025-07-29 |
#ML
|
This study creates a new benchmark called MoHoBench to evaluate the honesty of multimodal large language models when answering unanswerable visual questions. The researchers find that most models struggle to recognize when they shouldn't answer and that visual information significantly impacts their honesty, leading them to develop initial methods to improve multimodal honesty alignment.... |
|
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again |
Published at 2025-07-29 |
#ML
|
The study presents X-Omni, a framework that uses reinforcement learning to improve the quality of discrete autoregressive image generation models, enabling better integration of image and language generation. X-Omni, which includes a semantic image tokenizer, a unified autoregressive model, and an offline diffusion decoder, outperforms existing methods in image generation tasks using a 7B language model, producing high-quality images that follow instructions and render long texts effectively.... |
|
Tags are generated by Google's Gemini Pro API, and the summary and translation are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun. |