🤗 Daily Paper Newsletter

This newsletter delivers a curated list of papers from 🤗 Daily Papers. Hope you find some gems!

The Aloe Family Recipe for Open and Specialized Healthcare LLMs
Published at 2025-05-07
#ML

The study presents Aloe Beta, a new open-source healthcare LLM that improves data preprocessing, training, and model safety and efficacy. Aloe Beta outperforms existing models on healthcare benchmarks across medical fields while maintaining high ethical standards and safety, setting a new standard for the field.
Read More

The Distracting Effect: Understanding Irrelevant Passages in RAG
Published at 2025-05-11
#ML

This research investigates how irrelevant passages in Retrieval Augmented Generation (RAG) can mislead answer-generating LLMs, proposing a new measure for the distracting effect of these passages. The study also presents innovative methods for identifying challenging distracting passages, which, when used to fine-tune LLMs, improve answering accuracy by up to 7.5% compared to traditional RAG datasets.
Read More

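The paper's precise distraction measure isn't given in this summary; as a sketch, one can score a passage by how much it lowers the model's probability of the gold answer when the passage is added to the context. Here `toy_answer_prob` is a hypothetical stand-in for a real LLM scorer:

```python
def distracting_effect(answer_prob, query, gold_answer, passage):
    """Score a passage by how much it lowers the probability of the
    gold answer once added to the context (higher = more distracting)."""
    p_closed = answer_prob(query, gold_answer, context=None)
    p_open = answer_prob(query, gold_answer, context=passage)
    return max(0.0, p_closed - p_open)

# Toy stand-in for an LLM scorer: a passage hurts when it overlaps
# the query but not the gold answer (superficially relevant noise).
def toy_answer_prob(query, answer, context=None):
    if context is None:
        return 0.9
    words = set(context.split())
    q_overlap = len(words & set(query.split()))
    a_overlap = len(words & set(answer.split()))
    return min(0.95, max(0.1, 0.9 - 0.2 * q_overlap + 0.3 * a_overlap))

query, gold = "who wrote the iliad", "homer"
helpful = "homer wrote the iliad"
distractor = "the odyssey is an epic poem like the iliad"
hard = distracting_effect(toy_answer_prob, query, gold, distractor)
easy = distracting_effect(toy_answer_prob, query, gold, helpful)
```

Passages like `distractor` with high query overlap but no support for the answer come out as the most distracting, which is the kind of passage the paper mines for fine-tuning.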
MIGRATION-BENCH: Repository-Level Code Migration Benchmark from Java 8
Published at 2025-05-14
#ML

The study presents MIGRATION-BENCH, a new benchmark for code migration from Java 8 to the latest LTS versions, focusing on a comprehensive dataset and a curated subset for research. The paper also introduces an evaluation framework and demonstrates the effectiveness of LLMs in handling repository-level code migration tasks.
Read More

Understanding Gen Alpha Digital Language: Evaluation of LLM Safety Systems for Content Moderation
Published at 2025-05-14
#ML

This study evaluates four AI models in detecting hidden harassment and manipulation in Gen Alpha's unique digital language, revealing gaps in current safety tools and offering a new dataset and framework for improving youth online protection.
Read More

Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence
Published at 2025-05-15
#ML

This study proposes a new method called IEMF that improves multimodal learning in AI by mimicking the brain's dynamic mechanism of integrating information from different senses. The proposed method leads to better performance and lower computational costs, and it works well with various types of neural networks.
Read More

Masking in Multi-hop QA: An Analysis of How Language Models Perform with Context Permutation
Published at 2025-05-16
#ML

This study examines how language models handle complex multi-hop question answering tasks by rearranging search results. The findings suggest that encoder-decoder models generally perform better than causal decoder-only models, and that optimizing the order of search results and enhancing causal decoder-only models with bi-directional attention can improve performance. Additionally, the research reveals a correlation between attention weights and correct answers, which can be used to improve language model performance.
Read More

Object-Centric Representations Improve Policy Generalization in Robot Manipulation
Published at 2025-05-16
#ML

The study explores object-centric representations (OCR), which break visual input down into distinct entities, for robotic manipulation. The results show that OCR-based policies generalize better than dense and global representations in various manipulation tasks, even without specific training, highlighting OCR's potential for improving robotic vision in real-world, changing environments.
Read More

Phare: A Safety Probe for Large Language Models
Published at 2025-05-16
#ML

Phare is a new tool that tests large language models for safety issues like spreading biases or creating harmful content, helping to make these models more reliable and trustworthy by revealing their specific weaknesses.
Read More

Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
Published at 2025-05-16
#ML

This study explores how often a verifier should be used during language model generation to balance performance and computational cost, introducing a new algorithm called Variable Granularity Search. Experiments show that adjusting the verifier's frequency can improve efficiency and accuracy, outperforming existing methods while reducing computational resources.
Read More

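The actual Variable Granularity Search algorithm isn't spelled out in this summary, but the knob it tunes can be sketched: how often a costly verifier is invoked to prune partial generations. All names below are illustrative:

```python
def search_with_verifier(step_fn, score_fn, n_candidates, n_steps, granularity):
    """Toy beam-style generation that calls a (costly) verifier only every
    `granularity` steps, pruning to the best half each time.  Returns the
    surviving candidates and the number of verifier calls spent."""
    beams = [[i] for i in range(n_candidates)]
    calls = 0
    for t in range(1, n_steps + 1):
        beams = [b + [step_fn(b)] for b in beams]
        if t % granularity == 0:
            calls += len(beams)              # one verifier call per beam
            beams.sort(key=score_fn, reverse=True)
            beams = beams[: max(1, len(beams) // 2)]
    return beams, calls

step_fn = lambda b: (len(b) * 7) % 5   # deterministic dummy "token"
score_fn = sum                         # dummy verifier score
_, fine = search_with_verifier(step_fn, score_fn, 8, 8, granularity=1)
_, coarse = search_with_verifier(step_fn, score_fn, 8, 8, granularity=4)
```

With 8 candidates over 8 steps, verifying every step costs 19 calls while verifying every 4 steps costs 12, which is the efficiency-versus-accuracy trade-off the paper optimizes.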
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
Published at 2025-05-16
#ML

The study presents two advances in attention efficiency. First, it exploits the new FP4 Tensor Cores in Blackwell GPUs for faster attention computation, achieving a 5x speedup over the fastest FlashAttention on an RTX 5090. Second, it introduces an accurate and efficient 8-bit attention for both training and inference, which is lossless in fine-tuning tasks but converges more slowly in pretraining.
Read More

Learning to Highlight Audio by Watching Movies
Published at 2025-05-17
#ML

The authors propose a new task called visually-guided acoustic highlighting to improve the synchronization between visual and audio elements in videos. They introduce a transformer-based framework and a new dataset, the muddy mix dataset, to achieve this, and their approach outperforms several baselines in both quantitative and subjective evaluations.
Read More

Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier
Published at 2025-05-17
#ML

The authors present FlexiVe, a new generative verifier that efficiently balances computational resources for both speed and reliability in large language models, and the Solve-Detect-Verify pipeline, an efficient inference framework that integrates FlexiVe to improve reasoning accuracy and efficiency. The system outperforms baselines on various reasoning benchmarks, offering a scalable solution for enhancing language model performance during testing.
Read More

Truth Neurons
Published at 2025-05-17
#ML

This study discovers specific neurons in language models, called 'truth neurons,' that consistently encode truthfulness, making the models more reliable and safer to use. The existence of these neurons is confirmed across different models, and suppressing them harms the model's performance on various tasks, indicating the importance of truthfulness mechanisms in language models.
Read More

Bidirectional LMs are Better Knowledge Memorizers? A Benchmark for Real-world Knowledge Injection
Published at 2025-05-18
#ML

The study presents WikiDYK, a new benchmark for evaluating knowledge memorization in language models, using Wikipedia's 'Did You Know' entries. Experiments show that Bidirectional Language Models outperform Causal Language Models in knowledge memorization, and a modular framework is introduced that pairs smaller BiLMs with larger language models to improve knowledge injection.
Read More

|
![]() |
Exploring Federated Pruning for Large Language Models |
Published at 2025-05-18 |
|
#ML
|
The authors present FedPrLLM, a privacy-preserving framework for compressing large language models without needing public calibration samples, allowing for the collaborative pruning of a global model while protecting local data privacy.... |
Read More |
|
|
|
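FedPrLLM's exact aggregation rule isn't described in this summary; a minimal sketch of the general idea, assuming magnitude-based local keep-masks combined by majority vote, so clients never share raw calibration data:

```python
import numpy as np

def local_keep_mask(scores, sparsity):
    """Each client marks the top-magnitude weights to keep (1 = keep),
    computed from its own private calibration data."""
    k = int(scores.size * (1 - sparsity))
    order = np.argsort(np.abs(scores).ravel())[::-1]
    mask = np.zeros(scores.size, dtype=int)
    mask[order[:k]] = 1
    return mask.reshape(scores.shape)

def federated_prune(global_w, client_scores, sparsity=0.5):
    """The server aggregates binary keep-masks by majority vote, so clients
    share only masks, never their local data."""
    votes = sum(local_keep_mask(s, sparsity) for s in client_scores)
    keep = votes >= (len(client_scores) + 1) // 2
    return global_w * keep

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
# Hypothetical per-client importance scores: global weights seen through
# each client's own calibration data (simulated as noise here).
clients = [w + 0.3 * rng.normal(size=w.shape) for _ in range(3)]
pruned = federated_prune(w, clients, sparsity=0.5)
```

Only weights that a majority of clients agree to keep survive, so the pruned model reflects consensus importance without any data leaving a client.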
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Published at 2025-05-18
#ML

The study presents a new method called SSR that improves depth perception in Vision-Language Models by converting depth data into textual rationales, which enhances spatial reasoning. They also create a new dataset, SSR-CoT, for this method and demonstrate its effectiveness on multiple benchmarks, moving VLMs toward more human-like understanding of multimodal information.
Read More

CS-Sum: A Benchmark for Code-Switching Dialogue Summarization and the Limits of Large Language Models
Published at 2025-05-19
#ML

The study presents CS-Sum, a benchmark for evaluating the comprehension of code-switching in large language models, focusing on summarizing code-switched dialogues in Mandarin-English, Tamil-English, and Malay-English. Results indicate that while automated metrics show high scores, language models often make subtle errors that change the dialogue's meaning, highlighting the need for specialized training on code-switched data.
Read More

CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs
Published at 2025-05-19
#ML

The authors present a solution called CoIn to verify the invisible reasoning tokens in commercial LLM APIs, which are often hidden to protect proprietary information and reduce verbosity. CoIn checks the number and meaning of these hidden tokens to prevent overcharging, and experiments show it can detect token count inflation with a success rate of up to 94.7%.
Read More

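CoIn's real protocol builds hash trees over token embeddings and adds a semantic check; the commit-then-spot-check core can be sketched with plain salted hashes (all names illustrative):

```python
import hashlib

def commit(tokens, salt):
    """Provider publishes one salted hash per hidden reasoning token."""
    return [hashlib.sha256((salt + t).encode()).hexdigest() for t in tokens]

def audit(commitments, billed_count, revealed, salt):
    """Auditor: the billed count must match the commitment list, and every
    spot-checked (index, token) pair must hash to its commitment."""
    if billed_count != len(commitments):
        return False
    return all(
        hashlib.sha256((salt + tok).encode()).hexdigest() == commitments[i]
        for i, tok in revealed
    )

hidden = ["let", "me", "think", "step", "by", "step"]
c = commit(hidden, salt="s3cret")
honest = audit(c, 6, [(2, "think")], "s3cret")
inflated = audit(c, 11, [(2, "think")], "s3cret")      # billed too many
forged = audit(c, 6, [(2, "steps")], "s3cret")         # token mismatch
```

The provider cannot bill for more tokens than it committed to, and revealed samples are checked against the commitments, which is the overcharge-detection property the paper measures.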
|
![]() |
CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition |
Published at 2025-05-19 |
|
#ML
|
The study presents a new method called 'competition' to improve the training of complex models by efficiently assigning tasks to the most capable experts. The proposed algorithm, CompeteSMoE, demonstrates superior performance and efficiency in training large language models compared to existing methods, as shown in extensive experiments.... |
Read More |
|
|
|
Fine-tuning Quantized Neural Networks with Zeroth-order Optimization
Published at 2025-05-19
#ML

This research presents a new method, Quantized Zeroth-order Optimization (QZO), which significantly reduces memory usage for training large language models by approximating gradients and using model quantization, making it possible to train large models like Llama-2-13B and Stable Diffusion 3.5 Large on a single 24GB GPU.
Read More

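QZO's quantization machinery is beyond this summary, but the zeroth-order core that keeps memory near inference level is the classic SPSA estimator: two forward passes along a random direction replace backpropagation. A minimal sketch on a toy quadratic loss:

```python
import numpy as np

def zo_step(loss_fn, theta, lr=0.05, eps=1e-3, rng=None):
    """SPSA-style zeroth-order update: estimate the directional derivative
    from two forward evaluations along a random direction z.  No backward
    pass, so no gradient or optimizer-state memory is needed."""
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(theta.shape)
    g = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps)
    return theta - lr * g * z

loss = lambda w: float(np.sum((w - 3.0) ** 2))   # toy loss, minimum at w = 3
rng = np.random.default_rng(0)
w = np.zeros(4)
for _ in range(500):
    w = zo_step(loss, w, rng=rng)
```

Each step costs only two forward evaluations and one random vector; QZO's contribution is making this estimator work when the perturbed weights are themselves quantized.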
|
![]() |
Fixing 7,400 Bugs for 1$: Cheap Crash-Site Program Repair |
Published at 2025-05-19 |
|
#ML
|
This study presents WILLIAMT, a new system for fixing bugs in software that significantly lowers the cost and improves the success rate compared to existing methods, making it a promising solution for addressing the growing number of vulnerabilities in modern software.... |
Read More |
|
|
|
GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization
Published at 2025-05-19
#ML

The authors present a new framework called GeoRanker that uses large vision-language models to improve the accuracy of predicting GPS coordinates from images by considering spatial relationships among candidates. They also create a new dataset for this task and demonstrate that GeoRanker outperforms existing methods in benchmark tests.
Read More

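GeoRanker's learned distance-aware objective isn't reproduced here; as an illustration of why spatial relationships among candidates matter, a toy re-ranker can boost candidates whose high-scoring neighbors sit nearby (the support term and weights are invented for the sketch):

```python
import math

def haversine_km(a, b):
    """Great-circle distance between (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def rerank(candidates, scores, spatial_weight=0.05):
    """Rank by model score plus distance-discounted support from the
    other candidates' scores (a stand-in for learned spatial reasoning)."""
    def combined(i):
        support = sum(scores[j] / (1.0 + haversine_km(candidates[i], candidates[j]))
                      for j in range(len(candidates)) if j != i)
        return scores[i] + spatial_weight * support
    return sorted(range(len(candidates)), key=combined, reverse=True)

# Two mutually supporting Paris candidates vs one lone Sydney candidate
# that scores marginally higher on its own.
coords = [(48.8566, 2.3522), (48.8606, 2.3376), (-33.8688, 151.2093)]
scores = [0.80, 0.75, 0.81]
order = rerank(coords, scores)
```

Score alone would pick Sydney; with spatial support the clustered Paris candidate wins, which is the kind of candidate-to-candidate relationship the paper models.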
|
![]() |
Neurosymbolic Diffusion Models |
Published at 2025-05-19 |
|
#ML
|
The study presents a new type of Neurosymbolic predictor, NeSyDMs, which improves upon existing models by using discrete diffusion to consider the relationships between symbols, rather than assuming they are independent. This method allows for better uncertainty quantification and generalization, outperforming other Neurosymbolic predictors in various tasks, such as visual path planning and autonomous driving.... |
Read More |
|
|
|
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Published at 2025-05-19
#ML

The researchers propose a new framework called AnytimeReasoner to improve the token efficiency and flexibility of large language models in reasoning tasks under varying token budget constraints. They introduce a novel variance reduction technique, Budget Relative Policy Optimization, to enhance the robustness and efficiency of the learning process, and empirical results show that their method outperforms the existing GRPO method in mathematical reasoning tasks.
Read More

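BRPO's exact estimator isn't given in this summary; the group-relative idea it extends from GRPO can be sketched as baselining each rollout's reward against the group mean computed separately per token budget, so comparisons are always like-for-like. Shapes and names are assumptions:

```python
import numpy as np

def budget_relative_advantages(rewards):
    """rewards[i, b] = reward of rollout i truncated to token budget b.
    Subtract each budget column's group mean so a rollout's advantage
    says how it compares to its peers under the *same* budget."""
    r = np.asarray(rewards, dtype=float)
    return r - r.mean(axis=0, keepdims=True)

# 3 rollouts x 2 budgets: rollout 0 shines at the small budget,
# rollout 2 shines at the large one.
rewards = [[0.9, 0.4],
           [0.3, 0.5],
           [0.3, 0.9]]
adv = budget_relative_advantages(rewards)
```

Per-budget baselining is a simple variance-reduction device: a rollout is never rewarded merely because larger budgets earn higher raw scores.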
|
![]() |
Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning |
Published at 2025-05-19 |
|
#ML
|
The authors present a method called Reasoning Path Compression (RPC) that enhances the efficiency of language models in reasoning tasks. RPC reduces memory usage and improves generation speed by compressing the semantic sparsity of reasoning paths, allowing for practical deployment of reasoning LLMs with minimal accuracy loss.... |
Read More |
|
|
|
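RPC's importance scores come from attention over a recent window; that detail is stubbed out below, but the keep-recent-plus-top-important eviction pattern can be sketched (all parameter names are invented):

```python
def compress_path(chunks, importance, keep_ratio=0.5, recent=2):
    """Keep the `recent` newest chunks plus the highest-importance older
    ones, preserving order: a toy stand-in for KV-cache eviction driven
    by attention-based importance scores."""
    n = len(chunks)
    old = list(range(max(0, n - recent)))
    budget = max(0, int(n * keep_ratio) - min(recent, n))
    top_old = sorted(sorted(old, key=lambda i: importance[i], reverse=True)[:budget])
    kept = top_old + list(range(max(0, n - recent), n))
    return [chunks[i] for i in kept]

chunks = [f"step{i}" for i in range(8)]
importance = [0.1, 0.9, 0.2, 0.8, 0.3, 0.4, 0.0, 0.0]
short = compress_path(chunks, importance)
```

Half the trace is dropped but the recent context and the most load-bearing earlier steps survive, which is why accuracy loss can stay small while memory and decoding cost shrink.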
To Bias or Not to Bias: Detecting bias in News with bias-detector
Published at 2025-05-19
#ML

Researchers improved media bias detection by fine-tuning a model on a high-quality dataset, showing better performance than a baseline model and avoiding common biases. They propose a pipeline combining their model with an existing bias-type classifier, contributing to more robust and explainable NLP systems for media bias detection.
Read More

|
![]() |
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings |
Published at 2025-05-19 |
|
#ML
|
The study presents a two-stage training strategy for creating reasoning-capable language models in situations with limited data. By first 'warming up' the model using logic puzzles and then applying reinforcement learning, the approach improves performance, generalization, and sample efficiency across various tasks.... |
Read More |
|
|
|
Dynadiff: Single-stage Decoding of Images from Continuously Evolving fMRI
Published at 2025-05-20
#ML

The authors present Dynadiff, a simplified single-stage model for reconstructing images from dynamic fMRI recordings, outperforming previous models in time-resolved brain decoding and offering a precise understanding of image representation evolution in brain activity.
Read More

|
![]() |
Emerging Properties in Unified Multimodal Pretraining |
Published at 2025-05-20 |
|
#ML
|
The authors present BAGEL, a free and open-source model that can understand and generate multiple types of data, such as text, images, and videos. By training BAGEL on a large amount of diverse data, it can perform complex tasks like manipulating images, predicting future video frames, and navigating in 3D environments, outperforming other open-source models in multimodal research.... |
Read More |
|
|
|
General-Reasoner: Advancing LLM Reasoning Across All Domains
Published at 2025-05-20
#ML

This research presents General-Reasoner, a new method to improve the reasoning skills of large language models across fields. The authors created a large, high-quality question dataset and a new answer-verification system; models trained with them outperform existing methods on 12 benchmarks spanning physics, chemistry, finance, and more, while remaining particularly effective at mathematical reasoning.
Read More

|
![]() |
Hunyuan-Game: Industrial-grade Intelligent Game Creation Model |
Published at 2025-05-20 |
|
#ML
|
Hunyuan-Game is a new system that uses AI to create high-quality game images and videos. It has models for generating various types of game images like general text-to-image, visual effects, transparent images, and game characters. For videos, it has models for image-to-video generation, pose avatar synthesis, dynamic illustration generation, video super-resolution, and interactive game video generation. These models can understand and replicate different game and anime art styles.... |
Read More |
|
|
|
KERL: Knowledge-Enhanced Personalized Recipe Recommendation using Large Language Models
Published at 2025-05-20
#ML

The study presents KERL, a system that combines food Knowledge Graphs and Large Language Models to provide personalized food recommendations, recipe generation, and nutritional analysis. KERL outperforms existing methods in extensive experiments, and its code and benchmark datasets are publicly available for use and further research.
Read More

|
![]() |
Latent Flow Transformer |
Published at 2025-05-20 |
|
#ML
|
The authors propose a new method, Latent Flow Transformer, which replaces multiple layers in transformers with a single learned operator, making the model more efficient. They also introduce an algorithm to improve the preservation of connections in flow-based methods, and their method outperforms skipping layers in experiments, demonstrating its feasibility and potential.... |
Read More |
|
|
|
Lessons from Defending Gemini Against Indirect Prompt Injections
Published at 2025-05-20
#ML

This report explains how Google DeepMind tests the security of their Gemini models against potential threats. They use a continuous evaluation framework with various attack techniques to make Gemini more resistant to manipulation and better protect user data and permissions.
Read More

|
![]() |
NExT-Search: Rebuilding User Feedback Ecosystem for Generative AI Search |
Published at 2025-05-20 |
|
#ML
|
The paper proposes NExT-Search, a new paradigm for generative AI search that incorporates fine-grained, process-level feedback. It introduces two modes - User Debug Mode and Shadow User Mode - to collect feedback and suggests methods for online adaptation and offline update to improve search models continuously.... |
Read More |
|
|
|
Not All Correct Answers Are Equal: Why Your Distillation Source Matters
Published at 2025-05-20
#ML

The study compares three large datasets of reasoning task outputs from three different expert models, finding that one model's dataset, AM-Thinking-v1, helps student models perform best on various reasoning tasks, producing longer answers for harder tasks and shorter ones for simpler tasks. The researchers release two of these datasets for public use to aid future research on reasoning-focused language models.
Read More

Reasoning Models Better Express Their Confidence
Published at 2025-05-20
#ML

The study finds that reasoning models, which use extended chain-of-thought reasoning, are better at solving problems and expressing their confidence accurately compared to non-reasoning models. This is due to their ability to adjust confidence dynamically through slow thinking behaviors, and even non-reasoning models can improve calibration by mimicking these behaviors.
Read More

Reward Reasoning Model
Published at 2025-05-20
#ML

This study presents Reward Reasoning Models (RRMs), which use additional compute at inference time to deliberate before producing a final reward, especially for difficult queries. The models are trained with reinforcement learning, improving their reward-reasoning abilities on their own without needing specific examples for training. The results show that RRMs outperform other reward models in various areas and can adaptively allocate test-time compute.
Read More

|
![]() |
Think Only When You Need with Large Hybrid-Reasoning Models |
Published at 2025-05-20 |
|
#ML
|
The study presents Large Hybrid-Reasoning Models (LHRMs) that can adaptively decide when to use extended thinking processes based on user query context, unlike traditional Large Language Models (LLMs) and Large Reasoning Models (LRMs) that always use thinking. LHRMs are trained using a two-stage pipeline and a new metric called Hybrid Accuracy, resulting in improved reasoning and general capabilities while reducing inefficiency due to unnecessary thinking.... |
Read More |
|
|
|
Tokenization Constraints in LLMs: A Study of Symbolic and Arithmetic Reasoning Limits
Published at 2025-05-20
#ML

This study explores how tokenization methods in language models, specifically subword-based techniques like BPE, hinder symbolic reasoning by disrupting logical alignment and atomic reasoning units. The researchers introduce the concept of Token Awareness and demonstrate that atomically-aligned formats significantly improve reasoning performance, enabling smaller models to outperform larger ones in structured tasks.
Read More

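The atomic-alignment point is easy to see with a toy comparison: a BPE-like tokenizer merges digits into multi-digit chunks, destroying the per-digit structure that column arithmetic relies on, while digit-level tokens keep one token per atomic unit (the merge table is invented for illustration):

```python
def bpe_like(number, merges=("12", "345", "67")):
    """Greedy toy 'BPE': consume known multi-digit chunks left to right."""
    s, out = str(number), []
    while s:
        for m in sorted(merges, key=len, reverse=True):
            if s.startswith(m):
                out.append(m)
                s = s[len(m):]
                break
        else:                      # no merge applies: emit a single digit
            out.append(s[0])
            s = s[1:]
    return out

def atomic(number):
    """One token per digit: aligned with arithmetic's atomic units."""
    return list(str(number))

chunked = bpe_like(123456)
digits = atomic(123456)
```

The same number gets inconsistent chunk boundaries under the toy BPE depending on its digits, whereas the atomic format always exposes place value, which is the alignment property the paper credits for the performance gap.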
|
![]() |
Towards Embodied Cognition in Robots via Spatially Grounded Synthetic Worlds |
Published at 2025-05-20 |
|
#ML
|
The authors propose a method for teaching robots to understand spatial relationships in their environment through a synthetic dataset created in NVIDIA Omniverse. This dataset, which includes images, descriptions, and object pose information, is designed to train Vision-Language Models for tasks like Visual Perspective Taking, with the ultimate goal of improving Human-Robot Interaction.... |
Read More |
|
|
|
Towards eliciting latent knowledge from LLMs with mechanistic interpretability
Published at 2025-05-20
#ML

The study trains a Taboo model, a language model that describes a secret word without explicitly stating it, to explore techniques for revealing hidden knowledge. The researchers evaluate both black-box and mechanistic interpretability approaches, finding both effective in uncovering the secret word. This work contributes to ensuring the safe and reliable deployment of language models by addressing the issue of secret knowledge.
Read More

|
![]() |
Training-Free Watermarking for Autoregressive Image Generation |
Published at 2025-05-20 |
|
#ML
|
The researchers developed a method called IndexMark, which is a way to protect ownership of images created by a specific type of model without affecting the image quality. This method uses a strategy called match-then-replace to embed a watermark, and it also includes a way to improve the accuracy of checking the watermark and make it more resistant to attacks like cropping.... |
Read More |
|
|
|
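The match-then-replace idea can be sketched over integer token indices: pair up codebook entries, rewrite "red" indices to their "green" twins at generation time, and detect by counting the green fraction. In the real method the pairing is by codebook-vector similarity; here it is replaced by adjacent-index pairing for illustration:

```python
def build_pairs(codebook_size):
    """Map each odd ('red') index to its even ('green') neighbour.  A real
    scheme would pair visually similar codebook vectors instead."""
    return {i + 1: i for i in range(0, codebook_size - 1, 2)}

def embed_watermark(indices, red_to_green):
    """Match-then-replace: swap every red index for its green twin."""
    return [red_to_green.get(t, t) for t in indices]

def green_fraction(indices, red_to_green):
    """Detector statistic: watermarked images skew heavily green."""
    green = set(red_to_green.values())
    return sum(t in green for t in indices) / len(indices)

pairs = build_pairs(16)
tokens = [3, 8, 15, 4, 7]       # toy generated token indices
marked = embed_watermark(tokens, pairs)
```

Because each swap lands on a paired (similar) code, image quality is barely affected, while the detector sees a green fraction far above the roughly 50% expected from unmarked generations.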
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
Published at 2025-05-20
#ML

The authors present a new method called RICE to enhance reasoning in large models without extra training. By using a specific measure, they identify and utilize experts that improve reasoning accuracy, efficiency, and adaptability across various tasks, outperforming other techniques while maintaining the model's general abilities.
Read More

|
![]() |
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation |
Published at 2025-05-20 |
|
#ML
|
The study finds that existing long video understanding benchmarks are flawed due to the possibility of guessing answers and strong priors that allow models to answer without watching the video. The researchers introduce VideoEval-Pro, a more realistic benchmark with open-ended questions, which better evaluates video understanding models and shows that increasing the number of frames improves performance more than other benchmarks.... |
Read More |
|
|
|
![]() |
Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning |
Published at 2025-05-20 |
|
#ML
|
This paper presents Visionary-R1, a model that uses reinforcement learning and visual question-answer pairs to develop reasoning capabilities in visual language models, without explicit guidance. The model's success lies in its ability to interpret images before reasoning, which helps it avoid shortcuts and generalize better, outperforming other multimodal models on various benchmarks.... |
Read More |
|
|
|
|
![]() |
Visual Agentic Reinforcement Fine-Tuning |
Published at 2025-05-20 |
|
#ML
|
This study presents a method called Visual-ARFT that enhances the reasoning abilities of Large Vision-Language Models, enabling them to browse the web and manipulate images. The researchers also introduced a new benchmark to test these abilities, and their results show that Visual-ARFT significantly outperforms existing models, including GPT-4o.... |
Read More |
|
|
|
![]() |
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank |
Published at 2025-05-20 |
|
#ML
|
The study presents VisualQuality-R1, a new model for assessing image quality without reference images, which uses reinforcement learning to rank images based on their quality. The model outperforms other deep learning models and can generate detailed quality descriptions, making it ideal for evaluating image processing tasks like super-resolution and image generation.... |
Read More |
|
|
|
|
![]() |
Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits |
Published at 2025-05-20 |
|
#ML
|
The researchers present a new benchmark called Vox-Profile, which provides detailed profiles of speakers and their speech, taking into account both fixed traits like age and sex, and changing factors like emotion and speech rhythm. They tested this benchmark using various speech datasets and models, and demonstrated its use in improving speech recognition, evaluating speech generation systems, and comparing automated profiles to human evaluations.... |
Read More |
|
|
|
![]() |
Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas |
Published at 2025-05-20 |
|
#ML
|
The authors propose LitmusValues, a method to uncover AI model priorities, and AIRiskDilemmas, a collection of scenarios to test these priorities. They demonstrate that seemingly harmless values in AI models can predict both known and unknown risky behaviors.... |
Read More |
|
|
|
|
Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.
Visit the Developer's Social Media