🤗 Daily Paper (2025-08-01)


deep.di...@gmail.com

Aug 1, 2025, 4:07:08 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page
🤗 daily paper

Flow Equivariant Recurrent Neural Networks

Published at 2025-07-19

#ML

This study extends the concept of equivariant networks to time-parameterized transformations, such as moving images, in Recurrent Neural Networks (RNNs). The new flow equivariant RNNs outperform traditional RNNs in training speed, in predicting future steps, and in adapting to different speeds, marking a step towards more realistic sequence modeling....
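
Concretely, flow equivariance means that if the input sequence is transformed by a time-parameterized flow (for example, translated by a constant velocity at every step), the hidden states transform in a correspondingly predictable way. Below is a minimal numpy sketch of that property for circular 1D translations, assuming a separate hidden state per velocity channel that is transported by its own velocity at each step; this illustrates the equivariance check, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, u = 32, 6, 1                        # signal length, sequence length, input flow velocity
kernel = np.array([0.2, 1.0, 0.5])        # circular conv weights (commute with translations)

def conv(x):
    # Circular 1D convolution, so conv(roll(x, a)) == roll(conv(x), a).
    return sum(w * np.roll(x, k - 1) for k, w in enumerate(kernel))

def run(x_seq, v):
    """Hidden states of one velocity channel: h_t = tanh(conv(x_t) + roll(h_{t-1}, v))."""
    h, hs = np.zeros(N), []
    for x in x_seq:
        h = np.tanh(conv(x) + np.roll(h, v))   # transport the hidden state by its velocity
        hs.append(h)
    return hs

x_seq = [rng.normal(size=N) for _ in range(T)]
flowed = [np.roll(x, u * (t + 1)) for t, x in enumerate(x_seq)]   # same input, moving at velocity u

# Flow equivariance: hidden states of the flowed input in velocity channel v equal
# the (translated) hidden states of the original input in velocity channel v - u.
v = 1
h_flowed, h_orig = run(flowed, v), run(x_seq, v - u)
for t in range(T):
    assert np.allclose(h_flowed[t], np.roll(h_orig[t], u * (t + 1)))
print("flow equivariance holds in this sketch")
```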

Read More

AgroBench: Vision-Language Model Benchmark in Agriculture

Published at 2025-07-28

#ML

The AgroBench benchmark evaluates vision-language models for agriculture, covering 203 crop and 682 disease categories annotated by expert agronomists. The study finds that these models need improvement, especially in fine-grained tasks like weed identification, and provides suggestions for future development....

Read More

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

Published at 2025-07-29

#ML

This study discovers specific directions in a language model's activation space, called persona vectors, which represent different traits like evilness, flattery, and tendency to hallucinate. These vectors are used to monitor and control changes in the model's personality during training and deployment, helping to maintain the desired helpful, harmless, and honest Assistant persona....
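
The underlying recipe resembles activation steering. As a hedged sketch (the random arrays below stand in for real model activations, and the paper's exact extraction procedure may differ), a persona vector can be taken as the difference of mean activations between trait-eliciting and neutral prompts, then used both to score and to steer new activations:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512
# Hypothetical hidden activations collected from a language model on
# trait-eliciting vs. neutral prompts (random stand-ins here).
acts_trait = rng.normal(0.3, 1.0, size=(200, d))
acts_neutral = rng.normal(0.0, 1.0, size=(200, d))

# Persona vector: the direction separating the two sets of activations.
persona_vec = acts_trait.mean(axis=0) - acts_neutral.mean(axis=0)
persona_vec /= np.linalg.norm(persona_vec)

def trait_score(h):
    """Monitoring: project an activation onto the persona vector."""
    return float(h @ persona_vec)

def steer(h, alpha=4.0):
    """Control: subtract a multiple of the persona vector to suppress the trait."""
    return h - alpha * persona_vec

h_new = rng.normal(0.3, 1.0, size=d)
print(trait_score(h_new), trait_score(steer(h_new)))
```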

Read More

TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs

Published at 2025-07-29

#ML

The authors present TARS, a new method that reduces inaccuracies in multimodal large language models by treating hallucination-related preferences as flexible targets, rather than fixed ones. TARS improves the models' reliability by minimizing overfitting to superficial cues and preserving the connection to relevant visual information, demonstrating superior performance on various hallucination benchmarks....

Read More

C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations

Published at 2025-07-30

#ML

The paper presents a bilingual benchmark dataset of 1,079 instances in English and Chinese to evaluate the effectiveness of Spoken Dialogue Models (SDMs) in understanding and mimicking human conversations. The dataset, along with an LLM-based evaluation method, helps explore the performance of SDMs in handling challenges like ambiguity and context-dependency in spoken dialogue....

Read More

RecGPT Technical Report

Published at 2025-07-30

#ML

The study presents RecGPT, a new recommender system framework that focuses on user intent instead of just historical preferences. By integrating large language models and a multi-stage training process, RecGPT improves user experience, merchant exposure, and platform conversions, as demonstrated in a real-world deployment on the Taobao App....

Read More

Beyond Linear Bottlenecks: Spline-Based Knowledge Distillation for Culturally Diverse Art Style Classification

Published at 2025-07-31

#ML

The authors improve a self-supervised framework for art style classification by replacing its linear projection layers with Kolmogorov-Arnold Networks (KANs). This enhancement allows for better modeling of complex style features and global context, resulting in improved accuracy on art style classification datasets compared to the original framework....
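
To make the change concrete: a Kolmogorov-Arnold-style layer replaces the single weight of each input-output edge in a linear projection with a learnable univariate function. The sketch below uses a small radial-basis expansion as a stand-in for the B-spline parameterization real KAN layers use, so it illustrates the idea rather than reproducing the paper's projection head:

```python
import numpy as np

class KANLayer:
    """Each (input, output) edge applies a learnable 1D function; outputs sum the edges."""
    def __init__(self, d_in, d_out, n_basis=8, x_range=(-2.0, 2.0), seed=0):
        rng = np.random.default_rng(seed)
        self.centers = np.linspace(*x_range, n_basis)                   # basis-function centers
        self.width = (x_range[1] - x_range[0]) / n_basis
        self.coef = rng.normal(scale=0.1, size=(d_in, d_out, n_basis))  # per-edge coefficients

    def __call__(self, x):                                              # x: (batch, d_in)
        # phi[b, i, k] = RBF_k(x[b, i]); y[b, o] = sum_{i,k} coef[i, o, k] * phi[b, i, k]
        phi = np.exp(-(((x[..., None] - self.centers) / self.width) ** 2))
        return np.einsum("bik,iok->bo", phi, self.coef)

# Stand-in for swapping a linear projection head y = W x in the distillation framework.
proj = KANLayer(d_in=512, d_out=128)
feats = np.random.default_rng(1).normal(size=(4, 512))
print(proj(feats).shape)   # (4, 128)
```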

Read More

Efficient Machine Unlearning via Influence Approximation

Published at 2025-07-31

#ML

This research explores machine unlearning, a method for models to forget specific training data, and introduces a new algorithm called Influence Approximation Unlearning (IAU). IAU efficiently handles data deletion requests by connecting unlearning to incremental learning, offering a better balance of removal guarantee, unlearning efficiency, and model utility compared to existing methods....
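
The paper's IAU algorithm links unlearning to incremental learning; as a reference point for the general idea of approximating a deleted point's influence instead of retraining, here is a minimal sketch of classical influence-function deletion on L2-regularized logistic regression (a convex stand-in, not the proposed method): the trained parameters are nudged by the inverse Hessian times the forgotten point's gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1
X = rng.normal(size=(n, d))
y = (rng.random(n) < 1 / (1 + np.exp(-(X @ rng.normal(size=d))))).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit(X, y, steps=5000, lr=0.5):
    """L2-regularized logistic regression (average loss) via gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * (X.T @ (sigmoid(X @ w) - y) / len(y) + lam * w)
    return w

w_full = fit(X, y)

# Forget the last training point without retraining:
#   w_{-z} ≈ w* + (1/n) H^{-1} grad_loss(z, w*)
z_x, z_y = X[-1], y[-1]
p = sigmoid(X @ w_full)
H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)    # Hessian of the average loss
g_z = (sigmoid(z_x @ w_full) - z_y) * z_x              # gradient of the forgotten point's loss
w_unlearned = w_full + np.linalg.solve(H, g_z) / n

w_retrained = fit(X[:-1], y[:-1])
print("approx vs retrain gap:", np.linalg.norm(w_unlearned - w_retrained))
print("full   vs retrain gap:", np.linalg.norm(w_full - w_retrained))
```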

Read More

Enhanced Arabic Text Retrieval with Attentive Relevance Scoring

Published at 2025-07-31

#ML

This research develops an improved system for finding relevant Arabic text passages, focusing on a new method called Attentive Relevance Scoring (ARS). ARS enhances the traditional technique by using an adaptive scoring function that better understands the meaning of questions and passages, leading to more accurate results. The study also utilizes pre-trained Arabic language models and optimizations to boost retrieval performance and ranking accuracy for Arabic queries....
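
The paper's exact scoring function is not reproduced here, but an "attentive" relevance score can be pictured as replacing a single dot product between pooled query and passage vectors with token-level attention that is then pooled, as in the hedged sketch below (random arrays stand in for embeddings from a pre-trained Arabic encoder):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_score(q_tokens, p_tokens):
    """Each query token attends over passage tokens; the pooled weighted similarity is the score."""
    sims = q_tokens @ p_tokens.T                  # (Lq, Lp) token-level similarities
    attn = softmax(sims, axis=-1)
    per_token = (attn * sims).sum(axis=-1)        # attention-weighted similarity per query token
    return float(per_token.mean())

rng = np.random.default_rng(0)
query = rng.normal(size=(5, 64))                  # hypothetical query token embeddings
passages = [rng.normal(size=(40, 64)) for _ in range(3)]
ranking = sorted(range(len(passages)), key=lambda i: attentive_score(query, passages[i]), reverse=True)
print(ranking)
```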

Read More

NeRF Is a Valuable Assistant for 3D Gaussian Splatting

Published at 2025-07-31

#ML

The study presents a new method called NeRF-GS that combines Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) to improve 3D scene representation. By sharing spatial information and optimizing both representations together, NeRF-GS outperforms existing methods and demonstrates that NeRF and 3DGS can work well together....

Read More

On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

Published at 2025-07-31

#ML

This study derives the recurrent form of softmax attention and shows that linear attention is a simplified special case of it, which helps explain softmax attention's higher expressiveness compared to other attention mechanisms. By understanding the components of softmax attention in the context of recurrent neural networks, the research provides insight into why linear attention typically underperforms in terms of accuracy....
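
The contrast is easiest to see in code. Below is a hedged numpy sketch (standard formulations, not the paper's notation): causal softmax attention written as a recurrence over the stored key-value history, next to linear attention, whose factorized kernel lets the same sums collapse into a fixed-size recurrent state.

```python
import numpy as np

rng = np.random.default_rng(0)
T_len, d = 6, 4
Q, K, V = (rng.normal(size=(T_len, d)) for _ in range(3))

def softmax_attn_recurrent(Q, K, V):
    """Causal softmax attention as a recurrence: numerator/denominator accumulate over the
    history, but both depend on the current query, so the state is the whole (K, V) past."""
    outs = []
    for t in range(len(Q)):
        num, den = np.zeros(d), 0.0
        for j in range(t + 1):
            w = np.exp(Q[t] @ K[j])
            num += w * V[j]
            den += w
        outs.append(num / den)
    return np.array(outs)

def linear_attn_recurrent(Q, K, V):
    """Linear attention: replace exp(q·k) with phi(q)·phi(k); the sums then become a
    constant-size state (S, z) updated once per step."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))   # positive feature map (elu + 1)
    S, z, outs = np.zeros((d, d)), np.zeros(d), []
    for t in range(len(Q)):
        S += np.outer(phi(K[t]), V[t])
        z += phi(K[t])
        outs.append((phi(Q[t]) @ S) / (phi(Q[t]) @ z))
    return np.array(outs)

def softmax_attn_matrix(Q, K, V):
    """Standard masked matrix form, used only as a sanity check for the recurrence."""
    scores = Q @ K.T
    scores = np.where(np.tril(np.ones_like(scores, dtype=bool)), scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ V

assert np.allclose(softmax_attn_recurrent(Q, K, V), softmax_attn_matrix(Q, K, V))
print(linear_attn_recurrent(Q, K, V).shape)   # (6, 4): same interface, fixed-size state
```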

Read More

Phi-Ground Tech Report: Advancing Perception in GUI Grounding

Published at 2025-07-31

#ML

The study focuses on improving the accuracy of GUI grounding models, which are crucial for Computer Use Agents to perform actions like clicking and typing. The researchers developed the Phi-Ground model family, achieving state-of-the-art performance on five grounding benchmarks and outperforming existing models in both agent and end-to-end settings....

Read More

Scalable Multi-Task Reinforcement Learning for Generalizable Spatial Intelligence in Visuomotor Agents

Published at 2025-07-31

#ML

The study explores how Reinforcement Learning (RL) can improve the spatial reasoning and interaction abilities of visuomotor agents in 3D environments, specifically in Minecraft. They propose automated task synthesis and a distributed RL framework for large-scale multi-task training, resulting in a 4x boost in interaction success rates and zero-shot generalization across diverse environments....

Read More

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Published at 2025-07-31

#ML

The authors present Seed-Prover, a model that uses a formal verification language called Lean to improve its theorem proving abilities through iterative refinement and self-summarization. Seed-Prover, along with a geometry reasoning engine, achieves high success rates in solving advanced mathematical problems and even fully proves 5 out of 6 IMO 2025 problems, marking a significant advancement in automated mathematical reasoning....
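
For readers unfamiliar with Lean, the artifacts such a prover produces are machine-checkable proofs of formal statements. A deliberately trivial Lean 4 example (unrelated to the paper's benchmark problems) looks like this:

```lean
-- A toy theorem: addition on natural numbers is commutative,
-- proved by appealing to the core lemma Nat.add_comm.
theorem sum_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```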

Read More

iLRM: An Iterative Large 3D Reconstruction Model

Published at 2025-07-31

#ML

The study presents a new method called iLRM for creating 3D models quickly and efficiently. This model improves upon existing techniques by reducing computational costs and enhancing the quality of 3D reconstructions, especially when using many input images or high-resolution images....

Read More

villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models

Published at 2025-07-31

#ML

This study presents a new framework called villa-X that enhances the learning of abstract representations of visual changes, known as latent actions, in models that help robots follow language instructions. The proposed framework significantly improves the performance of robot manipulation policies in both simulated and real-world environments compared to existing methods....

Read More

Tags are generated by Google's Gemini Pro API, and the summary and translation are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.


(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media
