🤗 Daily Paper (2025-09-25)


deep.di...@gmail.com

Sep 25, 2025, 4:06:54 PM
to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you find some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page
🤗 daily paper

On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub

Published at 2025-09-18

#ML

Researchers analyzed 567 GitHub pull requests created by an AI coding tool and found that 83.8% were accepted, with many merged without further changes. However, human review and refinement were still needed, particularly for bug fixes and for adherence to project standards....

Read More

Advancing Speech Understanding in Speech-Aware Language Models with GRPO

Published at 2025-09-21

#ML

The study presents a new method using Group Relative Policy Optimization (GRPO) to enhance Speech-Aware Large Language Models (SALLMs) for open-format speech understanding tasks, such as Spoken Question Answering and Automatic Speech Translation. By employing GRPO with BLEU as the reward signal, the proposed approach outperforms standard Supervised Fine-Tuning (SFT) and also explores the use of off-policy samples for potential further improvements....
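As a rough illustration of the training signal described above, here is a minimal sketch of a group-relative reward computed from sentence-level BLEU over a group of sampled outputs. It assumes the sacrebleu package; the function and variable names are illustrative and are not taken from the paper.

```python
# Minimal sketch of a GRPO-style, group-relative reward using sentence-level BLEU.
# Assumes the `sacrebleu` package; names are illustrative, not the paper's code.
import sacrebleu


def group_relative_advantages(candidates, reference):
    """Score a group of sampled outputs against one reference with BLEU,
    then normalize the rewards within the group (zero mean, unit variance)."""
    rewards = [
        sacrebleu.sentence_bleu(cand, [reference]).score for cand in candidates
    ]
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # avoid division by zero when all rewards are equal
    return [(r - mean) / std for r in rewards]


# Example: four sampled outputs for one spoken-translation prompt.
samples = [
    "the cat sat on the mat",
    "a cat is sitting on the mat",
    "the cat sat on a mat",
    "dogs run in the park",
]
print(group_relative_advantages(samples, "the cat sat on the mat"))
```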

Read More

ATLAS: Benchmarking and Adapting LLMs for Global Trade via Harmonized Tariff Code Classification

Published at 2025-09-22

#ML

The study presents a new evaluation framework for classifying international trade products under the Harmonized Tariff Schedule, using a large language model fine-tuned for this task. The proposed model, Atlas, demonstrates significant improvements in accuracy and cost-effectiveness compared to other leading models, while also ensuring data privacy for trade and compliance workflows....

Read More

SimpleFold: Folding Proteins is Simpler than You Think

Published at 2025-09-22

#ML

The authors present a new protein folding model, SimpleFold, which uses general-purpose transformer blocks instead of complex, domain-specific architectures. This model achieves competitive performance on standard folding benchmarks, demonstrates strong ensemble prediction, and is efficient for deployment and inference on consumer-level hardware....
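For readers unfamiliar with what "general-purpose transformer blocks" means here, the sketch below shows a standard pre-norm transformer block in PyTorch. The layer sizes are placeholders and do not reflect SimpleFold's actual configuration.

```python
# A standard pre-norm transformer block of the kind the summary refers to;
# dimensions are placeholders, not SimpleFold's actual configuration.
import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    def __init__(self, dim: int = 512, n_heads: int = 8, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention over the token sequence, then an MLP, each with a residual path.
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x


tokens = torch.randn(1, 128, 512)  # (batch, residue tokens, features)
print(TransformerBlock()(tokens).shape)  # torch.Size([1, 128, 512])
```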

Read More

LLMs4All: A Review on Large Language Models for Research and Applications in Academic Disciplines

Published at 2025-09-23

#ML

This review explores how advanced chat AI systems, called Large Language Models (LLMs), are being used in various academic fields like arts, business, science, and engineering. The analysis highlights their impressive capabilities, such as human-like conversation and task completion, while also discussing challenges and future directions for these AI models in real-world applications....

Read More

Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation

Published at 2025-09-23

#ML

The authors present Lavida-O, a powerful model that can understand and generate multimodal content, including high-resolution image synthesis and editing, outperforming other models in various tasks while being more efficient....

Read More

EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning

Published at 2025-09-24

#ML

The authors present EditVerse, a unified model for image and video editing and generation, which overcomes the limitation of separate architectures and data scarcity in video editing. By using a single model and a large curated dataset, EditVerse achieves superior performance in various editing tasks compared to existing models, demonstrating its versatility and emergent capabilities....

Read More

EmbeddingGemma: Powerful and Lightweight Text Representations

Published at 2025-09-24

#ML

A new text embedding model called EmbeddingGemma has been developed, which is lightweight, open-source, and based on the Gemma 3 language model family. It has achieved state-of-the-art results in various domains while maintaining a high performance-to-cost ratio, making it ideal for low-latency and high-throughput applications....
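Below is a minimal retrieval-style usage sketch with the sentence-transformers library; the model id "google/embeddinggemma-300m" is an assumption and may differ from the released checkpoint.

```python
# Minimal retrieval-style usage sketch with the sentence-transformers library.
# The model id "google/embeddinggemma-300m" is assumed here and may differ.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")

query = "How do I fine-tune a small embedding model?"
docs = [
    "A guide to fine-tuning lightweight text embedding models.",
    "Recipes for baking sourdough bread at home.",
]

# Encode the query and candidate documents, then rank by cosine similarity.
query_emb = model.encode(query, convert_to_tensor=True)
doc_embs = model.encode(docs, convert_to_tensor=True)
scores = util.cos_sim(query_emb, doc_embs)[0]
best = int(scores.argmax())
print(docs[best], float(scores[best]))
```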

Read More

Logics-Parsing Technical Report

Published at 2025-09-24

#ML

The authors present Logics-Parsing, an end-to-end model that uses Large Vision-Language models and reinforcement learning to improve parsing of complex document types like multi-column newspapers and posters. They also create LogicsParsingBench, a dataset for evaluating their approach, and demonstrate its effectiveness through comprehensive experiments....

Read More

PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

Published at 2025-09-24

#ML

The researchers have created a new system called PhysCtrl that makes videos more realistic by adding physical rules and control, like how objects move and interact in the real world. They trained it using a lot of computer-generated animations and special techniques to make sure the videos look real and follow physical laws....

Read More

SIM-CoT: Supervised Implicit Chain-of-Thought

Published at 2025-09-24

#ML

The study identifies and solves a stability issue in implicit Chain-of-Thought (CoT) methods, which are more efficient than explicit CoT reasoning in Large Language Models (LLMs). The proposed SIM-CoT method introduces step-level supervision during training to create distinct and meaningful latent states, improving both accuracy and stability for various implicit CoT methods, and even outperforming explicit CoT on some models....
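As a loose illustration of step-level supervision on latent states, the sketch below attaches an auxiliary cross-entropy loss that decodes each latent reasoning state toward a token from its corresponding explicit step during training. It is deliberately simplified (an actual auxiliary decoder would reconstruct full reasoning steps), and all names, shapes, and the single-token target are assumptions, not the paper's implementation.

```python
# Deliberately simplified sketch of step-level supervision on implicit CoT states.
# Shapes, names, and the single-token target are assumptions for illustration;
# the auxiliary head would be used during training only and dropped at inference.
import torch
import torch.nn as nn

hidden_dim, vocab_size, n_latent_steps, step_len = 512, 32000, 3, 8

aux_decoder = nn.Linear(hidden_dim, vocab_size)  # training-time auxiliary head
loss_fn = nn.CrossEntropyLoss()

# Stand-ins for the LLM's hidden states at the implicit reasoning positions,
# and the tokenized explicit reasoning steps they should correspond to.
latent_states = torch.randn(n_latent_steps, hidden_dim, requires_grad=True)
explicit_steps = torch.randint(0, vocab_size, (n_latent_steps, step_len))

# Supervise each latent state toward (here) the first token of its explicit step.
logits = aux_decoder(latent_states)          # (n_latent_steps, vocab_size)
step_loss = loss_fn(logits, explicit_steps[:, 0])

# In training this term would be added to the usual answer loss.
step_loss.backward()
print(float(step_loss))
```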

Read More

Video models are zero-shot learners and reasoners

Published at 2025-09-24

#ML

This study shows that video models, like Large Language Models, can perform a wide range of tasks they weren't explicitly trained for, such as object segmentation, image editing, and visual reasoning. The findings suggest that video models could become general-purpose vision understanding tools, similar to how language models have evolved....

Read More

Tags are generated by Google's Gemini Pro API; summaries and translations are generated by Upstage's SOLAR mini chat model, which is derived from the SOLAR-10.7B open LLM.


(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media

Facebook · X · LinkedIn