🤗 Daily Paper(2025-11-14)


deep.di...@gmail.com

Nov 14, 2025, 3:07:23 PM

to hf-daily-pap...@googlegroups.com

🤗 Daily Paper Newsletter

Hope you found some gems!

This newsletter delivers a curated list of papers from 🤗 Daily Papers.

project page

🤗 daily paper

Superpositional Gradient Descent: Harnessing Quantum Principles for Model Training

Published at 2025-11-01

#ML

The researchers have developed a new optimizer called Superpositional Gradient Descent (abbreviated SGD by the authors, not to be confused with stochastic gradient descent) that uses quantum circuit perturbations to improve convergence and final performance when training large language models, relative to the classical AdamW optimizer. However, practical adoption is currently limited by scalability and hardware constraints....

CC30k: A Citation Contexts Dataset for Reproducibility-Oriented Sentiment Analysis

Published at 2025-11-10

#ML

The authors created a new dataset called CC30k, which contains 30,000 citation contexts from machine learning papers, labeled with their perceived reproducibility. This dataset can help improve models that predict the reproducibility of research papers and is available for public use....

ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents

Published at 2025-11-10

#ML

The authors present ResearchRubrics, a new benchmark for evaluating Deep Research agents that use large language models for open-ended queries. This benchmark, created with extensive human effort, offers realistic prompts and expert-written rubrics to assess the agents' factual accuracy, reasoning, and clarity, aiming to improve the evaluation of these complex systems....
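As a rough illustration of how rubric-based grading can turn a report into a single score, the sketch below computes a weighted fraction of satisfied criteria. The `RubricItem` class, the weights, and the binary pass/fail scale are assumptions made for illustration; the summary above does not describe the benchmark's actual scoring scheme.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RubricItem:
    """One grading criterion (hypothetical structure, not the benchmark's schema)."""
    name: str
    weight: float


def rubric_score(items: List[RubricItem], passed: Dict[str, bool]) -> float:
    """Return the weighted fraction of rubric items judged as satisfied."""
    total = sum(item.weight for item in items)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    # Items missing from the judgments are treated as not satisfied.
    earned = sum(item.weight for item in items if passed.get(item.name, False))
    return earned / total


# Hypothetical rubric built from the three qualities named in the summary.
rubric = [
    RubricItem("factual accuracy", 2.0),
    RubricItem("reasoning", 1.0),
    RubricItem("clarity", 1.0),
]
judgments = {"factual accuracy": True, "reasoning": False, "clarity": True}
print(rubric_score(rubric, judgments))  # 0.75
```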

AlphaResearch: Accelerating New Algorithm Discovery with Language Models

Published at 2025-11-11

#ML

The authors propose AlphaResearch, an agent that discovers new algorithms on open-ended problems by combining an execution-based verification environment with a simulated peer review process. Compared head-to-head with human researchers, AlphaResearch achieves a 2/8 win rate, and its solution to the 'packing circles' problem outperforms previous results....
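To make the idea of execution-based verification concrete, here is a minimal sketch of a checker for a circle-packing candidate. The problem setup is an assumption (non-overlapping circles inside the unit square, scored by the sum of their radii); the summary above does not state the exact formulation, and `verify_packing` is a hypothetical name.

```python
import math
from typing import List, Optional, Tuple

Circle = Tuple[float, float, float]  # (center_x, center_y, radius)


def verify_packing(circles: List[Circle], tol: float = 1e-9) -> Optional[float]:
    """Return the sum of radii if the packing is valid, otherwise None."""
    for x, y, r in circles:
        if r <= 0:
            return None
        # Each circle must lie entirely inside the unit square (assumed domain).
        if x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            return None
    for i, (x1, y1, r1) in enumerate(circles):
        for x2, y2, r2 in circles[i + 1:]:
            # Circles may touch but must not overlap.
            if math.hypot(x1 - x2, y1 - y2) < r1 + r2 - tol:
                return None
    return sum(r for _, _, r in circles)


valid = [(0.25, 0.25, 0.25), (0.75, 0.75, 0.25)]
overlapping = [(0.25, 0.25, 0.25), (0.30, 0.30, 0.25)]
print(verify_packing(valid))        # 0.5
print(verify_packing(overlapping))  # None
```

A checker of this kind gives the agent a score that cannot be gamed by a persuasive write-up alone: an invalid construction simply earns nothing.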

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Published at 2025-11-11

#ML

The authors present a new open-source framework called UniVA that combines various video processing tasks, such as understanding, segmentation, editing, and generation, into a single, automated system. UniVA uses a dual-agent architecture to interpret user intentions and execute complex workflows, enabling interactive and self-reflective video creation with full traceability, and is accompanied by a benchmark suite to evaluate such systems....

Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO

Published at 2025-11-12

#ML

This study reveals a new method to compromise decentralized GRPO systems, where attackers can manipulate language models by inserting harmful tokens, leading to successful attacks in a short time. The researchers also suggest two defense strategies that can entirely prevent these attacks, whether users train the same or different models....

PAN: A World Model for General, Interactable, and Long-Horizon World Simulation

Published at 2025-11-12

#ML

PAN is a new world model that uses text-based knowledge and natural language actions to simulate future events in a realistic and interactive way, allowing for long-term prediction and reasoning about the world....

SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control

Published at 2025-11-12

#ML

The study presents SliderEdit, a new framework for precise and continuous control in image editing. It allows users to adjust the intensity of individual edits smoothly, unlike existing models, and improves edit controllability, visual consistency, and user steerability in state-of-the-art image editing models....

Solving a Million-Step LLM Task with Zero Errors

Published at 2025-11-12

#ML

The authors present MAKER, a system that uses large language models (LLMs) to solve a task of over a million steps without making any mistakes. This is achieved by breaking the task down into smaller subtasks, each handled by a specialized agent, and using a voting system to correct errors. The authors suggest this approach could enable efficient problem-solving at an organizational and societal level....
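The decompose-then-vote idea can be sketched in a few lines. This is a simplified illustration, not the paper's method: it uses a plain majority vote over a fixed number of samples, whereas the summary does not specify the voting rule, and `scripted_agent` is a hypothetical stand-in for an LLM call.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Hashable, Iterable, List


def vote(sample: Callable[[], Hashable], n_votes: int = 5) -> Hashable:
    """Draw n_votes answers for one subtask and return the most common one."""
    votes = [sample() for _ in range(n_votes)]
    return Counter(votes).most_common(1)[0][0]


def solve_task(samplers: Iterable[Callable[[], Hashable]], n_votes: int = 5) -> List[Hashable]:
    """Solve a task step by step, voting independently on every subtask."""
    return [vote(sample, n_votes) for sample in samplers]


def scripted_agent(answers: List[str]) -> Callable[[], str]:
    """Stand-in for an LLM call: replays a fixed list of answers in order."""
    stream = cycle(answers)
    return lambda: next(stream)


# Two subtasks; each agent is occasionally wrong, but the majority is right.
steps = [
    scripted_agent(["A", "A", "B", "A", "A"]),
    scripted_agent(["C", "D", "C", "C", "D"]),
]
print(solve_task(steps))  # ['A', 'C']
```

The point of voting per subtask is that an occasional wrong answer is outvoted before it can propagate to later steps.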

AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models

Published at 2025-11-13

#ML

The authors propose a new task called Fine-grained 3D Embodied Reasoning, which requires an agent to predict the spatial location, motion type, and motion axis of each referenced affordance element in a 3D scene based on a task instruction. They introduce AffordBot, a framework that uses Multimodal Large Language Models and a specialized reasoning process to solve this task, achieving state-of-the-art performance on the SceneFun3D dataset....

Benchmarking Diversity in Image Generation via Attribute-Conditional Human Evaluation

Published at 2025-11-13

#ML

This research presents a framework to evaluate diversity in text-to-image models by focusing on individual concepts and their factors of variation. The framework includes a novel evaluation template, a curated prompt set, and a methodology for comparing models based on human annotations, which helps identify areas where these models struggle with diversity....

Black-Box On-Policy Distillation of Large Language Models

Published at 2025-11-13

#ML

This research presents a new method called Generative Adversarial Distillation (GAD) that trains a student large language model (LLM) by making it compete in a game against a discriminator model, which is trained to distinguish the student's responses from those of a proprietary teacher model. Experiments show that GAD outperforms traditional distillation methods and results in a student model that is nearly as good as the teacher model....

Depth Anything 3: Recovering the Visual Space from Any Views

Published at 2025-11-13

#ML

The study introduces a new model, DA3, which predicts consistent spatial geometry from various visual inputs without the need for specialized architecture or complex multi-task learning. DA3 outperforms previous models in camera pose and geometric accuracy, setting a new benchmark in visual geometry tasks....

MuSc-V2: Zero-Shot Multimodal Industrial Anomaly Classification and Segmentation with Mutual Scoring of Unlabeled Samples

Published at 2025-11-13

#ML

This study presents a new framework called MuSc-V2 for identifying and outlining defects in industrial products without using labeled samples. By leveraging the unique property that normal image patches are similar to each other, while anomalies are not, MuSc-V2 improves 3D representation, fuses 2D/3D neighborhood cues, and uses a mutual scoring mechanism to assign scores to samples within each modality. The framework achieves significant performance improvements on two datasets, outperforming p...
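The core intuition, that normal patches find close matches in other unlabeled images while anomalous patches do not, can be illustrated with a toy scoring function. This is only a sketch under simplifying assumptions: patches are plain feature vectors, the score is the mean nearest-neighbor distance to every other image, and none of the 3D representation or 2D/3D fusion steps mentioned above are modeled. `mutual_scores` is a hypothetical name.

```python
import math
from typing import List, Sequence

Patch = Sequence[float]


def mutual_scores(images: List[List[Patch]]) -> List[List[float]]:
    """Score each patch by its mean nearest-neighbor distance to every other image."""
    if len(images) < 2:
        raise ValueError("need at least two images to score against each other")
    scores = []
    for i, patches in enumerate(images):
        others = [img for j, img in enumerate(images) if j != i]
        image_scores = []
        for p in patches:
            # Distance from this patch to its closest patch in each other image.
            nearest = [min(math.dist(p, q) for q in other) for other in others]
            image_scores.append(sum(nearest) / len(nearest))
        scores.append(image_scores)
    return scores


# Three toy "images" with two patch features each; the last one has an outlier.
images = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (10.0, 10.0)],
]
scores = mutual_scores(images)
# The outlier patch (10.0, 10.0) receives the highest score of all patches.
print(max(max(row) for row in scores) == scores[2][1])  # True
```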

Music Flamingo: Scaling Music Understanding in Audio Language Models

Published at 2025-11-13

#ML

The researchers created a new audio-language model called Music Flamingo that can understand music better than existing models by training it on a large, detailed dataset. They improved the model's reasoning skills and achieved top results in various music understanding and reasoning tests, setting a new standard for models that can perceive songs in a human-like manner....

One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models

Published at 2025-11-13

#ML

This research proposes a new method called Latent Upscaler Adapter (LUA) to improve the resolution of images generated by diffusion models. LUA works by enhancing the model's latent code before the final decoding step, allowing for high-resolution images without the need for additional training or complex modifications, and demonstrates faster processing and comparable quality to existing methods....

Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following

Published at 2025-11-13

#ML

This study presents AdvancedIF, a new benchmark for evaluating large language models' ability to follow complex instructions, and RIFL, a training method that uses the benchmark to improve models' instruction-following skills. Results show significant improvements in models' performance, highlighting the potential of rubrics in creating more capable and reliable AI systems....

Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, which is derived from the open SOLAR-10.7B LLM.

(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun.

Visit Developer's Social Media
