🤗 Daily Paper Newsletter |
 |
Hope you found some gems! |
This newsletter delivers a curated list of papers from 🤗 Daily Papers. |
|
Dynamic Reflections: Probing Video Representations with Text Alignment |
Published at 2025-11-04 |
|
#ML
|
This study explores the alignment of video and text representations, finding that it heavily depends on the richness of visual and text data, especially with advanced video encoders. The research also suggests a link between strong alignment and general-purpose video understanding, and provides a new way to test vision and language models by correlating temporal reasoning with cross-modal alignment. |
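As a rough illustration of what "alignment" means here, one common proxy (an assumption on our part, not necessarily the paper's exact metric) is the mean cosine similarity between paired video and text embeddings:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def alignment_score(video_embs, text_embs):
    # Mean cosine similarity over matched (video, caption) pairs.
    sims = [cosine(v, t) for v, t in zip(video_embs, text_embs)]
    return sum(sims) / len(sims)
```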
Read More |
|
|
|
A Decentralized Retrieval Augmented Generation System with Source Reliabilities Secured on Blockchain |
Published at 2025-11-10 |
|
#ML
|
The proposed system is a decentralized retrieval augmented generation platform that uses a novel reliability scoring mechanism to evaluate data sources. This mechanism is managed through blockchain-based smart contracts, ensuring transparency and trust, and resulting in cost savings and improved performance compared to centralized systems. |
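A minimal sketch of the core idea, weighting retrieved passages by a per-source reliability score (the function and field names here are our assumptions, not the paper's actual interface; the paper maintains the reliability scores on-chain, which a plain dict stands in for):

```python
def rank_passages(passages, reliability):
    # passages: list of (source_id, relevance) pairs from a retriever.
    # reliability: source_id -> score in [0, 1]; unknown sources get 0.5.
    scored = [
        (src, rel * reliability.get(src, 0.5), rel)
        for src, rel in passages
    ]
    # Sort by combined score so unreliable sources sink in the ranking.
    return sorted(scored, key=lambda t: t[1], reverse=True)
```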
Read More |
|
|
|
|
GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning |
Published at 2025-11-10 |
|
#ML
|
The authors present a new groupwise reranking method called GroupRank, which avoids the limitations of current reranking methods by considering the relative importance of documents within a group, allowing for more accurate results. They also introduce a novel data generation pipeline to address the issue of limited labeled data, which can be used to train both the reranker and the retriever, with successful results on two reasoning-intensive retrieval benchmarks. |
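A toy contrast of the pointwise-vs-groupwise distinction (illustrative only, not GroupRank's actual scoring model): pointwise scoring judges each document in isolation, while groupwise scoring normalizes over the whole candidate set, so each score reflects importance relative to the other documents.

```python
import math

def pointwise_scores(docs, score_fn):
    # Each document is scored independently of the others.
    return [score_fn(d) for d in docs]

def groupwise_scores(docs, score_fn):
    # Softmax over the group: a document's score now depends on
    # how it compares to every other candidate in the group.
    raw = [score_fn(d) for d in docs]
    z = sum(math.exp(s) for s in raw)
    return [math.exp(s) / z for s in raw]
```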
Read More |
|
|
|
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation |
Published at 2025-11-12 |
|
#ML
|
The authors present a new framework called MMaDA-Parallel to enhance thinking-aware generation, which improves cross-modal alignment and consistency between text and images, leading to better performance on complex tasks compared to existing methods. |
Read More |
|
|
|
|
Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models |
Published at 2025-11-12 |
|
#ML
|
The study presents a new method called Spectrum-Aware Test-Time Steering (STS) that helps Vision-Language Models (VLMs) adapt to new situations without losing performance. STS is lightweight, fast, and requires minimal changes to the model, making it a promising technique for improving VLMs in real-world applications. |
Read More |
|
|
|
Instella: Fully Open Language Models with Stellar Performance |
Published at 2025-11-13 |
|
#ML
|
The researchers have developed a family of fully open-source language models called Instella, trained on public data and using AMD Instinct MI300X GPUs. These models, including specialized variants for longer contexts and mathematical reasoning, offer state-of-the-art performance while promoting transparency and reproducibility in language modeling research. |
Read More |
|
|
|
|
MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model |
Published at 2025-11-14 |
|
#ML
|
The research presents MicroVQA++, a high-quality microscopy dataset created in three stages: data collection, graph-based filtering, and human verification. This dataset enables multimodal large language models to perform scientific reasoning on microscopy images with state-of-the-art accuracy. |
Read More |
|
|
|
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling |
Published at 2025-11-14 |
|
#ML
|
MiroThinker v1.0 is an open-source research agent that improves performance by scaling interactions between the model and its environment, in addition to increasing model size and context length. This approach allows the agent to perform more tool calls and engage in longer reasoning chains, resulting in better performance on various research benchmarks compared to previous open-source agents. |
Read More |
|
|
|
|
UFO^3: Weaving the Digital Agent Galaxy |
Published at 2025-11-14 |
|
#ML
|
UFO^3 is a system that connects various devices and platforms, enabling digital agents to work together seamlessly. This results in improved task performance, reduced latency, and better resilience in the face of device failures, as demonstrated by the system's success in a benchmark of 55 cross-device tasks. |
Read More |
|
|
|
Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs |
Published at 2025-11-16 |
|
#ML
|
The study presents EvoSynth, a new framework for automated red teaming of Large Language Models that can autonomously invent and evolve novel attack algorithms, rather than just refining existing ones. EvoSynth's unique code-level self-correction loop allows it to iteratively improve its own attack logic, resulting in a new state-of-the-art Attack Success Rate of 85.5% against robust models and more diverse attacks compared to existing methods. |
Read More |
|
|
|
|
Genomic Next-Token Predictors are In-Context Learners |
Published at 2025-11-16 |
|
#ML
|
This study examines whether genomic models, like language models, can learn patterns from examples within their input (in-context learning) when trained on next-nucleotide prediction. The results show that genomic models do exhibit this behavior, suggesting that in-context learning is a general property of large-scale predictive modeling, not something specific to language. |
Read More |
|
|
|
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data |
Published at 2025-11-16 |
|
#ML
|
Uni-MoE 2.0, part of the Lychee model family, is a fully open-source omnimodal large model that improves language-centric multimodal understanding and generation. It uses a dynamic-capacity MoE design, a progressive training strategy, and a multimodal data matching technique to achieve superior performance on various benchmarks compared to leading models. |
Read More |
|
|
|
|
Back to Basics: Let Denoising Generative Models Denoise |
Published at 2025-11-17 |
|
#ML
|
This research proposes a new approach for denoising diffusion models, suggesting that predicting clean data is more effective than predicting noised quantities, as it aligns with the manifold assumption that natural data lies on a low-dimensional manifold. The proposed method, called JiT, uses large-patch Transformers directly on pixels without tokenization, pre-training, or extra loss, achieving competitive results on ImageNet at high resolutions. |
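For intuition, here is a single-scalar sketch of the two standard diffusion parameterizations under the usual forward process x_t = sqrt(abar)·x0 + sqrt(1-abar)·eps (notation is ours; the paper argues for regressing x0 directly rather than the noise):

```python
import math

def noisy(x0, eps, abar):
    # Forward diffusion: mix clean data x0 with Gaussian noise eps.
    return math.sqrt(abar) * x0 + math.sqrt(1 - abar) * eps

def x0_from_eps(xt, eps_hat, abar):
    # Noise prediction: recover the clean sample from a predicted eps.
    return (xt - math.sqrt(1 - abar) * eps_hat) / math.sqrt(abar)

def eps_from_x0(xt, x0_hat, abar):
    # Clean-data prediction: recover the implied noise from a predicted x0.
    return (xt - math.sqrt(abar) * x0_hat) / math.sqrt(1 - abar)
```

The two targets are algebraically interchangeable; the paper's claim is that training the network to output x0 behaves better in practice.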
Read More |
|
|
|
Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly? |
Published at 2025-11-17 |
|
#ML
|
The study presents Live-SWE-agent, a self-evolving software agent that improves itself during runtime without needing offline training or test-time scaling, achieving high solve rates on various benchmarks compared to other agents. |
Read More |
|
|
|
|
OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation |
Published at 2025-11-17 |
|
#ML
|
The researchers developed a new model called OlmoEarth, which efficiently handles multimodal Earth observation data by using a unique self-supervised learning approach. OlmoEarth outperformed 12 other models in various tasks and is now used as the core of a platform that provides non-profits and NGOs with advanced tools for Earth observation. |
Read More |
|
|
|
P1: Mastering Physics Olympiads with Reinforcement Learning |
Published at 2025-11-17 |
|
#ML
|
Researchers have developed a family of open-source physics reasoning models called P1, which excel at solving Olympiad-level physics problems using reinforcement learning. The most notable model, P1-235B-A22B, achieved Gold-medal performance at the International Physics Olympiad and won gold in 12 of 13 international and regional physics competitions in 2024/2025. |
Read More |
|
|
|
|
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model |
Published at 2025-11-17 |
|
#ML
|
The researchers have developed a new model called Part-X-MLLM that can understand and generate 3D objects based on natural language prompts and RGB point clouds. This model separates the planning and execution of creating 3D objects, allowing for better control and performance in tasks like question answering, generation, and editing. |
Read More |
|
|
|
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image |
Published at 2025-11-17 |
|
#ML
|
The authors present a new method called PhysX-Anything that creates simulation-ready 3D models from a single image, focusing on physical and articulation properties for better use in AI applications. They introduce a new model and 3D representation that allows for more detailed and diverse 3D generation, and their method has been shown to work well in various experiments and simulations. |
Read More |
|
|
|
|
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance |
Published at 2025-11-17 |
|
#ML
|
The authors present a new method called SoCE for improving Large Language Models' performance through model souping, which involves averaging weights from multiple models. SoCE identifies expert models for each category cluster and combines them using optimized weighted averaging, resulting in improved performance and robustness across multiple domains and state-of-the-art results on the Berkeley Function Calling Leaderboard. |
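The arithmetic at the heart of model souping is just a (possibly weighted) parameter average across checkpoints. A minimal sketch, assuming checkpoints are given as plain {parameter_name: value} mappings (an illustrative stand-in, not SoCE's actual implementation):

```python
def soup(state_dicts, weights=None):
    # Average parameters across model checkpoints ("model souping").
    # state_dicts: list of {param_name: value} maps with identical keys.
    # weights: optional per-model coefficients; defaults to a uniform average.
    n = len(state_dicts)
    if weights is None:
        weights = [1.0 / n] * n
    return {
        k: sum(w * sd[k] for w, sd in zip(weights, state_dicts))
        for k in state_dicts[0]
    }
```

SoCE's contribution is in choosing which expert checkpoints to soup per category cluster and optimizing the weights, not in the averaging itself.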
Read More |
|
|
|
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models |
Published at 2025-11-17 |
|
#ML
|
The study presents TiViBench, a new benchmark for evaluating reasoning abilities in image-to-video generation models across four dimensions, and introduces VideoTPO, a strategy to improve reasoning performance without extra training or data. |
Read More |
|
|
|
|
WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance |
Published at 2025-11-17 |
|
#ML
|
WebCoach is a framework that helps web agents remember and learn from their past experiences across different sessions, improving their long-term performance in complex tasks without needing to be retrained. This system allows smaller models to perform as well as a much larger GPT-4 model, making web agents more efficient and robust. |
Read More |
|
|
|
|
|
Tags are generated by Google's Gemini Pro API, and the summaries and translations are generated by Upstage's SOLAR mini chat model, derived from the SOLAR-10.7B open LLM.
(Experimental) The full paper is translated into Korean with the enko-t5-small-v0 model developed by Kim Kihyun. |
Visit Developer's Social Media |