Survey Certification: Efficient Diffusion Models: A Survey
Hui Shen, Jingxuan Zhang, Boning Xiong, Rui Hu, Shoufa Chen, Zhongwei Wan, Xin Wang, Yu Zhang, Zixuan Gong, Guangyin Bao, Chaofan Tao, Yongfeng Huang, Ye Yuan, Mi Zhang
https://openreview.net/forum?id=wHECkBOwyt
---
Accepted papers
===============
Title: Tighter sparse variational Gaussian processes
Authors: Thang D Bui, Matthew Ashman, Richard E. Turner
Abstract: Sparse variational Gaussian process (GP) approximations based on inducing points have become the de facto standard for scaling GPs to large datasets, owing to their theoretical elegance, computational efficiency, and ease of implementation. This paper introduces a provably tighter variational approximation by relaxing the standard assumption that the conditional approximate posterior given the inducing points must match that in the prior. The key innovation is to modify the conditional posterior to have smaller variances than those of the prior at the training points. We derive the collapsed bound for the regression case, describe how to use the proposed approximation in large data settings, and discuss its application to handle orthogonally structured inducing points and GP latent variable models. Extensive experiments on regression benchmarks, classification, and latent variable models demonstrate that the proposed approximation consistently matches or outperforms standard sparse variational GPs while maintaining the same computational cost.
URL: https://openreview.net/forum?id=L33DSu3zvq
---
Title: Efficient Diffusion Models: A Survey
Authors: Hui Shen, Jingxuan Zhang, Boning Xiong, Rui Hu, Shoufa Chen, Zhongwei Wan, Xin Wang, Yu Zhang, Zixuan Gong, Guangyin Bao, Chaofan Tao, Yongfeng Huang, Ye Yuan, Mi Zhang
Abstract: Diffusion models have emerged as powerful generative models capable of producing high-quality content such as images, videos, and audio, demonstrating their potential to revolutionize digital content creation. However, these capabilities come at the cost of significant computational resources and lengthy generation time, underscoring the critical need to develop efficient techniques for practical deployment. In this survey, we provide a systematic and comprehensive review of research on efficient diffusion models. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient diffusion model topics from algorithm-level, system-level, and framework perspectives, respectively. We have also created a GitHub repository where we organize the papers featured in this survey at github.com/AIoT-MLSys-Lab/Efficient-Diffusion-Model-Survey. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of efficient diffusion model research and inspire them to contribute to this important and exciting field.
URL: https://openreview.net/forum?id=wHECkBOwyt
---
Title: RESTOR: Knowledge Recovery in Machine Unlearning
Authors: Keivan Rezaei, Khyathi Chandu, Soheil Feizi, Yejin Choi, Faeze Brahman, Abhilasha Ravichander
Abstract: Large language models trained on web-scale corpora can memorize undesirable data containing misinformation, copyrighted material, or private or sensitive information.
Recently, several machine unlearning algorithms have been proposed to eliminate the effect of such datapoints from trained models, that is, to approximate *a model that had never been trained on these datapoints in the first place*.
However, evaluating the effectiveness of unlearning algorithms remains an open challenge.
Previous work has relied on heuristics, such as verifying that the model can no longer reproduce the specific information targeted for removal while maintaining accuracy on unrelated test data.
These approaches inadequately capture the complete effect of reversing the influence of datapoints on a trained model.
In this work, we propose the RESTOR framework for machine unlearning evaluation, which assesses the ability of unlearning algorithms for targeted data erasure, by evaluating the ability of models to forget the knowledge introduced in these datapoints, while simultaneously recovering the model's knowledge state had it never encountered these datapoints.
RESTOR helps uncover several novel insights about popular unlearning algorithms and the mechanisms through which they operate, for instance, identifying that some algorithms merely emphasize forgetting but not recovering knowledge, and that localizing unlearning targets can enhance unlearning performance.
URL: https://openreview.net/forum?id=BbwlJpNXgW
---
Title: MarDini: Masked Auto-regressive Diffusion for Video Generation at Scale
Authors: Haozhe Liu, Shikun Liu, Zijian Zhou, Mengmeng Xu, Yanping Xie, Xiao Han, Juan Camilo Perez, Ding Liu, Kumara Kahatapitiya, Menglin Jia, Jui-Chieh Wu, Sen He, Tao Xiang, Jürgen Schmidhuber, Juan-Manuel Perez-Rua
Abstract: We introduce MarDini, a new family of video diffusion models that integrate the advantages of masked auto-regression (MAR) into a unified diffusion model (DM) framework. Here, MAR handles temporal planning, while DM focuses on spatial generation in an asymmetric network design: i) a MAR-based planning model containing most of the parameters generates planning signals for each masked frame using low-resolution input; ii) a lightweight generation model uses these signals to produce high-resolution frames via diffusion de-noising. MarDini’s MAR enables video generation conditioned on any number of masked frames at any frame positions: a single model can handle video interpolation (e.g., masking middle frames), image-to-video generation (e.g., masking from the second frame onward), and video expansion (e.g., masking half the frames). The efficient design allocates most of the computational resources to the low-resolution planning model, making computationally expensive but important spatio-temporal attention feasible at scale. MarDini sets a new state-of-the-art for video interpolation; meanwhile, within few inference steps, it efficiently generates videos on par with those of much more expensive advanced image-to-video models.
URL: https://openreview.net/forum?id=fuOHI59rUW
---
Title: Responsive Noise-Relaying Diffusion Policy: Responsive and Efficient Visuomotor Control
Authors: Zhuoqun Chen, Xiu Yuan, Tongzhou Mu, Hao Su
Abstract: Imitation learning is an efficient method for teaching robots a variety of tasks. Diffusion Policy, which uses a conditional denoising diffusion process to generate actions, has demonstrated superior performance, particularly in learning from multi-modal demonstrations. However, it relies on executing multiple actions predicted from the same inference step to retain performance and prevent mode bouncing, which limits its responsiveness, as actions are not conditioned on the most recent observations. To address this, we introduce Responsive Noise-Relaying Diffusion Policy (RNR-DP), which maintains a noise-relaying buffer with progressively increasing noise levels and employs a sequential denoising mechanism that generates immediate, noise-free actions at the head of the sequence, while appending noisy actions at the tail. This ensures that actions are responsive and conditioned on the latest observations, while maintaining motion consistency through the noise-relaying buffer. This design enables the handling of tasks requiring responsive control, and accelerates action generation by reusing denoising steps. Experiments on response-sensitive tasks demonstrate that, compared to Diffusion Policy, our method achieves an 18% improvement in success rate. Further evaluation on regular tasks demonstrates that RNR-DP also exceeds the best acceleration method (DDIM) by 6.9% in success rate, highlighting its computational efficiency advantage in scenarios where responsiveness is less critical.
URL: https://openreview.net/forum?id=LLWJkR6gaI
---
Title: Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs
Authors: Pranav Maneriker, Aditya T. Vadlamani, Anutam Srinivasan, Yuntian He, Ali Payani, Srinivasan Parthasarathy
Abstract: Conformal prediction has become increasingly popular for quantifying the uncertainty associated with machine learning models. Recent work in graph uncertainty quantification has built upon this approach for conformal graph prediction. The nascent nature of these explorations has led to conflicting choices for implementations, baselines, and method evaluation. In this work, we analyze the design choices made in the literature and discuss the tradeoffs associated with existing methods. Building on existing implementations, we introduce techniques to scale these methods to large-scale graph datasets without sacrificing performance. Our theoretical and empirical results justify our recommendations for future scholarship in graph conformal prediction.
URL: https://openreview.net/forum?id=Ed1DBB3sBQ
---
New submissions
===============
Title: GeMS: Efficient Gaussian Splatting for Extreme Motion Blur
Abstract: We introduce GeMS, a framework for 3D Gaussian Splatting designed to handle severely motion-blurred images. State-of-the-art deblurring methods for extreme motion blur, such as ExBluRF, as well as Gaussian Splatting-based approaches like Deblur-GS, typically assume access to corresponding sharp images for camera pose estimation and point cloud generation, which is an unrealistic assumption. Additionally, methods relying on COLMAP initialization, such as BAD-Gaussians, fail due to the lack of reliable feature correspondences in cases of severe motion blur. To address these challenges, we propose GeMS, a 3D Gaussian Splatting framework that reconstructs scenes directly from extremely motion-blurred images. GeMS integrates: (1) VGGSfM, a deep learning-based SfM pipeline which estimates camera poses and generates point clouds directly from severely motion-blurred images; (2) MCMC-based Gaussian Splatting, which enables robust scene initialization by treating Gaussians as samples from an underlying probability distribution, eliminating heuristic densification and pruning strategies; and (3) joint optimization of camera motion trajectory and Gaussian parameters, which ensures stable and accurate reconstruction. While this pipeline produces reasonable reconstructions, extreme motion blur can still introduce inaccuracies, especially when all input views are severely blurred. To address this, we propose GeMS-E, which integrates a progressive refinement step when event data is available. Specifically, we perform (4) Event-based Double Integral (EDI) deblurring, which first restores deblurred images from motion-blurred inputs. These deblurred images are then fed into the GeMS framework, leading to improved pose estimation, point cloud generation, and hence overall reconstruction quality. Both GeMS and GeMS-E achieve state-of-the-art performance on synthetic as well as real-world datasets, demonstrating their effectiveness in handling extreme motion blur. To the best of our knowledge, we are the first to effectively address this problem in extreme blur scenarios within a 3D Gaussian Splatting framework, without requiring sharp images for SfM (pose and point cloud) initialization.
URL: https://openreview.net/forum?id=BDjnnr8qGE
---
Title: CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives
Abstract: Grounding the instruction in the environment is a key step in solving language-guided goal-reaching reinforcement learning problems. In automated reinforcement learning, a key concern is to enhance the model's ability to generalize across various tasks and environments. In goal-reaching scenarios, the agent must comprehend the different parts of the instructions within the environmental context in order to complete the overall task successfully. In this work, we propose CAREL (Cross-modal Auxiliary REinforcement Learning) as a new framework to solve this problem using auxiliary loss functions inspired by video-text retrieval literature and a novel method called instruction tracking, which automatically keeps track of progress in an environment. The results of our experiments suggest superior sample efficiency and systematic generalization for this framework in multi-modal reinforcement learning problems.
URL: https://openreview.net/forum?id=zJUEYr5X1X
---
Title: Long-Term Fairness Without Utility Deterioration
Abstract: In fair machine learning, the trade-off between fairness and utility has been predominantly studied in static classification settings, neglecting concerns for long-term learning environments where the population distribution may vary due to the deployment of model policies. This work investigates whether zero utility deterioration can be achieved in the long run. We introduce a Markov decision process (MDP) to formulate the interplay between model decisions and population distribution shifts. A key technical contribution is identifying a sufficient and necessary condition under which a model policy achieving long-term fairness does not compromise utility. Inspired by this condition, we propose effective reward functions that can be combined with online reinforcement learning algorithms, allowing the classifier to accommodate dynamic control objectives such as inducing population adaptations to maximize fairness without sacrificing model performance. Experiments on both synthetic and real-world datasets suggest that the proposed reinforcement learning framework is effective in the long run and drives a classifier-population system toward a desirable equilibrium where the identified condition is met.
URL: https://openreview.net/forum?id=BvuSeLV1Nc
---
Title: On Representing Convex Quadratically Constrained Quadratic Programs via Graph Neural Networks
Abstract: Convex quadratically constrained quadratic programs (QCQPs) involve finding a solution within a convex feasible region defined by quadratic constraints while minimizing a convex quadratic objective function. These problems arise in various industrial applications, including power systems and signal processing. Traditional methods for solving convex QCQPs primarily rely on matrix factorization, which quickly becomes computationally prohibitive as the problem size increases. Recently, graph neural networks (GNNs) have gained attention for their potential in representing and solving various optimization problems such as linear programs and linearly constrained quadratic programs. In this work, we investigate the representation power of GNNs in the context of QCQP tasks. Specifically, we propose a new tripartite graph representation for general convex QCQPs and properly associate it with message-passing GNNs. We demonstrate that there exist GNNs capable of reliably representing key properties of convex QCQPs, including feasibility, optimal value, and optimal solution. Our result deepens the understanding of the connection between QCQPs and GNNs, paving the way for future machine learning approaches to efficiently solve QCQPs.
URL: https://openreview.net/forum?id=GC2ZO6Asoa
---
Title: A Mixture of Exemplars Approach for Efficient Out-of-Distribution Detection with Foundation Models
Abstract: One of the early weaknesses identified in deep neural networks trained for image classification tasks was their inability to provide low confidence predictions on out-of-distribution (OOD) data that was significantly different from the in-distribution (ID) data used to train them. Representation learning, where neural networks are trained in specific ways that improve their ability to detect OOD examples, has emerged as a promising solution. However, these approaches require long training times and can add additional overhead to detect OOD examples. Recent developments in Vision Transformer (ViT) foundation models---large networks trained on large and diverse datasets with self-supervised approaches---also show strong performance in OOD detection, and could address these challenges. This paper presents Mixture of Exemplars (MoLAR), an efficient approach to tackling OOD detection challenges that is designed to maximise the benefit of training a classifier with a high quality, frozen, pretrained foundation model backbone. MoLAR provides strong OOD performance when only comparing the similarity of OOD examples to the exemplars, a small set of images chosen to be representative of the dataset, leading to up to 30 times faster OOD detection inference over other methods that provide best performance when the full ID dataset is used. In some cases, only using these exemplars actually improves performance with MoLAR. Extensive experiments demonstrate the improved OOD detection performance of MoLAR in comparison to comparable approaches in both supervised and semi-supervised settings, and code is available at https://anonymous.4open.science/r/molar-mixture-of-exemplars-4872/README.md.
URL: https://openreview.net/forum?id=xpKqnSJtE4
---
Title: NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context?
Abstract: The capability of large language models to handle long-context information plays a crucial role across various real-world applications. Existing methods for evaluating long-context abilities often rely either on real-world long texts, making it difficult to exclude the influence of models' inherent knowledge, or introduce large amounts of irrelevant filler content to artificially reach target lengths, reducing the relevance and effectiveness of assessments. To address these limitations, we introduce NeedleBench, a comprehensive synthetic framework designed to assess retrieval and reasoning performance in bilingual long-context tasks with adaptive context lengths (e.g., 32k, 128k, and beyond). NeedleBench systematically embeds key data points at varying depths to rigorously test models' capabilities in diverse settings. Tasks within NeedleBench are categorized into two distinct scenarios: information-sparse, characterized by minimal relevant details embedded within extensive irrelevant text to simulate simpler real-world retrieval tasks; and information-dense, implemented as the Ancestral Trace Challenge, where relevant information is continuously distributed throughout the context to simulate more complex real-world reasoning tasks. Our experiments show that, while recent reasoning models such as DeepSeek-R1 and OpenAI's o3 have demonstrated strong performance on mathematical reasoning benchmarks, they still struggle to generalize their reasoning abilities and perform poorly on our information-dense tasks, frequently encountering difficulties with continuous retrieval and reasoning even at relatively shorter context lengths. Furthermore, we identify and characterize a phenomenon termed 'under-thinking', wherein models prematurely conclude their reasoning processes despite the availability of relevant information. NeedleBench thus provides critical insights and targeted evaluation tools essential for understanding and improving the long-context capabilities of LLMs. All code and resources will be publicly available.
URL: https://openreview.net/forum?id=cEvmIKsRw0
---