Accepted papers
===============
Title: LitLLMs, LLMs for Literature Review: Are we there yet?
Authors: Shubham Agarwal, Gaurav Sahu, Abhay Puri, Issam H. Laradji, Krishnamurthy Dj Dvijotham, Jason Stanley, Laurent Charlin, Christopher Pal
Abstract: Literature reviews are an essential component of scientific research, but they remain time-intensive and challenging to write, especially due to the recent influx of research papers. This paper explores the zero-shot abilities of recent Large Language Models (LLMs) in assisting with the writing of literature reviews based on an abstract. We decompose the task into two components: (1) Retrieving related works given a query abstract and (2) Writing a literature review based on the retrieved results. We analyze how effective LLMs are for both components. For retrieval, we introduce a novel two-step search strategy that first uses an LLM to extract meaningful keywords from the abstract of a paper and then retrieves potentially relevant papers by querying an external knowledge base. Additionally, we study a prompting-based re-ranking mechanism with attribution and show that re-ranking doubles the normalized recall compared to naive search methods while providing insights into the LLM’s decision-making process. In the generation phase, we propose a two-step approach that first outlines a plan for the review and then executes steps in the plan to generate the actual review. To evaluate different LLM-based literature review methods, we create test sets from arXiv papers using a protocol designed for rolling use with newly released LLMs to avoid test set contamination in zero-shot evaluations. We release this evaluation protocol to promote additional research and development in this direction. Our empirical results suggest that LLMs show promising potential for writing literature reviews when the task is decomposed into smaller components of retrieval and planning. In particular, we find that combining keyword-based and document-embedding-based search improves precision and recall during retrieval by 10% and 30%, respectively, compared to using either of the methods in isolation. Further, we demonstrate that our planning-based approach achieves higher-quality reviews by reducing hallucinated references in the generated review by 18-26% compared to existing simpler LLM-based generation methods. Our project page, including a demonstration system and toolkit, can be accessed here: https://litllm.github.io.
URL: https://openreview.net/forum?id=heeJqQXKg7
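Code sketch: the retrieval stage described above, reduced to its two steps. The prompt wording, the hypothetical `llm` callable, and the choice of the Semantic Scholar search endpoint as the external knowledge base are illustrative assumptions, not the paper's exact pipeline.

    import requests

    def retrieve_related(llm, abstract, limit=20):
        # Step 1: an LLM extracts meaningful keywords from the query abstract.
        keywords = llm(
            "Extract a short comma-separated list of search keywords "
            "from this abstract:\n" + abstract
        )
        # Step 2: query an external scholarly knowledge base with the keywords.
        resp = requests.get(
            "https://api.semanticscholar.org/graph/v1/paper/search",
            params={"query": keywords, "limit": limit,
                    "fields": "title,abstract,externalIds"},
        )
        resp.raise_for_status()
        return resp.json().get("data", [])  # candidate papers for re-ranking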
---
Title: DeformTime: capturing variable dependencies with deformable attention for time series forecasting
Authors: Yuxuan Shu, Vasileios Lampos
Abstract: In multivariable time series (MTS) forecasting, existing state-of-the-art deep learning approaches tend to focus on autoregressive formulations and often overlook the potential of using exogenous variables to enhance the prediction of the target endogenous variable. To address this limitation, we present DeformTime, a neural network architecture that attempts to capture correlated temporal patterns from the input space and, hence, improve forecasting accuracy. It deploys two core operations performed by deformable attention blocks (DABs): learning dependencies across variables from different time steps (variable DAB), and preserving temporal dependencies in data from previous time steps (temporal DAB). Input data transformation is explicitly designed to enhance learning from the deformed series of information while passing through a DAB. We conduct extensive experiments on 6 MTS data sets, using previously established benchmarks as well as challenging infectious disease modelling tasks with more exogenous variables. The results demonstrate that DeformTime improves accuracy over previous competitive methods across the vast majority of MTS forecasting tasks, reducing the mean absolute error by 7.2% on average. Notably, performance gains remain consistent across longer forecasting horizons.
URL: https://openreview.net/forum?id=M62P7iOT7d
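Code sketch: a minimal single-head rendering of the temporal deformable-attention idea, where each time step attends to a few learned fractional offsets gathered by linear interpolation. Module names, the offset parameterization, and the lack of multi-head projections are simplifications, not the DeformTime architecture.

    import torch
    import torch.nn as nn

    class TemporalDeformableAttention(nn.Module):
        def __init__(self, d_model, n_points=4):
            super().__init__()
            self.offset = nn.Linear(d_model, n_points)  # where each step looks
            self.weight = nn.Linear(d_model, n_points)  # how much to mix
            self.value = nn.Linear(d_model, d_model)

        def forward(self, x):                            # x: (B, T, D)
            B, T, D = x.shape
            base = torch.arange(T, device=x.device, dtype=x.dtype).view(1, T, 1)
            pos = (base + self.offset(x)).clamp(0, T - 1)       # (B, T, K)
            lo, hi = pos.floor().long(), pos.ceil().long()
            frac = (pos - lo.to(pos.dtype)).unsqueeze(-1)       # (B, T, K, 1)
            v = self.value(x)                                    # (B, T, D)

            def gather(idx):                 # fetch values at integer indices
                idx = idx.unsqueeze(-1).expand(B, T, idx.size(2), D)
                return torch.gather(v.unsqueeze(1).expand(B, T, T, D), 2, idx)

            v_lin = (1 - frac) * gather(lo) + frac * gather(hi)  # interpolate
            w = self.weight(x).softmax(dim=-1).unsqueeze(-1)     # (B, T, K, 1)
            return (w * v_lin).sum(dim=2)                        # (B, T, D)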
---
Title: Latent Covariate Shift: Unlocking Partial Identifiability for Multi-Source Domain Adaptation
Authors: Yuhang Liu, Zhen Zhang, Dong Gong, Mingming Gong, Biwei Huang, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi
Abstract: Multi-source domain adaptation (MSDA) addresses the challenge of learning a label prediction function for an unlabeled target domain by leveraging both the labeled data from multiple source domains and the unlabeled data from the target domain. Conventional MSDA approaches often rely on covariate shift or conditional shift paradigms, which assume a consistent label distribution across domains. However, this assumption is limiting in practice, where label distributions often do vary across domains. For example, animals from different regions exhibit diverse characteristics due to varying diets and genetics.
Motivated by this, we propose a novel paradigm called latent covariate shift (LCS), which introduces significantly greater variability and adaptability across domains. Notably, it provides a theoretical assurance for recovering the latent cause of the label variable, which we refer to as the latent content variable. Within this new paradigm, we present an intricate causal generative model by introducing latent noises across domains, along with a latent content variable and a latent style variable to achieve a more nuanced rendering of observational data. We demonstrate that the latent content variable can be identified up to block identifiability due to its versatile yet distinct causal structure. We anchor our theoretical insights in a novel MSDA method, which learns the label distribution conditioned on the identifiable latent content variable, thereby accommodating more substantial distribution shifts. The proposed approach showcases exceptional performance on both simulated and real-world datasets.
URL: https://openreview.net/forum?id=9kFlOyLwyf
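Code sketch: a toy rendering of the generative story, in which a latent content variable causes the label, a latent style variable and domain-specific latent noise vary across domains, and both render the observation, so p(y) shifts across domains while p(y | content) stays invariant. All functional forms and dimensions are illustrative; the paper's causal model is considerably more intricate.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_domain(domain_shift, n=1000, d_c=4, d_s=4):
        # Domain-specific latent noise shifts the content distribution...
        z_c = rng.standard_normal((n, d_c)) + domain_shift
        # ...and the style variable varies freely across domains.
        z_s = rng.standard_normal((n, d_s)) * (1.0 + abs(domain_shift))
        # The label is caused by latent content only, so p(y | z_c) is
        # invariant even though p(y) differs across domains.
        y = (z_c.sum(axis=1) > 0).astype(int)
        # Observations render both content and style.
        x = np.tanh(np.concatenate([z_c, z_s], axis=1))
        return x, y

    for shift in (-0.5, 0.0, 0.5):            # three source domains
        _, y = sample_domain(shift)
        print(f"shift={shift:+.1f}  P(y=1)={y.mean():.2f}")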
---
New submissions
===============
Title: Alternators For Sequence Modeling
Abstract: This paper introduces alternators, a novel family of non-Markovian dynamical models for sequences. An alternator features two neural networks: the observation trajectory network (OTN) and the feature trajectory network (FTN). The OTN and the FTN work in conjunction, alternating between outputting samples in the observation space and some feature space, respectively. The parameters of the OTN and the FTN are not time-dependent and are learned via a minimum cross-entropy criterion over the trajectories. Alternators are versatile. They can be used as dynamical latent-variable generative models or as sequence-to-sequence predictors. Alternators can uncover the latent dynamics underlying complex sequential data, accurately forecast and impute missing data, and sample new trajectories. We showcase the capabilities of alternators in three applications. We first used alternators to model the Lorenz equations, often used to describe chaotic behavior. We then applied alternators in neuroscience to map brain activity to physical activity. Finally, we applied alternators in climate science, focusing on sea-surface temperature forecasting. In all our experiments, we found alternators are stable to train, fast to sample from, yield high-quality generated samples and latent variables, and often outperform strong baselines such as Mamba, neural ODEs, and diffusion models in the domains we studied.
URL: https://openreview.net/forum?id=Q70C1HQ0VO
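Code sketch: the alternating loop in its simplest form, with a feature-trajectory network mapping the previous observation and feature to a new feature, and an observation-trajectory network mapping that feature back to observation space. The MLP architectures, Gaussian noise injection, and sampling interface are assumptions for illustration.

    import torch
    import torch.nn as nn

    class Alternator(nn.Module):
        def __init__(self, d_obs, d_feat, d_hidden=128):
            super().__init__()
            self.d_obs, self.d_feat = d_obs, d_feat
            # FTN: (x_{t-1}, z_{t-1}) -> z_t; parameters shared across time.
            self.ftn = nn.Sequential(nn.Linear(d_obs + d_feat, d_hidden),
                                     nn.ReLU(), nn.Linear(d_hidden, d_feat))
            # OTN: z_t -> x_t; parameters shared across time.
            self.otn = nn.Sequential(nn.Linear(d_feat, d_hidden),
                                     nn.ReLU(), nn.Linear(d_hidden, d_obs))

        @torch.no_grad()
        def sample(self, T, sigma=0.1):
            x = torch.zeros(1, self.d_obs)
            z = torch.randn(1, self.d_feat)
            traj = []
            for _ in range(T):                # alternate z -> x -> z -> ...
                z = self.ftn(torch.cat([x, z], -1)) + sigma * torch.randn_like(z)
                x = self.otn(z) + sigma * torch.randn_like(x)
                traj.append(x)
            return torch.cat(traj, 0)         # (T, d_obs) sampled trajectory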
---
Title: Client-only Distributed Markov Chain Monte Carlo Sampling over a Network
Abstract: We aim to sample from a target distribution $\pi(x) \propto \exp\left(-\sum_{i=1}^n f_i(x \mid \mathcal{D}_i)\right)$, where each client $i$ has access only to its local data $\mathcal{D}_i$. We present a fully distributed Markov Chain Monte Carlo (MCMC) sampler that operates through client-to-client communication, eliminating the need for additional centralized servers. Unlike MCMC algorithms that rely on server-client structures, our proposed sampler is entirely distributed, enhancing security and robustness through decentralized communication.
In contrast to the few existing decentralized samplers, which largely arise from Langevin dynamics, our sampler utilizes blocked Gibbs sampling on an augmented distribution. Furthermore, we establish a non-asymptotic analysis of our sampler using novel proof techniques. This study provides one of the first analyses of the non-asymptotic behavior of a fully distributed sampler arising from Gibbs sampling.
URL: https://openreview.net/forum?id=1bZ2rLfKwu
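Code sketch: the abstract does not spell out the augmentation, so the toy below uses one common construction, giving each client a local copy coupled to its neighbors by a quadratic penalty; with Gaussian local factors every block conditional is Gaussian, and each client can resample its block from neighbor values alone. The ring topology, coupling strength, and local factors are illustrative assumptions.

    import numpy as np

    # Augmented target (all symbols illustrative):
    #   pi(x_1, ..., x_n) proportional to
    #   exp(-sum_i (x_i - mu_i)^2 / 2 - (rho / 2) * sum_{(i,j) in E} (x_i - x_j)^2)
    rng = np.random.default_rng(0)
    n, rho, iters = 8, 5.0, 5000
    mu = rng.standard_normal(n)            # client-local data summaries
    x = np.zeros(n)
    samples = []
    for _ in range(iters):
        for i in range(n):                 # blocked Gibbs, one client at a time
            nbrs = [(i - 1) % n, (i + 1) % n]        # ring topology
            prec = 1.0 + rho * len(nbrs)             # conditional precision
            mean = (mu[i] + rho * x[nbrs].sum()) / prec
            x[i] = rng.normal(mean, 1.0 / np.sqrt(prec))
        samples.append(x.copy())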
---
Title: Label Embedding via Low-Coherence Matrices
Abstract: Label embedding is a framework for multiclass classification problems where each label is represented by a distinct vector of some fixed dimension, and training involves matching model output to the vector representing the correct label. While label embedding has been successfully applied in extreme classification and zero-shot learning, and offers both computational and statistical advantages, its theoretical foundations remain poorly understood. This work presents an analysis of label embedding in the context of extreme multiclass classification, where the number of classes $C$ is very large. We present an excess risk bound that reveals a trade-off between computational and statistical efficiency, quantified via the coherence of the embedding matrix. We further show that under the Massart noise condition, the statistical penalty for label embedding vanishes with sufficiently low coherence. Our analysis supports an algorithm that is simple, scalable, and easily parallelizable, and experimental results demonstrate its effectiveness in large-scale applications.
URL: https://openreview.net/forum?id=vrcWXcr4On
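Code sketch: coherence is the largest absolute inner product between distinct label vectors; random unit-norm Gaussian embeddings keep it small (it grows only logarithmically in the number of classes $C$), which is the regime where the statistical penalty in the bound shrinks. Dimensions and names are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    C, d = 1000, 64                          # classes, embedding dimension
    E = rng.standard_normal((C, d))
    E /= np.linalg.norm(E, axis=1, keepdims=True)   # unit-norm label vectors

    def coherence(E):
        G = np.abs(E @ E.T)                  # pairwise |<e_i, e_j>|
        np.fill_diagonal(G, 0.0)
        return G.max()                       # max over i != j

    print(f"coherence of a random embedding: {coherence(E):.3f}")

    def predict(output):                     # model output in R^d
        return int(np.argmax(E @ output))    # nearest label vector by dot product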
---
Title: Input and Output Privacy in Cross-Silo Federated Settings: an MPC+DP Approach
Abstract: We address the problem of training a machine learning model on data held by multiple data holders in a cross-silo federated setup while ensuring privacy guarantees. Existing Federated Learning (FL) solutions with Differential Privacy (DP) or Secure Multiparty Computation (MPC) with DP are often limited to either horizontal or vertical partitioning and typically suffer from accuracy loss compared to a centralized setting. We propose an MPC-based approach for training differentially private linear models that supports any partitioning scenario and effectively combines MPC and DP. Our solution employs MPC protocols for both model training and output perturbation using Laplace-like noise. By simulating a trusted curator through MPC, our approach provides the benefits of global DP without requiring an actual trusted party. The resulting MPC+DP method achieves accuracy comparable to a centralized DP setup while maintaining privacy guarantees in a cross-silo federated setup.
URL: https://openreview.net/forum?id=bedKf80Sz2
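Code sketch: the differential-privacy half of the approach in its plainest non-MPC form, i.e. classic output perturbation with Laplace noise calibrated to the L1 sensitivity of the trained weights. In the paper, this perturbation runs inside MPC so that no party ever sees the raw model or the noise; that protocol layer, and the exact "Laplace-like" noise construction, are not reproduced here.

    import numpy as np

    def perturb_output(w, l1_sensitivity, epsilon, rng=None):
        # Adding Laplace(0, sensitivity / epsilon) noise to the weights
        # yields epsilon-DP for the released model (output perturbation).
        rng = rng or np.random.default_rng()
        return w + rng.laplace(0.0, l1_sensitivity / epsilon, size=w.shape)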
---
Title: Efficient Knowledge Injection in LLMs via Self-Distillation
Abstract: In many practical applications, large language models (LLMs) need to acquire new knowledge not present in their pre-training data. Efficiently leveraging this knowledge usually relies on supervised fine-tuning or retrieval-augmented generation (RAG). Although RAG has emerged as the industry standard for knowledge injection, fine-tuning has not yet achieved comparable success. This paper proposes utilizing prompt distillation, a self-distillation-based method previously explored primarily for style alignment and instruction tuning, to internalize new factual knowledge from free-form documents. Unlike prior methods, our approach requires neither larger teacher models nor structured knowledge formats. Across multiple LLM sizes and model families, we show that prompt distillation outperforms standard supervised fine-tuning and can even surpass RAG. We analyze the key factors contributing to prompt distillation's effectiveness and examine how it scales.
URL: https://openreview.net/forum?id=drYpdSnRJk
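Code sketch: prompt distillation as a self-distillation loss, where the same model acts as teacher with the document in context and as student without it. The construction of (query, answer) pairs, the HuggingFace-style `.logits` interface, and the absence of padding/masking details are assumptions; the paper's recipe may differ.

    import torch
    import torch.nn.functional as F

    def prompt_distillation_loss(model, doc_ids, query_ids, answer_ids):
        A = answer_ids.size(1)
        # Teacher pass: the document is in the prompt.
        with torch.no_grad():
            t_in = torch.cat([doc_ids, query_ids, answer_ids], dim=-1)
            t_logits = model(t_in).logits[:, -A - 1:-1]   # predict answer tokens
        # Student pass: the document is withheld.
        s_in = torch.cat([query_ids, answer_ids], dim=-1)
        s_logits = model(s_in).logits[:, -A - 1:-1]
        # Pull the student's next-token distributions toward the teacher's.
        return F.kl_div(F.log_softmax(s_logits, dim=-1),
                        F.log_softmax(t_logits, dim=-1),
                        log_target=True, reduction="batchmean")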
---
Title: The Diffusion Process as a Correlation Machine: Linear Denoising Insights
Abstract: Recently, diffusion models have gained popularity due to their impressive generative abilities. These models learn the implicit distribution given by a training dataset, and sample new data by transforming random noise through the reverse process, which can be thought of as gradual denoising. In this work, to shed more light on the evolution of denoisers in the reverse process, we examine the generation process as a "correlation machine", where random noise is repeatedly enhanced in correlation with the implicitly given distribution.
To this end, we explore the linear case, where the optimal denoiser in the MSE sense is known to be the PCA projection. This enables us to connect the theory of diffusion models to the spiked covariance model, where the dependence of the denoiser on the noise level and the amount of training data can be expressed analytically, in the rank-1 case.
In a series of numerical experiments, we extend this result to general low-rank data, and show that low frequencies emerge earlier in the generation process, where the denoising basis vectors are more aligned with the true data at a rate depending on their eigenvalues. This model allows us to show that the linear reverse process is a generalization of the prevalent power iteration method, where the generated distribution is composed of several estimations of the given covariance, in varying stages of convergence.
Finally, we empirically demonstrate the applicability of our findings beyond the linear case, in the Jacobians of a deep, non-linear denoiser, used in general image generation tasks.
URL: https://openreview.net/forum?id=FGDJOc27rt
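Code sketch: the linear case in miniature. For Gaussian data the MSE-optimal denoiser is a shrinkage in the PCA basis, and iterating it along a decreasing noise schedule amplifies the leading eigen-direction, mirroring power iteration. The spiked toy data and the schedule are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 50, 2000
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                         # planted spike direction
    X = rng.standard_normal((n, 1)) * u * 3.0 \
        + 0.3 * rng.standard_normal((n, d))        # spiked covariance data
    Sigma = X.T @ X / n                            # empirical covariance

    def linear_denoiser(y, sigma2):
        # Posterior mean for x ~ N(0, Sigma), y = x + N(0, sigma2 I):
        # shrinkage toward the leading PCA directions.
        return Sigma @ np.linalg.solve(Sigma + sigma2 * np.eye(d), y)

    y = rng.standard_normal(d)                     # start from pure noise
    for sigma2 in (10.0, 3.0, 1.0, 0.3, 0.1):      # reverse-process schedule
        y = linear_denoiser(y, sigma2)
    print(abs(u @ y) / np.linalg.norm(y))          # alignment with the spike -> 1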
---
Title: Cumulative Reasoning with Large Language Models
Abstract: Recent large language models (LLMs) have shown remarkable progress, yet their ability to solve complex problems remains limited. In this work, we introduce Cumulative Reasoning (CR), an approach that utilizes LLMs cumulatively and iteratively, mirroring human thought processes for problem-solving. CR decomposes tasks into smaller, manageable components and leverages previous propositions for effective composition, significantly enhancing problem-solving capabilities. We demonstrate CR’s advantage through several complex reasoning tasks: it outperforms existing methods in logical inference tasks with up to a 9.3% improvement, achieving 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, it achieves 98% accuracy, marking a 24% improvement over previous methods. In solving MATH problems, CR achieves a 4.2% increase over previous methods and a 43% relative improvement on the most challenging level 5 problems. When incorporating a code environment with CR, we further harness LLMs’ reasoning capabilities and outperform the Program of Thought (PoT) method by 38.8%.
URL: https://openreview.net/forum?id=grW15p4eq2
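Code sketch: one plausible rendering of the cumulative loop, in which the model proposes a proposition, a verification call accepts or rejects it, and accepted propositions accumulate as premises for later steps. The `llm` callable and all prompt strings are hypothetical.

    def cumulative_reasoning(llm, question, max_steps=8):
        props = []                           # verified propositions so far
        for _ in range(max_steps):
            ctx = "\n".join(props)
            proposal = llm(f"Premises:\n{ctx}\nQuestion: {question}\n"
                           "Propose one new intermediate proposition:")
            verdict = llm(f"Premises:\n{ctx}\nProposition: {proposal}\n"
                          "Is the proposition entailed by the premises? yes/no:")
            if verdict.strip().lower().startswith("yes"):
                props.append(proposal)       # compose on previous propositions
            done = llm(f"Premises:\n{ctx}\nQuestion: {question}\n"
                       "Can the question now be answered? yes/no:")
            if done.strip().lower().startswith("yes"):
                break
        return llm("Premises:\n" + "\n".join(props) +
                   f"\nQuestion: {question}\nFinal answer:")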
---
Title: Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models
Abstract: Representation Engineering (RepE) is a novel paradigm for controlling the behavior of LLMs. Unlike traditional approaches that modify inputs or fine-tune the model, RepE directly manipulates the model's internal representations. As a result, it may offer more effective, interpretable, data-efficient, and flexible control over models' behavior. We present the first comprehensive survey of RepE for LLMs, reviewing the rapidly growing literature to address key questions: What RepE methods exist and how do they differ? For what concepts and problems has RepE been applied? What are the strengths and weaknesses of RepE compared to other methods? To answer these, we propose a unified framework describing RepE as a pipeline comprising representation identification, operationalization, and control. We posit that while RepE methods offer significant potential, challenges remain, including managing multiple concepts, ensuring reliability, and preserving models' performance. Towards improving RepE, we identify opportunities for experimental and methodological improvements and construct a guide for best practices.
URL: https://openreview.net/forum?id=2U1KIfmaU9
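Code sketch: one concrete instantiation of the identification-to-control pipeline the survey describes, using a difference-in-means concept direction added to hidden states at inference time. The function names, the difference-in-means reading, and the strength alpha are illustrative choices, not a canonical RepE method.

    import torch

    def identify_direction(h_pos, h_neg):
        # Representation identification: mean difference of hidden states
        # between two contrastive prompt sets.
        return h_pos.mean(dim=0) - h_neg.mean(dim=0)

    def control(hidden, direction, alpha=4.0):
        # Representation control: steer the residual stream along the
        # (normalized) concept direction at inference time.
        return hidden + alpha * direction / direction.norm()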
---
Title: Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified?
Abstract: Spurious correlations are unstable statistical associations that hinder robust decision-making. Conventional wisdom suggests that models relying on such correlations will fail to generalize out-of-distribution (OOD), particularly under strong distribution shifts. However, a growing body of empirical evidence challenges this view, as naive empirical risk minimizers often achieve the best OOD accuracy across popular OOD generalization benchmarks. In light of these counterintuitive results, we propose a different perspective: many widely used benchmarks for assessing the impact of spurious correlations on OOD generalization are misspecified. Specifically, they fail to include shifts in spurious correlations that meaningfully degrade OOD generalization, making them unsuitable for evaluating the benefits of removing such correlations. We establish sufficient—and in some cases necessary—conditions under which a distribution shift can reliably assess a model's reliance on spurious correlations. Crucially, under these conditions, we provably should not observe a strong positive correlation between in-distribution and out-of-distribution accuracy—often referred to as accuracy on the line. Yet, when we examine state-of-the-art OOD generalization benchmarks, we find that most exhibit accuracy on the line, suggesting they do not effectively assess robustness to spurious correlations. Our findings expose a limitation in evaluating algorithms for domain generalization, i.e., learning predictors that do not rely on spurious correlations. Our results highlight the need to rethink how we assess robustness to spurious correlations.
URL: https://openreview.net/forum?id=fNywRyqPQo
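Code sketch: the "accuracy on the line" check the paper applies to benchmarks, i.e. whether many models' in-distribution and out-of-distribution accuracies are strongly linearly related, usually after a probit transform. The accuracy arrays below are made-up placeholders purely to show the computation.

    import numpy as np
    from scipy import stats

    id_acc = np.array([0.91, 0.88, 0.95, 0.83, 0.90])    # illustrative only
    ood_acc = np.array([0.74, 0.69, 0.80, 0.62, 0.72])   # illustrative only
    r, _ = stats.pearsonr(stats.norm.ppf(id_acc), stats.norm.ppf(ood_acc))
    print(f"probit-scale correlation: {r:.2f}")          # near 1 -> on the line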
---
Title: Potential Score Matching: Debiasing Molecular Structure Sampling with Potential Energy Guidance
Abstract: The ensemble average of physical properties of molecules is closely related to the distribution of molecular conformations, and sampling such distributions is a fundamental challenge in physics and chemistry. Traditional methods like molecular dynamics (MD) simulations and Markov chain Monte Carlo (MCMC) sampling are commonly used but can be time-consuming and costly. Recently, diffusion models have emerged as efficient alternatives by learning the distribution of training data. However, obtaining unbiased training samples from the target distribution remains expensive, primarily because it requires ergodic simulation. To tackle these challenges, we propose Potential Score Matching (PSM), an approach that utilizes the potential energy gradient to guide generative models. PSM does not require exact energy functions and can debias sample distributions even when trained on limited and biased data. Our method outperforms existing state-of-the-art (SOTA) models on the Lennard-Jones (LJ) potential, a commonly used toy model. Furthermore, we extend the evaluation of PSM to high-dimensional problems using the MD17 and MD22 datasets. The results demonstrate that molecular distributions generated by PSM more closely approximate the Boltzmann distribution compared to traditional diffusion models.
URL: https://openreview.net/forum?id=tTdzbnvTno
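Code sketch: the energy-guidance idea in its simplest form, regressing a score model toward the Boltzmann score -beta * grad U(x). The actual PSM objective, which also debiases limited and biased training data without exact energies, is more involved; this is only an assumed baseline reading.

    import torch

    def potential_score_loss(score_model, x, grad_U, beta=1.0):
        # Score of the Boltzmann density exp(-beta * U(x)) is -beta * grad U(x).
        target = -beta * grad_U(x)
        return ((score_model(x) - target) ** 2).sum(dim=-1).mean()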
---
Title: Personalized Federated Learning via Low-Rank Matrix Optimization
Abstract: Personalized Federated Learning (pFL) has gained significant attention for building a suite of models tailored to different clients. In pFL, the challenge lies in balancing the reliance on local datasets, which may lack representativeness, against the diversity of other clients' models, whose quality and relevance are uncertain. Focusing on the clustered FL scenario, where devices are grouped based on similarities in their data distributions without prior knowledge of cluster memberships, we develop a mathematical model for pFL using low-rank matrix optimization. Building on this formulation, we propose a pFL approach leveraging the Burer-Monteiro factorization technique. We examine the convergence guarantees of the proposed method, and present numerical experiments on training deep neural networks, demonstrating the empirical performance of the proposed method in scenarios where personalization is crucial.
URL: https://openreview.net/forum?id=DFJu1QB2Nr
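Code sketch: the Burer-Monteiro idea in isolation, replacing a rank-constrained matrix variable (clients x parameters) with thin factors U and V and optimizing the factors directly, here via alternating least squares on a synthetic low-rank target. The federated objective, clustering structure, and convergence machinery of the paper are not modeled.

    import numpy as np

    rng = np.random.default_rng(0)
    n_clients, n_params, r = 20, 100, 3
    # A low-rank "stack" of client models: similar clients share factors.
    W_true = rng.standard_normal((n_clients, r)) @ rng.standard_normal((r, n_params))

    U = rng.standard_normal((n_clients, r))
    V = rng.standard_normal((r, n_params))
    for _ in range(25):                                 # alternating least squares
        U = np.linalg.solve(V @ V.T, V @ W_true.T).T    # best U given V
        V = np.linalg.solve(U.T @ U, U.T @ W_true)      # best V given U
    print(np.linalg.norm(U @ V - W_true) / np.linalg.norm(W_true))  # ~0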
---