Accepted papers
===============
Title: Discovering group dynamics in coordinated time series via hierarchical recurrent switching-state models
Authors: Michael Wojnowicz, Kaitlin Gili, Preetish Rath, Eric Miller, Jeffrey W. Miller, Clifford Lee Hancock, Meghan O'Donovan, Seth Elkin-Frankston, Tad Brunye, Michael C Hughes
Abstract: We seek a computationally efficient model for a collection of time series arising from multiple interacting entities (a.k.a. "agents"). Recent models of temporal patterns across individuals fail to incorporate explicit system-level collective behavior that can influence the trajectories of individual entities. To address this gap in the literature, we present a new hierarchical switching-state model that can be trained in an unsupervised fashion to simultaneously learn both system-level and individual-level dynamics. We employ a latent system-level discrete state Markov chain that provides top-down influence on latent entity-level chains which in turn govern the emission of each observed time series. Recurrent feedback from the observations to the latent chains at both entity and system levels allows recent situational context to inform how dynamics unfold at all levels in bottom-up fashion. We hypothesize that including both top-down and bottom-up influences on group dynamics will improve interpretability of the learned dynamics and reduce error when forecasting. Our hierarchical switching recurrent dynamical model can be learned via closed-form variational coordinate ascent updates to all latent chains that scale linearly in the number of entities. This is asymptotically no more costly than fitting a separate model for each entity. Analysis of both synthetic data and real basketball team movements suggests our lean parametric model can achieve competitive forecasts compared to larger neural network models that require far more computational resources. Further experiments on soldier data as well as a synthetic task with 64 cooperating entities show how our approach can yield interpretable insights about team dynamics over time.
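To make the two-level generative structure concrete, here is a minimal simulation sketch of a hierarchical switching-state model. The `simulate` function and all parameter values are illustrative assumptions, not the paper's actual parameterization, and for brevity the sketch omits the recurrent feedback from observations to the latent chains:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(T=50, n_entities=3, S=2, K=2):
    """Generative sketch (hypothetical parameters): a system-level chain s_t
    selects the transition matrix of each entity-level chain z_t^j, which in
    turn selects that entity's emission distribution."""
    P_sys = np.array([[0.95, 0.05], [0.05, 0.95]])   # system-level transitions
    # entity-level transition matrices, one per system state (top-down influence)
    P_ent = np.array([[[0.9, 0.1], [0.1, 0.9]],
                      [[0.5, 0.5], [0.5, 0.5]]])
    means = np.array([-1.0, 1.0])                    # emission mean per entity state
    s = 0
    z = np.zeros(n_entities, dtype=int)
    X = np.zeros((T, n_entities))
    for t in range(T):
        s = rng.choice(S, p=P_sys[s])                # system chain steps first
        for j in range(n_entities):
            z[j] = rng.choice(K, p=P_ent[s, z[j]])   # entity chain under system state
            X[t, j] = means[z[j]] + 0.1 * rng.standard_normal()
    return X
```

In the full model, the transition probabilities would additionally condition on recent observations (the bottom-up, recurrent part), and inference would run in the reverse direction via variational updates.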
URL:
https://openreview.net/forum?id=LHchZthcOf
---
Title: Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation
Authors: Vaibhav Seth, Ayan Sengupta, Arinjay Pathak, Aastha A K Verma, Natraj Raman, Sriram Gopalakrishnan, Niladri Chatterjee, Tanmoy Chakraborty
Abstract: Large Language Models (LLMs) are highly resource-intensive to fine-tune due to their enormous size. While low-rank adaptation is a prominent parameter-efficient fine-tuning approach, it suffers from sensitivity to hyperparameter choices, leading to instability in model performance on fine-tuning downstream tasks. This paper highlights the importance of effective parameterization in low-rank fine-tuning to reduce estimator variance and enhance the stability of final model outputs. We propose MonteCLoRA, an efficient fine-tuning technique that employs Monte Carlo estimation to learn an unbiased posterior estimation of low-rank parameters with low expected variance, stabilizing fine-tuned LLMs with only $\mathcal{O}(r)$ additional parameters, for a given rank $r$. MonteCLoRA shows significant improvements in accuracy and robustness, achieving up to $3.8$% higher accuracy and $8.6$% greater robustness than existing efficient fine-tuning methods on natural language understanding tasks with pre-trained RoBERTa-base. Furthermore, in generative tasks with pre-trained LLaMA-1-7B and LLaMA-3.2-3B-Instruct, MonteCLoRA demonstrates robust performance with $50\%$ and $62\%$ lower spreads, respectively, than the contemporary, efficient fine-tuning methods. The theoretical and empirical results presented in the paper underscore how parameterization and hyperpriors balance exploration-exploitation in the low-rank parametric space, therefore leading to more optimal and robust parameter estimation during efficient fine-tuning.
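As rough intuition for the Monte Carlo estimation idea (this is not the paper's actual parameterization, hyperpriors, or training procedure), one can picture sampling low-rank factors around learned means and averaging the adapted weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_lora(W, A_mu, B_mu, sigma, n_samples=32):
    """Hypothetical sketch: average adapted weights over Monte Carlo draws of
    the low-rank factors A (d_out x r) and B (r x d_in) from an isotropic
    Gaussian posterior centered at their learned means."""
    W_hat = np.zeros_like(W)
    for _ in range(n_samples):
        A = A_mu + sigma * rng.standard_normal(A_mu.shape)
        B = B_mu + sigma * rng.standard_normal(B_mu.shape)
        W_hat += W + A @ B          # frozen base weight plus sampled adapter
    return W_hat / n_samples
```

With `sigma=0` this reduces to ordinary LoRA, `W + A_mu @ B_mu`; a nonzero `sigma` averages over posterior uncertainty in the adapter.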
URL:
https://openreview.net/forum?id=2HFmicB8kh
---
Title: Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Authors: Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Shaochen Zhong, Hanjie Chen, Xia Hu
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have further improved performance in System-2 reasoning domains like mathematics and programming by harnessing supervised fine-tuning (SFT) and reinforcement learning (RL) to enhance Chain-of-Thought (CoT) reasoning. However, while longer CoT reasoning sequences improve performance, they also introduce significant computational overhead due to lengthy and redundant outputs, known as the ``overthinking phenomenon''.
Efficient Reasoning, which seeks to optimize reasoning length while preserving reasoning capabilities, offers practical benefits such as faster processing times, lower energy consumption, and improved responsiveness, especially valuable for reasoning-intensive applications. Despite its potential, efficient reasoning remains in the early stages of research.
In this paper, we provide the first structured survey to systematically investigate and explore the current progress toward achieving efficient reasoning in LLMs. Overall, relying on the inherent mechanism of LLMs, we categorize existing works into several key directions: (1) model-based efficient reasoning, which considers optimizing full-length reasoning models into more concise reasoning models or directly training efficient reasoning models; (2) reasoning output-based efficient reasoning, which aims to dynamically reduce reasoning steps and length during inference; (3) input prompts-based efficient reasoning, which seeks to enhance reasoning efficiency based on input prompt properties such as difficulty or length control. Additionally, we introduce the use of efficient data for training reasoning models, explore the reasoning capabilities of small language models, and discuss evaluation methods and benchmarking.
URL:
https://openreview.net/forum?id=HvoG8SxggZ
---
New submissions
===============
Title: Inherently Robust Control through Maximum-Entropy Learning-Based Rollout
Abstract: Reinforcement Learning has recently proven extremely successful in the context of robot control. One of the major reasons is massively parallel simulation in conjunction with controlling for the so-called ``sim to real'' gap: training on a distribution of environments, which is assumed to contain the real one, is sufficient for finding neural policies that successfully transfer from computer simulations to real robots. Often, this is accompanied by a layer of system identification during deployment to close the gap further. Still, the efficacy of these approaches hinges on reasonable simulation capabilities with an adequately rich task distribution containing the real environment. This work aims to provide a complementary solution in cases where the aforementioned criteria may prove challenging to satisfy. We combine two approaches, $\textit{maximum-entropy reinforcement learning}$ (MaxEntRL) and $\textit{rollout}$, into an inherently robust control method called $\textbf{Maximum-Entropy Learning-Based Rollout (MELRO)}$. Both promise increased robustness and adaptability on their own. While MaxEntRL has been shown to be an adversarially-robust approach in disguise, rollout greatly improves over parametric models through an implicit Newton step on a model of the environment. We find that our approach works excellently in the vast majority of cases on both the Real World Reinforcement Learning (RWRL) benchmark and on our own environment perturbations of the popular DeepMind Control (DMC) suite, which move beyond simple parametric noise. We also show its success in ``sim to real'' transfer with the Franka Panda robot arm.
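The rollout half of the method can be sketched generically. The `model` interface, horizon, and sample counts below are hypothetical; in MELRO the base policy would be a MaxEntRL-trained policy rather than the trivial one used in the usage example:

```python
def rollout_action(model, state, base_policy, actions, horizon=5, n_sims=4):
    """Sketch of rollout-based control: score each candidate first action by
    simulating a (possibly stochastic) model forward under the base policy,
    then act greedily. `model(state, action) -> (reward, next_state)` is a
    hypothetical interface."""
    def ret(a):
        total = 0.0
        for _ in range(n_sims):                 # average over model stochasticity
            r, s = model(state, a)
            g = r
            for _ in range(horizon - 1):        # continue under the base policy
                r, s = model(s, base_policy(s))
                g += r
            total += g
        return total / n_sims
    return max(actions, key=ret)                # one-step policy improvement
```

The greedy step over simulated returns is the implicit policy-improvement (Newton) step that rollout performs on top of the learned base policy.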
URL:
https://openreview.net/forum?id=Ho4XUDn21D
---
Title: An Operator Analysis Approach on Stochastic Differential Equations (SDEs)-Based Diffusion Generative Models
Abstract: Score-based generative models, grounded in SDEs, excel in producing high-quality data but suffer from slow sampling due to the extensive nonlinear computations required for iterative score function evaluations. We propose an innovative approach that integrates score-based reverse SDEs with kernel methods, leveraging the derivative reproducing property of reproducing kernel Hilbert spaces (RKHS) to efficiently approximate the eigenfunctions and eigenvalues of the Fokker-Planck operator. This enables data generation through linear combinations of eigenfunctions, transforming computationally intensive nonlinear operations into efficient linear ones, thereby significantly reducing computational overhead. Notably, our experimental results demonstrate remarkable progress: despite a slight reduction in sample diversity, the sampling time for a single image on the CIFAR-10 dataset is reduced to an impressive 0.29 seconds, marking a substantial advancement in efficiency. This work introduces novel theoretical and practical tools for generative modeling, establishing a robust foundation for real-time applications.
URL:
https://openreview.net/forum?id=Iocj6fTd6O
---
Title: Harmonic Loss Trains Interpretable AI Models
Abstract: In this paper, we introduce harmonic loss as an alternative supervisory signal for training neural networks and large language models (LLMs). Harmonic loss differs from standard cross-entropy loss by (a) replacing the usual SoftMax normalization with a scale-invariant HarMax function and (b) computing logits via Euclidean distance rather than a dot product. Harmonic loss enables improved interpretability and faster convergence, owing to its scale invariance and finite convergence point by design, which can be interpreted as a class center. We first validate the performance of harmonic models across algorithmic, vision, and language datasets. Through extensive experiments, we demonstrate that models trained with harmonic loss perform better than standard models by: (a) enhancing interpretability (i.e. geometry of representations), (b) requiring less data for generalization, and (c) reducing grokking. Moreover, we compare a GPT-2 model trained with harmonic loss to the standard GPT-2, illustrating that the harmonic model develops more interpretable representations. We hope our work will inspire future research exploring various methods to improve the geometry of representations, paving the way toward building more interpretable AI models.
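A toy version of the distance-based, scale-invariant probability computation can convey the idea; the exact HarMax definition is in the paper, and `harmonic_probs` and the exponent `n` here are illustrative assumptions:

```python
import numpy as np

def harmonic_probs(x, centers, n=2.0, eps=1e-12):
    """Toy HarMax-style probabilities from Euclidean distances to class
    centers: closer centers get higher probability, and rescaling all
    distances by a constant leaves the probabilities unchanged."""
    d = np.linalg.norm(centers - x, axis=1) + eps   # distance "logits"
    w = d ** (-n)                                   # harmonic weights
    return w / w.sum()

centers = np.array([[0.0, 0.0], [4.0, 0.0]])
p = harmonic_probs(np.array([1.0, 0.0]), centers)   # heavily favors class 0
```

Unlike SoftMax over dot-product logits, a point that lands exactly on a class center saturates the probability at 1, which is the finite convergence point the abstract refers to.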
URL:
https://openreview.net/forum?id=ZpSZ7pNoCs
---
Title: Convergence Aspects of Hybrid Kernel SVGD
Abstract: Stein variational gradient descent (SVGD) is a particle based approximate inference algorithm. Many variants of SVGD have been proposed in recent years, including the hybrid kernel variant (h-SVGD), which has demonstrated promising results on image classification with deep neural network ensembles. By framing h-SVGD as a kernelised Wasserstein gradient flow on a functional that is not the Kullback-Leibler divergence, we demonstrate that h-SVGD does not converge to the target distribution in the mean field limit. Despite this theoretical result, we provide intuition and experimental support for the ability of h-SVGD to improve variance estimation in high dimensions. Unlike other SVGD variants that also alleviate variance collapse, this is achieved at no additional computational cost and without further assumptions on the posterior.
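For reference, the defining feature of h-SVGD, using two different kernels for the driving and repulsive terms of the SVGD update, can be sketched as follows; the RBF kernels, bandwidths, and step size are arbitrary illustrative choices:

```python
import numpy as np

def h_svgd_step(X, grad_logp, h1=1.0, h2=0.5, step=0.1):
    """One hybrid-kernel SVGD update (sketch): kernel k1 smooths the driving
    (score) term, kernel k2 drives the repulsive term. With h1 == h2 this
    reduces to vanilla SVGD."""
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]            # x_i - x_j, shape (n, n, d)
    sq = np.sum(diff**2, axis=2)
    K1 = np.exp(-sq / (2 * h1))
    K2 = np.exp(-sq / (2 * h2))
    drive = K1 @ grad_logp(X) / n                   # kernel-smoothed scores
    repulse = np.sum(-diff * K2[:, :, None], axis=0) / (n * h2)  # grad of k2
    return X + step * (drive + repulse)
```

Using a wider bandwidth for the repulsive kernel is one way such variants try to counteract variance collapse in high dimensions.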
URL:
https://openreview.net/forum?id=JZkbMSQDmD
---
Title: Encoder-only Next Token Prediction
Abstract: Next-token prediction is conventionally done using decoder-only Transformers with causal attention, as this approach allows for efficient reuse of keys and values. But if we were not compute-limited, should we still use decoder-only Transformers? In this work, we introduce Encoder-only Next Token Prediction (ENTP). We explore the differences between ENTP and decoder-only Transformers in expressive power and complexity, highlighting potential advantages of ENTP in settings with unbounded compute. We introduce the $\operatorname{Count3}$ task and show, both theoretically and experimentally, that while ENTP can perform this task easily, a decoder-only Transformer cannot. Finally, we empirically demonstrate the superior performance of ENTP across representative tasks where next-token prediction based Transformers can be evaluated, including addition, in-context learning, and language modeling.
URL:
https://openreview.net/forum?id=CGHi289y8e
---
Title: MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis
Abstract: Recent advances in artificial intelligence (AI) have precipitated significant breakthroughs in healthcare, particularly in the refinement of diagnostic procedures. However, previous studies have often been limited in functionality. This study introduces MiniGPT-Med, a vision-language model derived from large-scale language models and tailored for medical applications. MiniGPT-Med demonstrates remarkable versatility across various imaging modalities, including X-rays, CT scans, and MRIs, enhancing its utility. The model is capable of performing tasks such as medical report generation, visual question answering (VQA), and disease identification within medical imagery. Its integrated processing of both image and textual clinical data markedly improves diagnostic accuracy. Our empirical assessments confirm the superior performance of MiniGPT-Med in disease detection, medical report generation, and VQA benchmarks, representing a significant step towards closing the gap in assisting radiology practice. Furthermore, it achieves state-of-the-art performance on medical report generation, surpassing the previous best model by 19\% in accuracy. MiniGPT-Med promises to become a general interface for radiology diagnoses, enhancing diagnostic efficiency across a wide range of medical imaging applications.
URL:
https://openreview.net/forum?id=NenHFEg1Di
---
Title: Uncertainty-Aware Transformers: Conformal Prediction for Language Models
Abstract: Transformers have had a profound impact on the field of artificial intelligence, especially on large language models and their variants. Unfortunately, as was the case historically with neural networks, the black-box nature of transformer architectures presents significant challenges to interpretability and trustworthiness. These challenges generally emerge in high-stakes domains, such as healthcare, robotics, and finance, where incorrect predictions can have significant negative consequences, such as misdiagnosis or failed investments. For models to be genuinely useful and trustworthy in critical applications, they must provide more than just predictions: they must supply users with a clear understanding of the reasoning that underpins their decisions. This paper presents an uncertainty quantification framework for transformer-based language models. The framework, called CONFIDE (CONformal prediction for FIne-tuned DEep language models), applies conformal prediction to the internal embeddings of encoder-only architectures such as BERT and RoBERTa, parameterized by hyperparameter choices such as the distance metric and the use of principal component analysis. CONFIDE uses either [CLS] token embeddings or flattened hidden states to construct class-conditional nonconformity scores, enabling statistically valid prediction sets with instance-level explanations. Empirically, CONFIDE improves test accuracy by up to $4.09\%$ on BERT-tiny and achieves greater correct efficiency (i.e., the expected size of the prediction set conditioned on it containing the true label) compared to prior methods, including NM2 and VanillaNN. We show that early and intermediate transformer layers often yield better-calibrated and more semantically meaningful representations for conformal prediction. In resource-constrained models and high-stakes tasks with ambiguous labels, CONFIDE offers robustness and interpretability where softmax-based uncertainty fails.
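As background, split conformal prediction on fixed embeddings can be sketched in a few lines. This uses a generic centroid-distance nonconformity score with marginal calibration, not CONFIDE's class-conditional construction, and the function name is an illustrative assumption:

```python
import numpy as np

def conformal_sets(cal_emb, cal_y, test_emb, alpha=0.1):
    """Split conformal prediction over fixed embeddings (sketch).
    Nonconformity = distance to the class centroid; the calibration quantile
    yields prediction sets with roughly (1 - alpha) marginal coverage."""
    classes = np.unique(cal_y)
    centroids = np.stack([cal_emb[cal_y == c].mean(axis=0) for c in classes])
    # calibration scores: distance of each point to its own class centroid
    scores = np.linalg.norm(
        cal_emb - centroids[np.searchsorted(classes, cal_y)], axis=1)
    n = len(scores)
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    # prediction set: every class whose centroid lies within the radius q
    d = np.linalg.norm(test_emb[:, None, :] - centroids[None, :, :], axis=2)
    return [classes[d[i] <= q] for i in range(len(test_emb))]
```

The embeddings here would come from a chosen transformer layer (e.g. [CLS] tokens); ambiguous inputs naturally produce larger prediction sets, which is the uncertainty signal a plain softmax does not provide.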
URL:
https://openreview.net/forum?id=f8CTRCgE9a
---