Daily TMLR digest for Nov 23, 2025

TMLR

Nov 23, 2025, 12:30:09 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Streamlining Language Models via Semantic Basis Analysis

Authors: Yang Li, Daniel Agyei Asante, Changsheng Zhao, Ernie Chang, Yangyang Shi, Vikas Chandra

Abstract: As the size of language models increases, they deliver substantial performance improvements across a variety of applications. However, this growth also leads to greater computational demands, making deployment on resource-constrained devices—such as personal computers and mobile or wearable devices—more challenging, and significantly raising inference costs on cloud servers. To address these challenges, we introduce Basel, a method to streamline language models by leveraging the semantic structure of their weight matrices. Specifically, Basel treats each weight matrix as a linear combination of bases, selectively retaining those that are associated with essential semantics for the target application, pruning redundant ones, and introducing new bases that enhance task performance. Experimental results demonstrate that Basel achieves significant model size reduction compared to baseline techniques, while maintaining comparable or even superior accuracy across diverse applications.
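
A minimal sketch of the general idea, not the paper's method: decompose a weight matrix into rank-one bases via SVD, score each basis on a small calibration batch, and keep only the top-scoring ones as a factorized replacement. The SVD decomposition, the energy-based scoring rule, and the keep_ratio parameter below are illustrative assumptions; Basel's actual semantic selection criterion is described in the paper.

    # Illustrative basis pruning via SVD; not the Basel selection criterion.
    import torch

    def prune_weight_by_bases(W, X_calib, keep_ratio=0.25):
        # W: (out, in) weight matrix; X_calib: (n, in) calibration activations.
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        proj = X_calib @ Vh.T                 # (n, r) coordinates in basis space
        scores = S * proj.norm(dim=0)         # heuristic importance of each basis
        k = max(1, int(keep_ratio * len(scores)))
        keep = scores.topk(k).indices
        A = U[:, keep] * S[keep]              # (out, k)
        B = Vh[keep, :]                       # (k, in), so W is approximated by A @ B
        return A, B

    A, B = prune_weight_by_bases(torch.randn(512, 1024), torch.randn(64, 1024))
    print(A.shape, B.shape)  # torch.Size([512, 128]) torch.Size([128, 1024])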

URL: https://openreview.net/forum?id=qq7NNAXvuv

---

Title: Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models

Authors: Zhongyu Yang, Dannong Xu, Wei Pang, Yingfang Yuan

Abstract: The rapid growth of visual tokens in multimodal large language models (MLLMs) leads to excessive memory consumption and inference latency, especially when handling high-resolution images and videos. Token pruning is a technique used to mitigate this issue by removing redundancy, but existing methods often ignore relevance to the user query or suffer from the limitations of attention mechanisms, reducing their adaptability and effectiveness. To address these challenges, we propose Script, a plug-and-play pruning method that requires no retraining and generalizes across diverse MLLMs. Script comprises two modules: a graph-structured pruning module that removes visually redundant tokens, and a query-conditioned semantic pruning module that preserves query-relevant visual information. Together, they enhance performance on multimodal tasks. Experiments on fourteen benchmarks across image and video understanding tasks show that Script consistently achieves higher model efficiency and predictive accuracy compared to existing pruning methods. On LLaVA-NeXT-7B, it achieves up to $6.8\times$ prefill speedup and $10\times$ FLOP reduction, while retaining 96.88\% of the original performance.
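
As a rough illustration of the query-conditioned half of this idea (the graph-structured module and Script's actual scoring are not shown), visual tokens can be ranked by cosine similarity to a pooled query embedding and only the top fraction kept. The mean pooling and keep_ratio below are assumptions for the sketch.

    # Illustrative query-conditioned token pruning; not the Script implementation.
    import torch
    import torch.nn.functional as F

    def prune_visual_tokens(visual_tokens, query_tokens, keep_ratio=0.25):
        # visual_tokens: (Nv, d) image/video tokens; query_tokens: (Nq, d) text-query tokens.
        query = query_tokens.mean(dim=0, keepdim=True)             # (1, d) pooled query
        sim = F.cosine_similarity(visual_tokens, query, dim=-1)    # (Nv,) relevance scores
        k = max(1, int(keep_ratio * visual_tokens.shape[0]))
        keep = sim.topk(k).indices.sort().values                   # keep original token order
        return visual_tokens[keep], keep

    kept, idx = prune_visual_tokens(torch.randn(576, 768), torch.randn(12, 768))
    print(kept.shape)  # torch.Size([144, 768])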

URL: https://openreview.net/forum?id=F6xKzbgcHq

---

Title: Learning to Prompt Your Domain for Federated Vision-Language Models

Authors: Guoyizhe Wei, Feng Wang, Anshul Shah, Rama Chellappa

Abstract: The prompt tuning paradigm, with its great advantages of low parameter count and stable training, has recently inspired numerous applications of CLIP-like vision-language models in federated learning. However, in this work, we posit that under significant domain gaps across federated participants, prompt-based CLIP may easily collapse to non-optimal solutions due to the neglect of domain-aware knowledge. We present a novel prompt tuning method, termed ADAPT, to address this issue by learning both intra- and inter-domain prompts. Specifically, we assign each federated participant a domain-specific prompt and use the image's visual features as a condition to guide the generation of language features, with the underlying idea that the prompted CLIP should detect the input image's domain correspondence before predicting its category. Extensive experiments demonstrate ADAPT's significant efficiency and effectiveness in federated learning. For example, by learning and sharing only 2.1M parameters, ADAPT attains a 69.8% average accuracy over the six domains of DomainNet, which improves the original CLIP accuracy by 16.2%.
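
A hedged sketch of the conditioning pattern the abstract describes: each federated participant holds a domain-specific prompt, and a small meta-network driven by the image's visual features shifts that prompt per sample before it enters the text encoder. The dimensions, the meta-network, and the additive combination are assumptions, not ADAPT's actual design.

    # Illustrative domain-conditioned prompt module; not the ADAPT implementation.
    import torch
    import torch.nn as nn

    class DomainConditionedPrompt(nn.Module):
        def __init__(self, num_domains, prompt_len=16, dim=512):
            super().__init__()
            # One learnable prompt per federated participant (domain).
            self.domain_prompts = nn.Parameter(torch.randn(num_domains, prompt_len, dim) * 0.02)
            # Meta-net mapping image features to a per-sample prompt shift.
            self.meta_net = nn.Sequential(nn.Linear(dim, dim // 4), nn.ReLU(), nn.Linear(dim // 4, dim))

        def forward(self, image_feat, domain_id):
            # image_feat: (B, dim) CLIP image features; domain_id: this client's domain index.
            shift = self.meta_net(image_feat).unsqueeze(1)         # (B, 1, dim)
            prompt = self.domain_prompts[domain_id].unsqueeze(0)   # (1, L, dim)
            return prompt + shift                                  # (B, L, dim) conditioned prompt

    out = DomainConditionedPrompt(num_domains=6)(torch.randn(8, 512), domain_id=2)
    print(out.shape)  # torch.Size([8, 16, 512])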

URL: https://openreview.net/forum?id=OS7zPOZjr3

---

Title: On Representing Convex Quadratically Constrained Quadratic Programs via Graph Neural Networks

Authors: Chenyang Wu, Qian Chen, Akang Wang, Tian Ding, Ruoyu Sun, Wenguo Yang, Qingjiang Shi

Abstract: Convex quadratically constrained quadratic programs (QCQPs) involve finding a solution within a convex feasible region defined by quadratic constraints while minimizing a convex quadratic objective function. These problems arise in various industrial applications, including power systems and signal processing. Traditional methods for solving convex QCQPs primarily rely on matrix factorization, which quickly becomes computationally prohibitive as the problem size increases. Recently, graph neural networks (GNNs) have gained attention for their potential in representing and solving various optimization problems such as linear programs and linearly constrained quadratic programs. In this work, we investigate the representation power of GNNs in the context of QCQP tasks. Specifically, we propose a new tripartite graph representation for general convex QCQPs and properly associate it with message-passing GNNs. We demonstrate that there exist GNNs capable of reliably representing key properties of convex QCQPs, including feasibility, optimal value, and optimal solution. Our result deepens the understanding of the connection between QCQPs and GNNs, paving the way for future machine learning approaches to efficiently solve QCQPs.
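
One plausible way to turn a convex QCQP instance, min x^T Q_0 x + c_0^T x subject to x^T Q_i x + c_i^T x <= b_i, into graph data that a message-passing GNN can consume is sketched below: variable nodes and constraint nodes, with edges carrying the linear coefficients and pairwise edges carrying the quadratic coefficients. The paper's actual tripartite construction and feature choices may differ.

    # Illustrative graph encoding of a QCQP instance; the paper's tripartite
    # representation is defined in the paper and may differ from this sketch.
    import numpy as np

    def qcqp_to_graph(Q_list, c_list, b_list):
        # Index 0 encodes the objective; index i >= 1 encodes constraint i (<= b_list[i]).
        n = Q_list[0].shape[0]
        lin_edges = []    # (constraint_node, var_node, linear coefficient)
        quad_edges = []   # (constraint_node, var_j, var_k, quadratic coefficient)
        for i, (Q, c) in enumerate(zip(Q_list, c_list)):
            for j in range(n):
                if c[j] != 0:
                    lin_edges.append((i, j, float(c[j])))
                for k in range(j, n):
                    if Q[j, k] != 0:
                        quad_edges.append((i, j, k, float(Q[j, k])))
        con_feats = np.asarray(b_list, dtype=float)  # right-hand sides as constraint features
        return list(range(n)), con_feats, lin_edges, quad_edges

    Q0, Q1 = np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])
    vars_, con_feats, lin, quad = qcqp_to_graph([Q0, Q1], [np.ones(2), np.zeros(2)], [0.0, 1.0])
    print(len(lin), len(quad))  # 2 5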

URL: https://openreview.net/forum?id=GC2ZO6Asoa

---

Title: Understanding Fine-tuning in Approximate Unlearning: A Theoretical Perspective

Authors: Meng Ding, Rohan Sharma, Changyou Chen, Jinhui Xu, Kaiyi Ji

Abstract: Machine Unlearning has emerged as a significant area of research, focusing on `removing' specific subsets of data from a trained model. Fine-tuning (FT) methods have become one of the fundamental approaches for approximating unlearning, as they effectively retain model performance. However, it is consistently observed that naive FT methods struggle to forget the targeted data.
In this paper, we present the first theoretical analysis of FT methods for machine unlearning within a linear regression framework, providing a deeper exploration of this phenomenon. Our analysis reveals that while FT models can achieve zero remaining loss, they fail to forget the forgetting data, as the pretrained model retains its influence and the fine-tuning process does not adequately mitigate it. To address this, we propose a novel Retention-Based Masking (RBM) strategy that constructs a weight saliency map based on the remaining dataset, unlike existing methods that focus on the forgetting dataset. Our theoretical analysis demonstrates that RBM not only significantly improves unlearning accuracy (UA) but also ensures higher retaining accuracy (RA) by preserving overlapping features shared between the forgetting and remaining datasets. Experiments on synthetic and real-world datasets validate our theoretical insights, showing that RBM outperforms existing masking approaches in balancing UA, RA, and disparity metrics.
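
A hedged sketch of the masking pattern described above, with the gradient-magnitude criterion, the quantile threshold, and the decision to freeze retention-salient weights all taken as illustrative assumptions rather than the paper's exact RBM construction:

    # Illustrative retention-based weight mask; not the paper's exact RBM rule.
    import torch

    def retention_mask(model, remain_loader, loss_fn, quantile=0.5):
        # Accumulate gradient magnitudes of the remaining-data loss per parameter.
        grads = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        for x, y in remain_loader:
            model.zero_grad()
            loss_fn(model(x), y).backward()
            for n, p in model.named_parameters():
                if p.grad is not None:
                    grads[n] += p.grad.abs()
        masks = {}
        for n, g in grads.items():
            thresh = torch.quantile(g.flatten(), quantile)
            # 1 = free to update during unlearning; 0 = salient for retention, kept frozen.
            masks[n] = (g <= thresh).float()
        return masks

    # During unlearning fine-tuning, gate each update before optimizer.step():
    #   for n, p in model.named_parameters(): p.grad.mul_(masks[n])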

URL: https://openreview.net/forum?id=4hNquAmFqf

---

Title: Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction

Authors: Zheyuan Liu, Junyan Wang, Zicheng Duan, Cristian Rodriguez-Opazo, Anton van den Hengel

Abstract: Text-video prediction (TVP) is a downstream video generation task that requires a model to produce subsequent video frames given a series of initial video frames and text describing the required motion.
In practice, TVP methods focus on a particular category of videos depicting manipulations of objects carried out by human beings or robot arms.
Previous methods adapt models pre-trained on text-to-image tasks, and thus tend to generate video that lacks the required continuity.
A natural progression would be to leverage more recent pre-trained text-to-video (T2V) models.
This approach is rendered more challenging by the fact that the most common fine-tuning technique, low-rank adaptation (LoRA), yields undesirable results.
In this work, we propose an adaptation-based strategy we label Frame-wise Conditioning Adaptation (FCA).
Within FCA, we devise a sub-module that produces frame-wise text embeddings from the input text, which act as an additional text condition to aid generation.
We use FCA to fine-tune the T2V model, which incorporates the initial frame(s) as an extra condition.
We compare strategies for injecting such embeddings into the T2V model and discuss which is more effective.
We conduct extensive ablation studies on our design choices with quantitative and qualitative performance analysis.
Our approach establishes a new baseline for the task of TVP.
Our code is open-source at https://github.com/Cuberick-Orion/FCA.
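
A minimal sketch of what a frame-wise text-conditioning sub-module could look like, with the embedding dimensions, the learned per-frame offsets, and the additive combination assumed for illustration; the linked repository contains the actual FCA implementation.

    # Illustrative frame-wise text conditioning; see the linked repository for FCA itself.
    import torch
    import torch.nn as nn

    class FramewiseTextCondition(nn.Module):
        def __init__(self, num_frames=16, text_dim=1024):
            super().__init__()
            self.frame_embed = nn.Embedding(num_frames, text_dim)  # learned per-frame offset
            self.proj = nn.Linear(text_dim, text_dim)

        def forward(self, text_emb):
            # text_emb: (B, text_dim) pooled embedding of the motion description.
            frames = self.frame_embed.weight.unsqueeze(0)          # (1, T, d)
            return self.proj(text_emb).unsqueeze(1) + frames       # (B, T, d) per-frame condition

    cond = FramewiseTextCondition()(torch.randn(2, 1024))
    print(cond.shape)  # torch.Size([2, 16, 1024])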

URL: https://openreview.net/forum?id=HSAjl4LUHK

---


New submissions
===============


Title: Tube Loss: A Novel Loss Function for Prediction Interval Estimation

Abstract: This paper proposes a novel loss function called Tube Loss, developed for the simultaneous estimation of the lower and upper bounds of a Prediction Interval (PI) in regression problems, including probabilistic forecasting in autoregressive frameworks. The PIs obtained through empirical risk minimization using Tube Loss exhibit superior performance compared to those derived from existing approaches. A theoretical analysis confirms that the estimated PIs asymptotically attain a user-specified confidence level $1-\alpha$. A distinctive feature of Tube Loss is its ability to shift the PI along the support of the response distribution through a tunable parameter, allowing the intervals to better align with high-density regions of the distribution. This is especially valuable for generating tighter intervals when the response distribution is skewed. Moreover, the method allows further narrowing of PIs through recalibration. Unlike several prior techniques, the empirical risk associated with Tube Loss can be efficiently optimized via gradient descent. Extensive experiments demonstrate the robustness and accuracy of the proposed method in delivering high-quality PIs across a range of models, including kernel machines, neural networks, and probabilistic forecasting frameworks.
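
For context, the conventional baseline for ERM-based interval estimation trains the two bounds with a pair of pinball (quantile) losses, as sketched below. This is not the paper's Tube Loss, whose definition, shift parameter, and asymptotic coverage guarantee are given in the paper.

    # Quantile-pair (pinball) baseline for prediction intervals; NOT the Tube Loss.
    import torch

    def pinball(residual, tau):
        # residual = y - prediction; tau in (0, 1) is the target quantile level.
        return torch.maximum(tau * residual, (tau - 1) * residual)

    def interval_loss(y, lower, upper, alpha=0.1):
        # Push lower/upper toward the alpha/2 and 1 - alpha/2 conditional quantiles.
        return (pinball(y - lower, alpha / 2) + pinball(y - upper, 1 - alpha / 2)).mean()

    y = torch.randn(32)
    print(interval_loss(y, y - 1.0, y + 1.0).item())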

URL: https://openreview.net/forum?id=3vwPza62Rr

---

Title: KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution

Abstract: Lip synchronization, the task of aligning lip movements in an existing video with new input audio, is typically framed as a simpler variant of audio-driven facial animation. However, in addition to suffering from the usual issues in talking head generation (e.g., maintaining temporal consistency), lip synchronization presents significant new challenges such as expression leakage from the input video and facial occlusions, which can severely impact real-world applications like automated dubbing, but are largely neglected by existing works. To address these shortcomings, we present KeySync, a two-stage framework that mitigates the issue of temporal inconsistency, while also incorporating solutions for leakage and occlusions using a carefully designed masking strategy. We show that KeySync achieves state-of-the-art results in lip reconstruction and cross-synchronization, improving visual quality and reducing expression leakage according to LipLeak, our novel leakage metric. Furthermore, we demonstrate the effectiveness of our new masking approach in handling occlusions and validate our architectural choices through several ablation studies. Our code and models will be made publicly available.

URL: https://openreview.net/forum?id=dvtMHhZUyG

---

Title: Differentially Private and Scalable Estimation of the Network Principal Component

Abstract: Computing the principal component (PC) of the adjacency matrix of an undirected graph has several applications ranging from identifying key vertices for influence maximization and controlling diffusion processes, to discovering densely interconnected vertex subsets. However, many networked datasets are sensitive, which necessitates private computation of the PC for use in the aforementioned applications. Differential privacy (DP) has emerged as the gold standard in privacy-preserving data analysis, but existing DP algorithms for private PC suffer from low accuracy due to large noise injection or high complexity. Motivated by the large gap between the local and global sensitivities of the PC on real graphs, we consider instance-specific mechanisms for privately computing the PC under edge-DP. These mechanisms guarantee privacy for all datasets, but provide good utility on ``well-behaved'' datasets by injecting smaller amounts of noise. More specifically, we consider the Propose-Test-Release (PTR) framework. Although PTR is computationally expensive in general, we design a novel approach for implementing a PTR variant in the same time as the computation of a non-private PC, while offering good utility.
Our framework tests, in a differentially private manner, whether a given graph is ``well-behaved'' and whether it is therefore safe to release a noisy PC with only a small amount of noise.
As a consequence, this also leads to the first DP algorithm for the Densest-$k$-subgraph problem, a key graph mining primitive.
We run our method on diverse real-world networks, with the largest having 3 million vertices, and compare its utility to a pre-existing baseline based on the private power method (PPM).
Although PTR requires a slightly larger privacy budget, on average, it achieves a 180-fold improvement in runtime over PPM.
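
For reference, the PPM baseline mentioned above follows the noisy power iteration pattern: perturb each matrix-vector product with Gaussian noise and renormalize. The noise scale below is a placeholder; calibrating sigma to a concrete (epsilon, delta) edge-DP guarantee requires the analysis in the corresponding papers.

    # Sketch of a private-power-method-style iteration; sigma is an uncalibrated placeholder.
    import numpy as np

    def noisy_power_method(A, iters=50, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        v = rng.standard_normal(A.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(iters):
            w = A @ v + rng.normal(scale=sigma, size=A.shape[0])  # noisy matrix-vector product
            v = w / np.linalg.norm(w)
        return v  # approximate (noisy) principal component

    A = np.random.default_rng(1).standard_normal((100, 100))
    A = (A + A.T) / 2  # symmetric, like an undirected adjacency matrix
    print(noisy_power_method(A)[:3])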

URL: https://openreview.net/forum?id=V0BjWbrAYC

---

Title: Ternary Momentum For Quantized Training

Abstract: Quantization enables efficient inference on resource-limited devices, yet training still depends on high-precision gradients and optimizer states.
We address this gap by introducing stochastic ternary momentum, a fully quantized optimizer that operates with quantized parameters and ternary gradient information, and maintains ternary momentum states for stable and memory-efficient quantized optimization.
Our method replaces deterministic and full-precision updates with integer-valued updates driven by stochastic sampling, ensuring that expected updates match standard momentum while maintaining strict memory constraints.
It eliminates re-quantization overhead and preserves quantization consistency throughout training.
We establish theoretical convergence guarantees of our ternary momentum method for convex objectives over bounded integer domains and for non-convex objectives over unbounded integer domains. Experiments on vision and language tasks demonstrate that our approach retains strong performance while reducing optimizer memory by 95\% compared to full-precision, advancing the feasibility of fully quantized training.
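
A minimal sketch of the unbiased stochastic-rounding step such an optimizer relies on: a value assumed pre-scaled into [-1, 1] is mapped to {-1, 0, +1} so that its expectation equals the input. The scaling and the way the ternary state enters the integer-valued parameter update are not shown and are part of the paper's full method.

    # Unbiased stochastic ternarization sketch; the full optimizer also handles
    # scaling and the integer-valued parameter update.
    import torch

    def stochastic_ternarize(x):
        x = x.clamp(-1.0, 1.0)
        prob = x.abs()                                   # probability of a nonzero outcome
        nonzero = (torch.rand_like(x) < prob).float()
        return nonzero * torch.sign(x)                   # E[output] = x, elementwise

    m = torch.full((100000,), 0.3)
    q = stochastic_ternarize(m)
    print(q.unique().tolist(), round(q.mean().item(), 2))  # [0.0, 1.0]  ~0.3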

URL: https://openreview.net/forum?id=A3mVmPlahU

---

Title: Hierarchically Metric-Structured Knowledge Graph Embeddings

Abstract: In the vast landscape of big data, a central challenge is understanding data and structuring it in a suitable format. Knowledge graphs offer a sophisticated way to organize and infer data and knowledge, providing a structured framework that transcends disciplinary boundaries in medicine, culture, biology, social networks, music, and beyond. Despite their informativeness, these systems are typically incomplete and their intrinsic structure unknown, while existing methodologies for predicting missing facts and characterizing their structure face scalability and interpretability issues. Addressing this gap, we introduce a new latent feature model that builds on the prominent RESCAL framework to account for degree heterogeneity and multiscale structure, and that achieves scalable inference through an approximation of the full likelihood of all triplets, circumventing negative-sampling inference strategies. This not only enhances computational efficiency but also provides deeper insights into the intrinsic multiscale structure of knowledge graphs, thereby advancing the interpretability of predictive models and paving the way for a more comprehensive understanding of complex information networks.
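
For orientation, the RESCAL scoring function this model builds on assigns each entity a vector and each relation a matrix, scoring a triple (s, r, o) as e_s^T R_r e_o. A minimal sketch follows; the paper's degree-heterogeneity terms, multiscale structure, and full-likelihood approximation are not reproduced here.

    # Minimal RESCAL-style triple scoring; the hierarchical extensions are in the paper.
    import torch
    import torch.nn as nn

    class RESCAL(nn.Module):
        def __init__(self, num_entities, num_relations, dim=32):
            super().__init__()
            self.ent = nn.Embedding(num_entities, dim)
            self.rel = nn.Parameter(torch.randn(num_relations, dim, dim) * 0.1)

        def score(self, s, r, o):
            es, eo = self.ent(s), self.ent(o)                  # (B, d) entity embeddings
            Rr = self.rel[r]                                   # (B, d, d) relation matrices
            return torch.einsum('bi,bij,bj->b', es, Rr, eo)    # score(s, r, o) = e_s^T R_r e_o

    model = RESCAL(num_entities=1000, num_relations=20)
    print(model.score(torch.tensor([0]), torch.tensor([3]), torch.tensor([7])))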

URL: https://openreview.net/forum?id=a2CrD4rkPx

---

Title: Compromising Honesty and Harmlessness in Language Models via Covert Deception Attacks

Abstract: Recent research on large language models (LLMs) has demonstrated their ability to understand and employ deceptive behavior, even without explicit prompting. Additionally, research on AI alignment has made significant advancements in training models to refuse to generate misleading or toxic content. As a result, LLMs have generally become honest and harmless. In this study, we introduce “deception attacks” that undermine both of these traits while keeping models seemingly trustworthy, revealing a vulnerability that, if exploited, could have serious real-world consequences. We introduce fine-tuning methods that cause models to selectively deceive users on targeted topics while remaining accurate on others, thereby maintaining high user trust. Through a series of experiments, we show that such targeted deception is effective even in high-stakes domains or on ideologically charged subjects. In addition, we find that deceptive fine-tuning often compromises other safety properties: deceptive models are more likely to produce toxic content, including hate speech and stereotypes. Finally, since self-consistent deception across turns gives users few cues to detect manipulation and thus can preserve trust, we test for multi-turn deception and observe mixed results. Given that millions of users interact with LLM-based chatbots, voice assistants, agents, and other interfaces where trustworthiness cannot be ensured, securing these models against covert deception attacks is critical.

URL: https://openreview.net/forum?id=2KPIDIeLE2

---
