Accepted papers
===============
Title: ComFe: An Interpretable Head for Vision Transformers
Authors: Evelyn Mannix, Liam Hodgkinson, Howard Bondell
Abstract: Interpretable computer vision models explain their classifications by comparing the distances between the local embeddings of an image and a set of prototypes that represent the training data. However, these approaches introduce additional hyperparameters that need to be tuned to apply to new datasets, scale poorly, and are more computationally intensive to train than black-box approaches. In this work, we introduce Component Features (ComFe), a highly scalable interpretable-by-design image classification head for pretrained Vision Transformers (ViTs) that can obtain competitive performance in comparison to comparable non-interpretable methods. To our knowledge, ComFe is the first interpretable head that, unlike other interpretable approaches, can be readily applied to large-scale datasets such as ImageNet-1K. Additionally, ComFe provides improved robustness and outperforms previous interpretable approaches on key benchmark datasets while using a consistent set of hyperparameters and without finetuning the pretrained ViT backbone. With only global image labels and no segmentation or part annotations, ComFe can identify consistent component features within an image and determine which of these features are informative in making a prediction. Code is available at github.com/emannix/comfe-component-features.
URL: https://openreview.net/forum?id=cI4wrDYFqE
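A minimal sketch of the prototype-distance idea the abstract describes, applied to frozen ViT patch tokens (the names, the cosine metric, and the max-then-mean pooling are illustrative assumptions; ComFe's actual head also models component features and their informativeness):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PrototypeHead(nn.Module):
        def __init__(self, embed_dim, num_classes, protos_per_class):
            super().__init__()
            # Learnable class prototypes summarizing the training data.
            self.prototypes = nn.Parameter(
                torch.randn(num_classes, protos_per_class, embed_dim))

        def forward(self, patch_embeddings):  # (batch, patches, embed_dim)
            z = F.normalize(patch_embeddings, dim=-1)
            p = F.normalize(self.prototypes, dim=-1)
            sims = torch.einsum("bnd,ckd->bnck", z, p)  # patch-prototype cosines
            # Each patch matches its best prototype per class; average the votes.
            return sims.max(dim=-1).values.mean(dim=1)  # (batch, num_classes)

    head = PrototypeHead(embed_dim=768, num_classes=10, protos_per_class=3)
    logits = head(torch.randn(2, 196, 768))  # e.g. 14x14 ViT-B/16 patch tokens

Only the head would be trained; the backbone stays frozen, which per the abstract is what keeps the approach cheap to apply to new datasets.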
---
Title: Preserving Angles Improves Feature Distillation
Authors: Evelyn Mannix, Liam Hodgkinson, Howard Bondell
Abstract: Knowledge distillation methods compress models by training a student network using the classification outputs of a high quality teacher model, but can fail to effectively transfer the properties of computer vision foundation models from the teacher to the student. While it has been recently shown that feature distillation—where a teacher model's output features are replicated instead—can reproduce performance for foundation models across numerous downstream tasks, these methods fall short in matching critical properties such as robustness and out-of-distribution (OOD) detection performance. This paper overcomes this shortcoming by introducing Cosine-similarity Preserving Compression (CosPress), a feature distillation technique that learns a mapping to compress the latent space of the teacher model into the smaller latent space of the student, by preserving the cosine similarities between image embeddings. This enables direct optimisation of the student network and produces a more faithful reproduction of the teacher's properties. It is shown that distillation with CosPress on a variety of datasets, including ImageNet, produces more accurate models with greater performance on generalisability, robustness and OOD detection benchmarks, and that this technique provides a competitive pathway for training highly performant lightweight models on small datasets. Code is available at github.com/emannix/cospress.
URL: https://openreview.net/forum?id=ZEhgODZkWU
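The central objective admits a compact sketch: make the student's within-batch cosine-similarity matrix match the teacher's, even when the student's latent space is smaller. This is an illustrative reading of the abstract rather than the authors' exact loss:

    import torch
    import torch.nn.functional as F

    def cosine_similarity_preserving_loss(student_emb, teacher_emb):
        # student_emb: (batch, d_s); teacher_emb: (batch, d_t); d_s < d_t is fine,
        # since only the batch-by-batch similarity structure is compared.
        s = F.normalize(student_emb, dim=-1)
        t = F.normalize(teacher_emb, dim=-1)
        return F.mse_loss(s @ s.T, t @ t.T)

    loss = cosine_similarity_preserving_loss(torch.randn(32, 384),
                                             torch.randn(32, 1024))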
---
Title: MemeSense: An Adaptive In-Context Framework for Social Commonsense Driven Meme Moderation
Authors: Sayantan Adak, Somnath Banerjee, Rajarshi Mandal, Avik Halder, Sayan Layek, Rima Hazra, Animesh Mukherjee
Abstract: Online memes are a powerful yet challenging medium for content moderation, often masking harmful intent behind humor, irony, or cultural symbolism. Conventional moderation systems, especially those relying on explicit text, frequently fail to recognize such subtle or implicit harm. We introduce MemeSense, an adaptive framework designed to generate socially grounded interventions for harmful memes by combining visual and textual understanding with curated, semantically aligned examples enriched with commonsense cues. This enables the model to detect nuanced, complex threats like misogyny, stereotyping, or vulgarity, even in memes lacking overt language. Across multiple benchmark datasets, MemeSense outperforms state-of-the-art methods, achieving up to 35% higher semantic similarity and 9% improvement in BERTScore for non-textual memes, and notable gains for text-rich memes as well. These results highlight MemeSense as a promising step toward safer, more context-aware AI systems for real-world content moderation. The code and data are available at: https://github.com/sayantan11995/MemeSense
URL: https://openreview.net/forum?id=ahRqI3NBiq
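One ingredient of the pipeline, retrieving semantically aligned in-context examples for a query meme, can be sketched as plain embedding search (illustrative only; the curation and commonsense enrichment the abstract describes are more involved):

    import numpy as np

    def select_in_context_examples(query_emb, example_embs, k=4):
        # query_emb: (d,); example_embs: (n, d); rows assumed L2-normalized.
        scores = example_embs @ query_emb          # cosine similarity
        return np.argsort(-scores)[:k]             # k most aligned examples

    rng = np.random.default_rng(0)
    bank = rng.normal(size=(100, 512))
    bank /= np.linalg.norm(bank, axis=1, keepdims=True)
    print(select_in_context_examples(bank[0], bank))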
---
Title: Gradient GA: Gradient Genetic Algorithm For Drug Molecular Design
Authors: Debadyuti Mukherjee, Chris Zhuang, Yingzhou Lu, Tianfan Fu, Ruqi Zhang
Abstract: Molecular discovery has brought great benefit to the chemical industry. Various molecular design techniques have been developed to identify molecules with desirable properties. Traditional optimization methods, such as genetic algorithms, continue to achieve state-of-the-art results across various molecular design benchmarks. However, these techniques rely solely on undirected random exploration, which hinders both the quality of the final solution and the convergence speed. To address this limitation, we propose a novel approach called Gradient Genetic Algorithm (Gradient GA), which incorporates gradient information from the objective function into genetic algorithms. Instead of random exploration, each proposed sample iteratively progresses toward an optimal solution by following the gradient direction. We achieve this by designing a differentiable objective function parameterized by a neural network and utilizing the Discrete Langevin Proposal to enable gradient guidance in discrete molecular spaces. Experimental results demonstrate that our method significantly improves both convergence speed and solution quality, outperforming cutting-edge techniques. The proposed method has shown up to a 25% improvement in the Top 10 score over the vanilla genetic algorithm. The code is available at https://github.com/debadyuti23/GradientGA.
URL: https://openreview.net/forum?id=kFKcktAeEG
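A toy version of the gradient-guided mutation step, on one-hot molecule encodings scored by a differentiable neural surrogate (the actual Discrete Langevin Proposal defines a proper Markov proposal with an acceptance correction, which this first-order sketch omits; all names are placeholders):

    import torch

    def gradient_guided_mutation(x_onehot, surrogate, temperature=0.5):
        # x_onehot: (length, vocab) one-hot encoding of a candidate molecule.
        x = x_onehot.clone().requires_grad_(True)
        (grad,) = torch.autograd.grad(surrogate(x).sum(), x)
        # First-order estimate of the objective change from flipping each token.
        gain = grad - (grad * x).sum(dim=-1, keepdim=True)
        probs = torch.softmax(gain / temperature, dim=-1)
        new_tokens = torch.multinomial(probs, 1).squeeze(-1)
        return torch.nn.functional.one_hot(new_tokens, x.shape[-1]).float()

    surrogate = torch.nn.Sequential(torch.nn.Flatten(0), torch.nn.Linear(160, 1))
    parent = torch.eye(8)[torch.randint(8, (20,))]   # random length-20 molecule
    child = gradient_guided_mutation(parent, surrogate)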
---
Title: Schauder Bases for $C[0, 1]$ Using ReLU, Softplus and Two Sigmoidal Functions
Authors: Anand Ganesh, Babhrubahan Bose, Anand Rajagopalan
Abstract: We construct four Schauder bases for the space $C[0,1]$, one using ReLU functions, another using Softplus functions, and two more using sigmoidal versions of the ReLU and Softplus functions. This establishes the existence of a basis using these functions for the first time, and improves on the universal approximation property associated with them. We also show an $O(\frac{1}{n})$ approximation bound based on our ReLU basis, and a negative result on constructing multivariate functions using finite combinations of ReLU functions.
URL: https://openreview.net/forum?id=YT79Qu1bOi
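As background for why such constructions are plausible (a standard identity, not taken from the paper): the classical Faber-Schauder hat function on $[a,b]$ with peak at $m$ is a finite combination of three ReLUs,

    \[
      h(x) = \frac{\operatorname{ReLU}(x-a)}{m-a}
             - \Big(\frac{1}{m-a} + \frac{1}{b-m}\Big)\operatorname{ReLU}(x-m)
             + \frac{\operatorname{ReLU}(x-b)}{b-m},
    \]

which vanishes for $x \le a$, rises linearly to $h(m)=1$, falls back to zero at $x=b$, and stays zero afterwards because the three slopes cancel. Turning such functions into a Schauder basis, with unique convergent expansions for every $f \in C[0,1]$, is the part the paper establishes.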
---
Title: UMP-Net: Uncertainty-Aware Mixture of Prompts Network for Efficient Instruction Tuning
Authors: Fatemeh Daneshfar, Abdulhady abas, Moloud Abdar, Pietro Lio
Abstract: Instruction tuning has greatly improved how large language models (LLMs) respond to human-like instructions. However, fully fine-tuning these models is still computationally demanding, and many existing parameter-efficient methods fall short, particularly when it comes to uncertainty estimation and working effectively across different modalities. To address this, we introduce UMP-Net (Uncertainty-Aware Mixture of Prompts Network), a new approach designed to enhance the ability of LLaMA to follow instructions.
UMP-Net combines a novel mixture of prompts (MoPs) technique with Latent Noise Prompting, KNN-based Heterogeneous Clustering, and Conformal Predictions to select the most reliable prompts dynamically while accounting for uncertainty. In addition, it features a CLIP-based multi-modal architecture to streamline vision-language integration. We evaluated UMP-Net on a range of benchmarks including ScienceQA, COCO Caption, and various zero-shot multi-modal tasks. The results show strong performance: an average accuracy of 88.41% on ScienceQA and a CIDEr score of 158.3 on COCO Caption, surpassing models such as LLaVA, LLaMA-Adapter, and LLaMA-Excitor. These findings suggest that UMP-Net offers both improved multi-modal capability and computational efficiency. Further ablations demonstrate that UMP-Net's conformal prediction module provides robust uncertainty estimates under noise and domain shifts, outperforming Bayesian alternatives in coverage guarantees with minimal overhead.
URL: https://openreview.net/forum?id=EehtvgNXAl
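For readers unfamiliar with the conformal machinery the abstract leans on, here is the generic split-conformal threshold computation (standard technique; UMP-Net's actual module and its choice of nonconformity score are not shown, and the names are placeholders):

    import numpy as np

    def conformal_threshold(cal_scores, alpha=0.1):
        # cal_scores: nonconformity scores on held-out calibration data
        # (higher = less reliable). Returns a cutoff with ~1-alpha coverage.
        n = len(cal_scores)
        q = np.ceil((n + 1) * (1 - alpha)) / n      # finite-sample correction
        return np.quantile(cal_scores, min(q, 1.0), method="higher")

    cal = np.random.default_rng(0).uniform(size=500)
    tau = conformal_threshold(cal)
    # At test time, trust only candidates whose nonconformity score is <= tau.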
---
Title: Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning
Authors: Zheng Zhang
Abstract: Large Language Models (LLMs) display striking surface fluency yet systematically fail at tasks requiring symbolic reasoning, arithmetic accuracy, and logical consistency. This paper offers a structural diagnosis of such failures, revealing a persistent gap between comprehension and competence. Through controlled experiments and architectural analysis, we demonstrate that LLMs often articulate correct principles without reliably applying them—a failure rooted not in knowledge access, but in computational execution. We term this phenomenon the computational split-brain syndrome, where instruction and action pathways are geometrically and functionally dissociated. This core limitation recurs across domains, from mathematical operations to relational inferences, and explains why model behavior remains brittle even under idealized prompting. We argue that LLMs function as powerful pattern completion engines, but lack the architectural scaffolding for principled, compositional reasoning. Our findings delineate the boundary of current LLM capabilities and motivate future models with metacognitive control, principle lifting, and structurally grounded execution. This diagnosis also clarifies why mechanistic interpretability findings may reflect training-specific pattern coordination rather than universal computational principles, and why the geometric separation between instruction and execution pathways suggests limitations in neural introspection and mechanistic analysis.
URL: https://openreview.net/forum?id=Gz5HMiJLqv
---
Title: Pseudo-Physics-Informed Neural Operators: Enhancing Operator Learning from Limited Data
Authors: Keyan Chen, Yile Li, Da Long, Zhitong Xu, WEI W. XING, Jacob Hochhalter, Shandian Zhe
Abstract: Neural operators have shown great potential in surrogate modeling. However, training a well-performing neural operator typically requires a substantial amount of data, which can pose a major challenge in complex applications. In such scenarios, detailed physical knowledge can be unavailable or difficult to obtain, and collecting extensive data is often prohibitively expensive. To mitigate this challenge, we propose the Pseudo Physics-Informed Neural Operator (PPI-NO) framework. PPI-NO constructs a surrogate physics system for the target system using partial differential equations (PDEs) derived from simple, rudimentary physics principles, such as basic differential operators. This surrogate system is coupled with a neural operator model, using an alternating update and learning process to iteratively enhance the model's predictive power. While the physics derived via PPI-NO may not mirror the ground-truth underlying physical laws, hence the term "pseudo physics", this approach significantly improves the accuracy of standard operator learning models in data-scarce scenarios, as evidenced by extensive evaluations across five benchmark tasks and a fatigue modeling application.
URL: https://openreview.net/forum?id=5N1V25Rf7D
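A runnable toy of the alternating scheme, with every component simplified by assumption (MLPs in place of neural operators, a learned network standing in for the pseudo-physics system, synthetic data; the paper's actual operators and PDEs differ):

    import torch
    import torch.nn as nn

    def mlp():
        return nn.Sequential(nn.Linear(32, 64), nn.Tanh(), nn.Linear(64, 32))

    operator, physics = mlp(), mlp()        # input -> output, and a pseudo-inverse
    x_lab = torch.randn(8, 32)              # scarce labeled inputs
    y_lab = torch.roll(x_lab, 1, dims=1)    # toy ground-truth outputs
    x_unlab = torch.randn(64, 32)           # plentiful unlabeled inputs

    for cycle in range(5):
        # Step 1: fit the pseudo-physics system to explain the operator's outputs.
        opt_p = torch.optim.Adam(physics.parameters(), lr=1e-3)
        for _ in range(100):
            opt_p.zero_grad()
            loss = ((physics(operator(x_unlab).detach()) - x_unlab) ** 2).mean()
            loss.backward(); opt_p.step()
        # Step 2: retrain the operator with data loss plus physics consistency.
        opt_o = torch.optim.Adam(operator.parameters(), lr=1e-3)
        for _ in range(100):
            opt_o.zero_grad()
            data = ((operator(x_lab) - y_lab) ** 2).mean()
            consist = ((physics(operator(x_unlab)) - x_unlab) ** 2).mean()
            (data + 0.1 * consist).backward(); opt_o.step()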
---
New submissions
===============
Title: A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Abstract: Reinforcement learning (RL) methods such as Group Relative Policy Optimization (GRPO) have recently emerged as a leading approach for enhancing the reasoning ability of large language models (LLMs). Yet, the precise sources of their effectiveness remain unclear. In this work, we systematically decompose GRPO by benchmarking it against simpler REINFORCE-style baselines to identify its core components. Our analysis reveals a clear hierarchy: (i) iterative, online data collection is the dominant driver of performance, enabling even simple positive-only fine-tuning (e.g., RAFT) to be surprisingly strong; (ii) negative signals primarily sustain exploration by preventing rapid entropy collapse; and (iii) GRPO’s main benefit stems not from reward normalization itself, but from the implicit data filtering effect it induces by discarding prompts with uniform rewards (all-correct or all-incorrect). Guided by this insight, we propose REINFORCE-Rej, a minimal variant that makes filtering explicit. REINFORCE-Rej matches GRPO’s performance while being simpler and more KL-efficient. These findings suggest that principled data filtering, rather than algorithmic complexity, is the key to robust RL for LLMs.
URL: https://openreview.net/forum?id=eK3yDPtwIK
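The explicit-filtering idea is simple enough to sketch directly: drop prompt groups whose sampled completions all received the same reward, then apply a REINFORCE-style update to the rest (shapes, names, and the per-prompt mean baseline are assumptions; the paper's exact estimator may differ):

    import torch

    def reinforce_rej_loss(logprobs, rewards):
        # logprobs, rewards: (num_prompts, samples_per_prompt)
        keep = rewards.max(dim=1).values != rewards.min(dim=1).values
        if not keep.any():
            return logprobs.sum() * 0.0      # every group uniform: no signal
        lp, r = logprobs[keep], rewards[keep]
        baseline = r.mean(dim=1, keepdim=True)
        return -((r - baseline) * lp).mean()

    loss = reinforce_rej_loss(torch.randn(4, 8, requires_grad=True),
                              torch.randint(0, 2, (4, 8)).float())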
---
Title: Genomic Next-Token Predictors are In-Context Learners
Abstract: In-context learning (ICL) -- the capacity of a model to infer and apply abstract patterns from examples provided within its input -- has been extensively studied in large language models trained for next-token prediction on human text. In fact, prior work often attributes this emergent behavior to distinctive statistical properties in *human* language. This raises a fundamental question: can ICL arise *organically* in other sequence domains purely through large-scale predictive training?
To explore this, we turn to genomic sequences, an alternative symbolic domain rich in statistical structure. Specifically, we study the Evo2 genomic model, trained predominantly on next-nucleotide (A/T/C/G) prediction, at a scale comparable to mid-sized LLMs. We develop a controlled experimental framework comprising symbolic reasoning tasks instantiated in both linguistic and genomic forms, enabling direct comparison of ICL across genomic and linguistic models. Our results show that genomic models, like their linguistic counterparts, exhibit log-linear gains in pattern induction as the number of in-context demonstrations increases. To the best of our knowledge, this is the first evidence of organically emergent ICL in genomic sequences, supporting the hypothesis that ICL arises as a consequence of large-scale predictive modeling over rich data.
URL: https://openreview.net/forum?id=KmNFx8DmaZ
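The experimental design can be pictured with a toy task instantiated in both alphabets (a minimal example of my own, not the authors' benchmark): the same hidden rule, demonstrated in-context over nucleotides or letters:

    import random

    def make_icl_prompt(alphabet, n_demos=8, seed=0):
        rng = random.Random(seed)
        rule = lambda s: s[::-1]                 # hidden pattern: reversal
        demos = []
        for _ in range(n_demos):
            s = "".join(rng.choice(alphabet) for _ in range(5))
            demos.append(f"{s}>{rule(s)}")
        query = "".join(rng.choice(alphabet) for _ in range(5))
        return " ".join(demos) + f" {query}>"

    print(make_icl_prompt("ATCG"))               # genomic instantiation
    print(make_icl_prompt("abcdefgh"))           # linguistic instantiation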
---
Title: AdaCubic: An Adaptive Cubic Regularization Optimizer for Deep Learning
Abstract: A novel regularization technique called AdaCubic is proposed that adapts the weight of the cubic term. The heart of AdaCubic is an auxiliary optimization problem with cubic constraints that dynamically adjusts the weight of the cubic term in Newton's cubic regularized method. We utilize Hutchinson's method to approximate the Hessian matrix, thereby reducing computation costs. We demonstrate that AdaCubic inherits the cubically regularized Newton method's local convergence guarantees. Our experiments in Computer Vision, Natural Language Processing, and Signal Processing tasks demonstrate that AdaCubic outperforms or competes with several widely used optimizers. Unlike other adaptive algorithms that require fine-tuning of hyperparameters, AdaCubic is evaluated with a pre-fixed set of hyperparameters, making it an attractive option for researchers and practitioners in situations where fine-tuning is not feasible. To our knowledge, AdaCubic is the first optimizer to leverage the power of cubic regularization for large-scale applications. The code of AdaCubic will be publicly released upon paper acceptance.
URL: https://openreview.net/forum?id=pZBQ7J37lk
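Hutchinson's estimator, the one concrete ingredient the abstract names, is standard and worth showing (this sketch estimates a diagonal Hessian approximation from Hessian-vector products; AdaCubic's surrounding cubic-regularized update is not reproduced):

    import torch

    def hutchinson_diag_hessian(loss, params, num_samples=4):
        grads = torch.autograd.grad(loss, params, create_graph=True)
        est = [torch.zeros_like(p) for p in params]
        for _ in range(num_samples):
            # Rademacher probe vectors: entries are +1 or -1.
            vs = [(torch.rand_like(p) < 0.5).to(p.dtype) * 2 - 1 for p in params]
            hvps = torch.autograd.grad(grads, params, grad_outputs=vs,
                                       retain_graph=True)
            for e, v, hv in zip(est, vs, hvps):
                e += v * hv / num_samples    # E[v * Hv] = diag(H) for such v
        return est

    w = torch.randn(5, requires_grad=True)
    loss = 3.0 * (w ** 2).sum()              # Hessian is 6I
    print(hutchinson_diag_hessian(loss, [w])[0])  # exactly 6s for a quadratic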
---
Title: Semi-Supervised Cross-Domain Imitation Learning
Abstract: Cross-domain imitation learning (CDIL) accelerates policy learning by transferring expert knowledge across domains, which is valuable in applications where collection of expert data is costly. Existing methods are either supervised, relying on proxy tasks and explicit alignment, or unsupervised, aligning distributions without paired data but often unstable. We introduce the Semi-Supervised CDIL (SS-CDIL) setting and propose the first algorithm for SS-CDIL with theoretical justification. Our method uses only offline data, including a small number of target expert demonstrations and some unlabeled imperfect trajectories. To handle domain discrepancy, we propose a novel cross-domain loss function for learning inter-domain state-action mappings and design an adaptive weight function to balance the source and target knowledge. Experiments on MuJoCo and Robosuite show consistent gains over the baselines, demonstrating that our approach achieves stable and data-efficient policy learning with minimal supervision.
URL: https://openreview.net/forum?id=WARXnbJawZ
---
Title: Dynamics-Aligned Diffusion Planning for Offline RL: A Unified Framework with Forward and Inverse Guidance
Abstract: Diffusion-based planning has emerged as a powerful paradigm for offline reinforcement learning (RL). However, existing approaches often overlook the physical constraints imposed by real-world dynamics, resulting in dynamics inconsistency—a mismatch between diffusion-generated trajectories and those feasible under true environment transitions. To address this issue, we propose Dynamics-Aligned Diffusion Planning (DADP), a unified framework that explicitly enforces dynamics consistency during the diffusion denoising process. DADP offers two complementary variants: DADP-F (Forward), which employs a forward dynamics model to ensure state-level feasibility, and DADP-I (Inverse), which leverages an inverse dynamics model to enhance action-level executability. Both variants share a unified guidance formulation that integrates task return optimization and dynamics alignment through gradient-based updates. Experiments on D4RL Maze2D and MuJoCo benchmarks demonstrate that DADP-F and DADP-I outperform state-of-the-art offline RL baselines, effectively reducing dynamics inconsistency and improving long-horizon robustness. This unifies diffusion-based planning with physically grounded dynamics modeling.
URL: https://openreview.net/forum?id=h3hG6EuqU2
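A schematic of the shared guidance formulation, in the style of classifier-guided diffusion: nudge each denoising step along the gradient of task return minus a dynamics-consistency penalty. The shapes, the 4-D state / 2-D action split, and the stub models are assumptions for illustration; DADP's exact guidance, and its inverse-dynamics variant, differ in detail:

    import torch

    def guided_denoise_step(traj, denoiser, value_fn, dynamics_fn, lam=0.1):
        # traj: (batch, horizon, state_dim + action_dim) noisy trajectory.
        traj = traj.detach().requires_grad_(True)
        ret = value_fn(traj).sum()                        # task return term
        s, a = traj[..., :-1, :4], traj[..., :-1, 4:]
        s_next = traj[..., 1:, :4]
        dyn = ((dynamics_fn(s, a) - s_next) ** 2).sum()   # forward-dynamics term
        (grad,) = torch.autograd.grad(ret - lam * dyn, traj)
        return denoiser(traj).detach() + grad             # bias the denoised mean

    step = guided_denoise_step(
        torch.randn(2, 16, 6), denoiser=lambda x: x,
        value_fn=lambda x: x.sum(dim=(1, 2)),
        dynamics_fn=lambda s, a: s + 0.1 * a.sum(-1, keepdim=True))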
---
Title: AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective
Abstract: As machine learning (ML) systems expand in both scale and functionality, the security landscape has become increasingly complex, with a proliferation of attacks and defenses. However, existing studies largely treat these threats in isolation, lacking a coherent framework to expose their shared principles and interdependencies. This fragmented view hinders systematic understanding and limits the design of comprehensive defenses.
Crucially, the two foundational assets of ML, data and models, are no longer independent; vulnerabilities in one directly compromise the other. The absence of a holistic framework leaves open questions about how these bidirectional risks propagate across the ML pipeline.
To address this critical gap, we propose a unified closed-loop threat taxonomy that explicitly frames model–data interactions along four directional axes. Our framework offers a principled lens for analyzing and defending foundation models.
The resulting four classes of security threats represent distinct but interrelated categories of attacks: (1) Data→Data (D→D), including data decryption attacks, watermark removal attacks, and jailbreak attacks;
(2) Data→Model (D→M), including poisoning and harmful fine-tuning attacks;
(3) Model→Data (M→D), including model inversion, membership inference attacks, and training data extraction attacks;
(4) Model→Model (M→M), including model extraction attacks.
We conduct a systematic review that analyzes the mathematical formulations, attack and defense strategies, and applications across the vision, language, audio, and graph domains.
Our unified framework elucidates the underlying connections among these security threats and establishes a foundation for developing scalable, transferable, and cross-modal security strategies—particularly within the landscape of foundation models.
URL: https://openreview.net/forum?id=1g7pKgClZs
---
Title: Mechanism-Aware Prediction of Tissue-Specific Drug Activity via Multi-Modal Biological Graphs
Abstract: Predicting how small molecules behave across human tissues is essential for targeted therapy development. While some existing models incorporate tissue identity, they treat it as a label—ignoring the underlying biological mechanisms that differentiate tissues. We present Expresso, a multi-modal architecture that predicts tissue-specific molecular activity by modeling how compounds interact with transcriptomic and pathway-level tissue context. Expresso constructs heterogeneous graphs from GTEx data, linking samples, genes, and pathways to reflect expression profiles and curated biological relationships. These graphs are encoded using a hierarchical GNN and fused with frozen molecular embeddings to produce context-aware predictions. A multi-task pretraining strategy—spanning gene recovery, tissue classification, and pathway-level contrastive learning—guides the model to learn mechanistically grounded representations. On nine tissues, Expresso improves mean AUC by up to 27.9 points over molecule-only baselines. Our results demonstrate that incorporating biological structure, not just tissue labels, yields more accurate and interpretable models for tissue-specific drug behavior.
URL: https://openreview.net/forum?id=UDW8m9iQeC
---
Title: Wikipedia in the Era of LLMs: Evolution and Risks
Abstract: In this paper, we present a comprehensive analysis and monitoring framework for the impact of Large Language Models (LLMs) on Wikipedia, examining the evolution of Wikipedia through existing data and using simulations to explore potential risks. We begin by analyzing article content and page views to study the recent changes in Wikipedia and assess the impact of LLMs. Subsequently, we evaluate how LLMs affect various Natural Language Processing (NLP) tasks related to Wikipedia, including machine translation and retrieval-augmented generation (RAG). Our findings and simulation results reveal that Wikipedia articles have been affected by LLMs, with an impact of approximately 1%-2% in certain categories. If the machine translation benchmark based on Wikipedia is influenced by LLMs, the scores of the models may become inflated, and the comparative results among models could shift. Moreover, the effectiveness of RAG might decrease if the knowledge has been contaminated by LLMs. While LLMs have not yet fully changed Wikipedia's language and knowledge structures, we believe that our empirical findings signal the need for careful consideration of potential future risks in NLP research.
URL: https://openreview.net/forum?id=ahVmnYkVLt
---
Title: GenAI vs. Human Creators: Procurement Mechanism Design in Two-/Three-Layer Markets
Abstract: With the rapid advancement of generative AI (GenAI), mechanism design adapted to its unique characteristics poses new theoretical and practical challenges. Unlike traditional goods, content from one domain can enhance the training and performance of GenAI models in other domains. For example, OpenAI’s video generation model Sora (Liu et al., 2024b) relies heavily on image data to improve video generation quality. In this work, we study nonlinear procurement mechanism design under data transferability, where online platforms employ both human creators and GenAI to satisfy cross-domain content demand. We propose optimal mechanisms that maximize either platform revenue or social welfare and identify the specific properties of GenAI that make such high-dimensional design problems tractable. Our analysis further reveals which domains face stronger competitive pressure and which tend to experience overproduction. Moreover, the growing role of data intermediaries, including labeling companies such as Scale AI and creator organizations such as The Wall Street Journal, introduces a third layer into the traditional platform–creator structure. We show that this three-layer market can result in a lose-lose outcome, reducing both platform revenue and social welfare, as large pre-signed contracts distort creators’ incentives and lead to inefficiencies in the data market. These findings suggest a need for government regulation of the GenAI data ecosystem, and our theoretical insights are further supported by numerical simulations.
URL: https://openreview.net/forum?id=Eukf4TBHS7
---
Title: AI Influence: Mechanisms, Amplifiers, and Consequences
Abstract: AI influence refers to AI's impact on the knowledge and values of individuals as AI systems act as producers, mediators, and receivers of information. As a result, it impacts our collective processes of creating and spreading knowledge, forming beliefs, and reaching consensus. We argue that there are mechanisms of inconspicuous influence in AI development and deployment pipelines which, when amplified by societal dynamics, could lead to dangerous outcomes that early interventions may still reverse. We detail those mechanisms, amplifiers, and potential long-term consequences.
URL: https://openreview.net/forum?id=7MKCuXjJMW
---
Title: Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection
Abstract: Large Language Models (LLMs) often hallucinate, limiting their reliability in sensitive applications. In black-box settings, several self-consistency-based techniques have been proposed for hallucination detection. We empirically show that these methods perform nearly as well as a supervised (black-box) oracle, leaving limited room for further gains within this paradigm. To address this limitation, we explore cross-model consistency checking between the target model and an additional verifier LLM. With this extra information, we observe improved oracle performance compared to purely self-consistency-based methods. We then propose a budget-friendly, two-stage detection algorithm that calls the verifier model only for a subset of cases. It dynamically switches between self-consistency and cross-consistency based on an uncertainty interval of the self-consistency classifier. We provide a geometric interpretation of consistency-based hallucination detection methods through the lens of kernel mean embeddings, offering deeper theoretical insights. Extensive experiments show that this approach maintains high detection performance while significantly reducing computational cost.
URL: https://openreview.net/forum?id=6tlLISSgiu
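The two-stage rule itself is a few lines (the thresholds here are illustrative assumptions; the paper derives the uncertainty interval from the self-consistency classifier):

    def detect_hallucination(self_consistency, cross_check, lo=0.35, hi=0.65):
        # self_consistency: score in [0, 1]; higher = sampled answers agree more.
        if self_consistency < lo:
            return True                   # confidently inconsistent: flag it
        if self_consistency > hi:
            return False                  # confidently consistent: accept
        return cross_check() < 0.5        # uncertain: pay for the verifier LLM

    flag = detect_hallucination(0.5, cross_check=lambda: 0.8)  # stub verifier

Because the verifier runs only inside the interval [lo, hi], the extra cost scales with how often the self-consistency score is ambiguous.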
---
Title: Astrocyte-Inspired Hierarchical Routing for Enhanced Expert Specialization in Mixture-of-Experts Models
Abstract: The Mixture-of-Experts (MoE) architecture is a leading paradigm for scaling, but cultivating genuine expert specialization is a persistent challenge, often hindered by load balancing. This paper introduces Astrocyte-Hierarchical Routing (AHR), a novel, bio-inspired mechanism that addresses this challenge. Drawing inspiration from astrocytes, AHR conditions local, token-level routing decisions on a global context signal. In our encoder-based implementation, this signal, derived from the [CLS] token, additively biases local routing decisions, promoting a developmental trajectory for expert functionality. We conduct experiments on a multi-class text classification task, comparing AHR against strong baselines. The results demonstrate that AHR achieves a statistically significant and substantial increase in final-layer expert specialization without incurring a discernible loss in task performance. Qualitative analysis further confirms that AHR fosters a transition from generalist experts in early layers to highly specialized experts in later layers. This work presents a new principle for MoE router design: a contextual, two-level approach. This successful validation in an encoder model serves as a proof-of-concept, opening the way for future work on scaling AHR and adapting its principle to other architectures.
URL: https://openreview.net/forum?id=4pHo47SXaA
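A minimal sketch of the two-level routing principle, with a global signal from the [CLS] token additively biasing per-token expert logits (a plausible reading of the abstract, not the authors' exact architecture):

    import torch
    import torch.nn as nn

    class AstrocyteRouter(nn.Module):
        def __init__(self, d_model, num_experts):
            super().__init__()
            self.local = nn.Linear(d_model, num_experts)   # token-level routing
            self.glob = nn.Linear(d_model, num_experts)    # global context bias

        def forward(self, tokens, cls):
            # tokens: (batch, seq, d); cls: (batch, d) global context signal.
            logits = self.local(tokens) + self.glob(cls).unsqueeze(1)
            return torch.softmax(logits, dim=-1)           # expert mixture weights

    router = AstrocyteRouter(d_model=64, num_experts=8)
    weights = router(torch.randn(2, 10, 64), torch.randn(2, 64))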
---
Title: Watermarking Degrades Alignment in Language Models: Analysis and Mitigation
Abstract: Watermarking is emerging as a practical mechanism for provenance in language models, but it modifies token probabilities at inference time, the very same locus targeted by alignment training. This overlap raises a basic question relevant for deployment: how do watermark-induced shifts interact with the procedures intended to make models safe and useful? We conduct a systematic study across several contemporary models and two representative watermarking schemes. We find that watermarking induces a nontrivial, patterned yet model-specific shift in alignment. Two regimes recur: guard attenuation, where models become more helpful but less safe, and guard amplification, where refusals become overly conservative. Crucially, these effects persist even after controlling for perplexity degradation, indicating alignment-specific distortions beyond generalized quality loss. To mitigate these effects, we introduce Alignment Resampling (AR), a procedure that samples multiple watermarked outputs and selects the most aligned response according to an external reward model. Drawing on established results for the expected maximum of Gaussian random variables, we derive a theoretical lower bound showing that alignment gains grow sublogarithmically with sample size, providing principled guidance on minimal sampling requirements. Interestingly, we observe that sampling as few as two to four candidates largely restores unwatermarked alignment performance in truthfulness, safety, and helpfulness, while leaving watermark detectability essentially unchanged. This study offers the first systematic audit of watermarking-alignment interactions, quantifies the trade-off between watermark strength and alignment, and proposes a simple, inference-time mitigation procedure suitable for deployment.
URL: https://openreview.net/forum?id=w2ATKQcfWx
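Alignment Resampling reduces to best-of-k selection under an external reward model, which is easy to sketch (generate_fn and reward_fn below are stand-ins):

    import random

    def alignment_resampling(prompt, generate_fn, reward_fn, k=4):
        # Draw k watermarked samples; keep the one the reward model scores best.
        candidates = [generate_fn(prompt) for _ in range(k)]
        return max(candidates, key=reward_fn)

    best = alignment_resampling(
        "Explain tides.",
        generate_fn=lambda p: p + " ok" * random.randint(1, 5),  # stub generator
        reward_fn=len)                      # stub for an external reward model

Per the abstract's bound, the expected gain of the maximum grows only sublogarithmically in k, which is why two to four samples already recover most of the unwatermarked alignment.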
---
Title: STEALTH: Secure Transformer for Encrypted Alignment of Latent Text Embeddings via Semantic Isomorphism Enforcement (SIE) Loss Function
Abstract: The pervasive use of large language models (LLMs) on sensitive data presents a critical privacy challenge, as traditional encryption renders data unusable for inference. We introduce STEALTH, a 120M-parameter secure transformer framework designed to process encrypted text while preserving its semantic utility under an authorized-key threat model (no decryption or side-channel access). The core innovation of STEALTH is the Semantic Isomorphism Enforcement (SIE) loss function, which trains the model to learn a topology-preserving mapping between encrypted text embeddings and their original plaintext latent space. This encourages preservation of semantic relationships and topological structure in the encrypted domain. Using retrieval-based reconstruction from a domain-aligned plaintext corpus, STEALTH achieves near-perfect semantic retrieval (BLEU score of 1.0 under full-corpus coverage in our experiments) and enables accurate privacy-preserving clustering on encrypted embeddings. We evaluate STEALTH across 44 datasets spanning general language understanding, healthcare, finance, legal, e-commerce, programming, content analysis, reading comprehension, and corporate communication domains with 16 encryption schemes (704 experimental conditions), establishing a comprehensive benchmark for privacy-preserving NLP on encrypted text. Performance depends on domain alignment between encrypted inputs and the indexed plaintext corpus. Our results demonstrate that, with well-aligned domain indexes and retrieval support, models can perform effective NLP on encrypted data without direct decryption.
URL: https://openreview.net/forum?id=73PV17dVCM
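One illustrative reading of the SIE objective (not the paper's exact form): learn a mapping from encrypted-text embeddings into the plaintext latent space while penalizing distortion of the pairwise cosine-similarity structure:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    mapper = nn.Linear(256, 256)     # encrypted space -> plaintext latent space
    enc, plain = torch.randn(16, 256), torch.randn(16, 256)  # paired embeddings
    z = F.normalize(mapper(enc), dim=-1)
    p = F.normalize(plain, dim=-1)
    loss = F.mse_loss(z @ z.T, p @ p.T)   # preserve pairwise cosine structure
    # At inference, a mapped encrypted embedding would retrieve its nearest
    # neighbor in the indexed plaintext corpus to reconstruct the semantics.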
---
Title: SSL-SLR: Self-Supervised Representation Learning for Sign Language Recognition
Abstract: Sign language recognition (SLR) is a machine learning task aiming to identify signs in videos. Due to the scarcity of annotated data, unsupervised methods like contrastive learning have become promising in this field. They learn meaningful representations by pulling positive pairs (two augmented versions of the same instance) closer and pushing negative pairs (different from the positive pairs) apart. In SLR, only certain parts of the sign videos provide information that is truly useful for their recognition. Applying contrastive methods to SLR raises two issues: (i) contrastive learning methods treat all parts of a video in the same way, without taking into account the relevance of certain parts over others; (ii) shared movements between different signs make negative pairs highly similar, complicating sign discrimination. These issues lead to learning non-discriminative features for sign recognition and poor results in downstream tasks. In response, this paper proposes a self-supervised learning framework designed to learn meaningful representations for SLR. This framework consists of two key components designed to work together: (i) a new self-supervised approach with free-negative pairs; (ii) a new data augmentation technique. This approach shows a considerable gain in accuracy compared to several contrastive and self-supervised methods, across linear evaluation, semi-supervised learning, and transferability between sign languages.
URL: https://openreview.net/forum?id=buTZkTXijy
---
Title: Probabilistic Shapley Value Modeling and Inference
Abstract: We propose probabilistic Shapley inference (PSI), a novel probabilistic framework to model and infer sufficient statistics of feature attributions in flexible predictive models, via latent random variables whose mean recovers Shapley values. PSI enables efficient, scalable inference over input-to-output attributions and their uncertainty, via a variational objective that jointly trains a predictive (regression or classification) model and its attribution distributions. To address the challenge of marginalizing over variable-length input feature subsets for Shapley value calculation, we introduce a masking-based neural network architecture, with a modular training and inference procedure. We evaluate PSI on synthetic and real-world datasets, showing that it achieves competitive predictive performance compared to strong baselines, while learning feature attribution distributions, centered at Shapley values, that reveal meaningful attribution uncertainty across data modalities.
URL: https://openreview.net/forum?id=Au7e02c4C7
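For context, the quantity PSI's latent variables are meant to recover in expectation is the ordinary Shapley value, whose standard permutation-sampling estimate looks like this (the model, data, and baseline here are toy placeholders):

    import numpy as np

    def shapley_mc(f, x, baseline, feature, num_perms=200, seed=0):
        # Average the marginal contribution of `feature` over random orderings.
        rng, d, total = np.random.default_rng(seed), len(x), 0.0
        for _ in range(num_perms):
            perm = rng.permutation(d)
            pos = int(np.where(perm == feature)[0][0])
            mask = np.zeros(d, dtype=bool)
            mask[perm[:pos]] = True                     # features preceding ours
            with_f = mask.copy(); with_f[feature] = True
            total += (f(np.where(with_f, x, baseline))
                      - f(np.where(mask, x, baseline)))
        return total / num_perms

    f = lambda v: 2 * v[0] + v[1]                       # toy linear model
    print(shapley_mc(f, np.array([1.0, 3.0]), np.zeros(2), feature=0))  # ~2.0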
---
Title: Token-Based Detection of Spurious Correlations in Vision Transformers
Abstract: Due to their powerful feature association capabilities, neural network-based computer vision models have the ability to detect and exploit unintended patterns within the data, potentially leading to correct predictions based on incorrect or unintended but statistically relevant signals. These clues may vary from simple color aberrations to small text within the image. In situations where these unintended signals align with the predictive task, models can mistakenly link these features with the task and rely on them for making predictions. This phenomenon is referred to as spurious correlations, where patterns appear to be associated with the task but are actually coincidental. As a result, detection and mitigation of spurious correlations have become crucial tasks for building trustworthy, reliable, and generalizable machine learning models. In this work, we present a novel token-based method to detect spurious correlations in vision transformers, a type of neural network architecture that has gained significant popularity in recent years. Using both supervised and self-supervised trained models, we present large-scale experiments on the ImageNet dataset demonstrating the ability of the proposed method to identify spurious correlations. We also find that, even if the same architecture is used, the training methodology has a significant impact on the model's reliance on spurious correlations. Furthermore, we show that certain classes in the ImageNet dataset contain spurious signals that are easily detected by the models and discuss the underlying reasons for those spurious signals. In light of our findings, we provide an exhaustive list of the aforementioned images and call for caution in their use in future research efforts. Lastly, we present a case study investigating spurious signals in invasive breast mass classification, grounding our work in a real-world scenario.
URL: https://openreview.net/forum?id=GlPXPhwOzI
---