Daily TMLR digest for Feb 08, 2026

TMLR

Feb 8, 2026, 12:30:08 AM
to tmlr-anno...@googlegroups.com

New submissions
===============


Title: Uncovering Language Model Processing Strategies with Non-Negative Per-Example Fisher Factorization

Abstract: Understanding the heuristics and algorithms that make up a model's behavior is important for safe and reliable deployment.
While gradient clustering has been used for this purpose, gradients of a single log probability capture only a slice of the model's behavior, and clustering can only assign a single factor to each behavior.
We introduce NPEFF (Non-Negative Per-Example Fisher Factorization), an interpretability method that overcomes these limitations by decomposing per-example Fisher matrices with a novel algorithm that learns a set of components, each represented by a rank-1 positive semi-definite matrix.
Through a combination of human evaluation and automated analysis, we demonstrate that these NPEFF components correspond to heuristics used by language models on a variety of text processing tasks.
We find that NPEFF excels at decomposing behaviors composed of multiple factors, compared to the gradient clustering and activation sparse autoencoder baselines.
We also show how NPEFF can be adapted to be more efficient on tasks with few classes.
We further show how to construct parameter perturbations from NPEFF components to selectively disrupt a given component's role in the model's processing.
Along with ablation studies, we include experiments using NPEFF to study in-context learning.

URL: https://openreview.net/forum?id=UjeDVujI8q
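
A minimal toy sketch of the decomposition described in this abstract (not the authors' algorithm): per-example empirical Fishers F_n = g_n g_n^T are approximated by non-negative combinations of shared rank-1 positive semi-definite components h_k h_k^T, fit here with a simple projected-gradient loop. The sizes, initialization, and optimizer are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_examples, dim, n_components = 64, 10, 4

# Toy per-example gradients; each example's empirical Fisher is the rank-1 matrix g g^T.
G = rng.normal(size=(n_examples, dim))
F = np.einsum("nd,ne->nde", G, G)                     # (n, d, d) stack of PSD matrices

# Factorization parameters: non-negative coefficients W and component directions H,
# so that F_n is approximated by sum_k W[n, k] * h_k h_k^T.
W = np.abs(rng.normal(size=(n_examples, n_components)))
H = 0.1 * rng.normal(size=(n_components, dim))

def loss(W, H):
    recon = np.einsum("nk,kd,ke->nde", W, H, H)
    return np.mean(np.sum((recon - F) ** 2, axis=(1, 2)))

initial = loss(W, H)
lr = 1e-3
for _ in range(3000):
    comps = np.einsum("kd,ke->kde", H, H)             # rank-1 PSD components h_k h_k^T
    resid = np.einsum("nk,kde->nde", W, comps) - F    # reconstruction residual
    grad_W = 2.0 * np.einsum("nde,kde->nk", resid, comps) / n_examples
    grad_H = 4.0 * np.einsum("nk,nde,ke->kd", W, resid, H) / n_examples
    W = np.maximum(W - lr * grad_W, 0.0)              # projection keeps coefficients non-negative
    H = H - lr * grad_H

print(f"loss before/after fitting: {initial:.3f} / {loss(W, H):.3f}")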

---

Title: Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures

Abstract: Despite their capabilities, Large Language Models (LLMs) remain opaque, and understanding of their internal representations is limited. Current interpretability methods either focus on input-oriented feature extraction, such as supervised probes and Sparse Autoencoders (SAEs), or on output distribution inspection, such as logit-oriented approaches. A full understanding of LLM vector spaces, however, requires integrating both perspectives, something existing approaches struggle with due to constraints on latent feature definitions. We introduce the Hyperdimensional Probe, a hybrid supervised probe that combines symbolic representations with neural probing. Leveraging Vector Symbolic Architectures (VSAs) and hypervector algebra, it unifies prior methods: the top-down interpretability of supervised probes, the sparsity-driven proxy space of SAEs, and output-oriented logit investigation. This allows deeper input-focused feature extraction while supporting output-oriented investigation. Our experiments demonstrate that our method consistently extracts meaningful concepts across different LLMs, embedding sizes, and setups, uncovering concept-driven patterns in analogy-oriented inference and QA-focused text generation. By supporting joint input–output analysis, this work advances semantic understanding of neural representations while unifying the complementary perspectives of prior methods.

URL: https://openreview.net/forum?id=WxM7lIoGBb
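
A minimal sketch of the Vector Symbolic Architecture operations this abstract builds on: binding (elementwise product of random bipolar hypervectors), bundling (sign of the sum), and unbinding followed by cleanup against a codebook. This illustrates hypervector algebra only, not the Hyperdimensional Probe itself; the symbols and dimensionality are illustrative.

import numpy as np

rng = np.random.default_rng(1)
D = 10_000                                            # hypervector dimensionality

def rand_hv():
    return rng.choice([-1, 1], size=D)

codebook = {name: rand_hv() for name in ["paris", "france", "capital"]}
roles = {name: rand_hv() for name in ["subject", "relation", "object"]}

# Encode the triple (paris, capital, france) as a bundle of role-filler bindings.
bundle = (roles["subject"] * codebook["paris"]
          + roles["relation"] * codebook["capital"]
          + roles["object"] * codebook["france"])
memory = np.sign(bundle)

# Query the "object" role: unbind, then clean up by similarity against the codebook.
query = memory * roles["object"]                      # binding is its own inverse for +/-1 vectors
sims = {name: float(query @ hv) / D for name, hv in codebook.items()}
print(max(sims, key=sims.get), sims)                  # "france" should score highest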

---

Title: Tight Error Propagation Bounds for Multi-Step Chain-of-Thought Reasoning

Abstract: Chain-of-thought (CoT) reasoning enables large language models to solve complex problems, but understanding when these reasoning chains fail remains an open theoretical challenge. While recent work characterizes the computational expressivity of CoT, the fundamental question of reliability—how errors accumulate across steps—lacks rigorous foundations. We develop a Markov chain framework modeling CoT as a stochastic process on reasoning states, enabling formal analysis of error propagation. Our contributions establish: (1) tight bounds proving error probability grows as $1-(1-\varepsilon)^n$ for $n$ steps with per-step error $\varepsilon$; (2) verification overhead characterization showing $k$-redundant verification reduces error to $O(n^{k+1}\varepsilon^{k+1})$; (3) contractive self-correction analysis proving exponential convergence with mixing time $O(\log n/|\log q|)$ when $q < 1$; (4) information-theoretic impossibility results via Fano's inequality; and (5) concentration inequalities via martingale theory. We validate predictions through systematic experiments on synthetic tasks (bounds tight within 5%) and real LLM reasoning on PRM800K, GSM8K, and HumanEval datasets, demonstrating our framework accurately predicts failure rates across domains (mathematical reasoning, code generation). For practitioners: safe chain length is $n \lesssim \delta/\varepsilon$ without verification, while $k$-fold verification extends this to $n \lesssim (\delta/\varepsilon)^{1/(k+1)}$.

URL: https://openreview.net/forum?id=HfSgtFiZYf
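
A small numeric illustration of the baseline bound quoted above: if each step fails independently with probability eps, the chain-level error after n steps is 1 - (1 - eps)^n, which is at most n*eps, giving a safe chain length of roughly delta/eps without verification. The independence assumption and the Monte Carlo check below are illustrative, not taken from the paper.

import numpy as np

rng = np.random.default_rng(2)
eps, delta = 0.01, 0.10                               # per-step error and error budget

for n in (5, 10, 50):
    analytic = 1 - (1 - eps) ** n                     # chain fails if any step fails
    steps_failed = rng.random((100_000, n)) < eps     # Monte Carlo step failures
    empirical = steps_failed.any(axis=1).mean()
    print(f"n={n:3d}  1-(1-eps)^n={analytic:.4f}  simulated={empirical:.4f}  "
          f"union bound n*eps={n * eps:.2f}")

# Safe chain length without verification: keep n*eps below the budget delta.
print("safe n (no verification) ~", int(delta / eps))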

---

Title: NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning

Abstract: This paper presents NoisyCoconut, a novel inference-time method that enhances large language model (LLM) reliability by manipulating internal representations. Unlike fine-tuning methods, NoisyCoconut requires no retraining and operates directly on model representations during inference. Rather than training models to reason in latent space, we inject controlled noise into latent trajectories to generate diverse reasoning paths. Agreement among these paths provides a confidence signal, enabling models to abstain when uncertain. We demonstrate that this approach achieves effective coverage-accuracy tradeoffs across multiple reasoning benchmarks without requiring access to training data or modification of model parameters. This approach provides a practical pathway to improving the reliability of LLM outputs while maintaining compatibility with existing models. Our experiments show that unanimous agreement among noise-perturbed paths reduces error rates from 40–70% to below 15%, enabling models to exceed 95% accuracy on mathematical reasoning tasks through selective abstention.

URL: https://openreview.net/forum?id=5aatZPiCv8
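
A minimal sketch of the consensus-and-abstention logic described in the abstract: run several noise-perturbed reasoning passes and answer only when they agree. The noisy_reasoning_pass stub below stands in for the paper's latent-space noise injection, which requires access to model internals; everything here is illustrative.

from collections import Counter
import random

def noisy_reasoning_pass(question: str, noise_scale: float) -> str:
    # Hypothetical stand-in: the real method injects noise into the model's
    # latent trajectory and decodes the perturbed trajectory into an answer.
    return random.choice(["42", "42", "42", "41"])    # toy answer distribution

def answer_or_abstain(question: str, n_paths: int = 8, noise_scale: float = 0.1,
                      require_unanimous: bool = True):
    answers = [noisy_reasoning_pass(question, noise_scale) for _ in range(n_paths)]
    top, freq = Counter(answers).most_common(1)[0]
    if require_unanimous and freq < n_paths:
        return None                                   # abstain: the perturbed paths disagree
    return top

print(answer_or_abstain("What is 6 * 7?"))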

---

Title: Similarity-Dissimilarity Loss for Multi-label Supervised Contrastive Learning

Abstract: Supervised contrastive learning has achieved remarkable success by leveraging label information; however, determining positive samples in multi-label scenarios remains a critical challenge. In multi-label supervised contrastive learning (MSCL), multi-label relations are not yet fully defined, leading to ambiguity in identifying positive samples and formulating contrastive loss functions to construct the representation space. To address these challenges, we: (i) systematically formulate multi-label relations in MSCL, (ii) propose a novel \textit{Similarity-Dissimilarity Loss}, which dynamically re-weights samples based on similarity and dissimilarity factors, (iii) further provide theoretically grounded proofs for our method through rigorous mathematical analysis that supports its formulation and effectiveness, and (iv) offer a unified form and paradigm for both single-label and multi-label supervised contrastive loss. We conduct experiments on both image and text modalities and further extend the evaluation to the medical domain. The results show that our method consistently outperforms baselines in comprehensive evaluations, demonstrating its effectiveness and robustness.

URL: https://openreview.net/forum?id=W445zcqThv
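
A hedged sketch of the general idea in this abstract: a supervised contrastive loss for multi-label data in which candidate positives are re-weighted by how much their label sets overlap with the anchor's. The Jaccard weighting and the function below are illustrative choices, not the paper's exact similarity-dissimilarity factors.

import torch
import torch.nn.functional as F

def multilabel_weighted_supcon(z: torch.Tensor, labels: torch.Tensor,
                               temperature: float = 0.1) -> torch.Tensor:
    """z: (N, d) embeddings; labels: (N, C) multi-hot label matrix."""
    n = z.size(0)
    z = F.normalize(z, dim=1)
    eye = torch.eye(n, dtype=torch.bool)
    logits = (z @ z.T / temperature).masked_fill(eye, -1e9)    # exclude self-contrast

    inter = labels @ labels.T                                  # count of shared labels
    union = labels.sum(1, keepdim=True) + labels.sum(1) - inter
    weight = (inter / union.clamp(min=1)).masked_fill(eye, 0)  # Jaccard overlap in [0, 1]

    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    per_anchor = (weight * log_prob).sum(1) / weight.sum(1).clamp(min=1e-8)
    has_pos = weight.sum(1) > 0                                # anchors with overlapping positives
    return -per_anchor[has_pos].mean()

# Toy usage: 6 samples, 4 classes.
torch.manual_seed(0)
z = torch.randn(6, 16)
labels = (torch.rand(6, 4) > 0.5).float()
print(multilabel_weighted_supcon(z, labels))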

---
