Daily TMLR digest for Jul 25, 2024


TMLR

Jul 25, 2024, 12:00:07 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion

Authors: Dylan Zhang, Curt Tigges, Zory Zhang, Stella Biderman, Maxim Raginsky, Talia Ringer

Abstract: This paper investigates the ability of transformer-based models to learn structural recursion from examples. Recursion is a universal concept in both natural and formal languages. Structural recursion is central to the programming language and formal mathematics tasks where symbolic tools currently excel beyond neural models, such as inferring semantic relations between datatypes and emulating program behavior.
We introduce a general framework that connects the abstract concept of structural recursion in the programming language domain to concrete sequence modeling problems and learned models' behavior. The framework includes a representation that captures the general syntax of structural recursion, coupled with two frameworks for understanding its semantics: one that is more natural from a programming languages perspective, and one that helps bridge that perspective with a mechanistic understanding of the underlying transformer architecture.

With our framework as a powerful conceptual tool, we identify different issues under various setups. Models trained to emulate recursive computations do not fully capture the recursion; instead, they fit shortcut algorithms, and thus fail on edge cases that are under-represented in the training distribution. In addition, state-of-the-art large language models (LLMs) struggle to mine recursive rules from in-context demonstrations, and they fail in interesting ways when emulating reduction (step-wise computation) of the recursive function.
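
The kind of task described above can be made concrete with a small sketch (illustrative only, not the paper's framework): a structurally recursive function over Peano numerals, together with the step-wise reduction trace that a sequence model would be asked to emulate.

```python
# Peano numerals: Z is zero, ("S", n) is the successor of n.
Z = ("Z",)
def S(n): return ("S", n)

def show(n):
    return "Z" if n == Z else f"S({show(n[1])})"

def add(m, n):
    """Structural recursion on the first argument:
    add(Z, n) = n ; add(S(m'), n) = S(add(m', n))."""
    return n if m == Z else S(add(m[1], n))

def reduce_trace(m, n):
    """Step-wise reduction of add(m, n), one rewrite per step --
    the kind of trace an LLM would be asked to emulate."""
    steps = [f"add({show(m)}, {show(n)})"]
    wrap, close = "", ""   # successors peeled off so far
    while m != Z:
        m = m[1]
        wrap += "S("
        close += ")"
        steps.append(f"{wrap}add({show(m)}, {show(n)}){close}")
    steps.append(f"{wrap}{show(n)}{close}")
    return steps
```

Shortcut learning would correspond to a model that reproduces such traces for typical inputs but breaks on rarely seen depths of recursion.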

URL: https://openreview.net/forum?id=Ry5CXXm1sf

---

Title: Revisiting Non-separable Binary Classification and its Applications in Anomaly Detection

Authors: Matthew Lau, ISMAILA SECK, Athanasios P Meliopoulos, Wenke Lee, Eugene Ndiaye

Abstract: The inability to linearly classify XOR has motivated much of deep learning. We revisit this age-old problem and show that linear classification of XOR is indeed possible. Instead of separating data between halfspaces, we propose a slightly different paradigm, equality separation, that adapts the SVM objective to distinguish data within or outside the margin. Our classifier can then be integrated into neural network pipelines via a smooth approximation. From its properties, we intuit that equality separation is suitable for anomaly detection. To formalize this notion, we introduce closing numbers, a quantitative measure of the capacity of classifiers to form closed decision regions for anomaly detection. Springboarding from this theoretical connection between binary classification and anomaly detection, we test our hypothesis on supervised anomaly detection experiments, showing that equality separation can detect both seen and unseen anomalies.
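
The core idea is easy to sketch (a minimal illustration, not the authors' code): instead of the halfspace rule sign(w·x + b), an equality separator classifies a point by whether it lies within a margin of the hyperplane w·x + b = 0, which suffices to fit XOR with a single linear unit.

```python
import numpy as np

# XOR data: class 0 on the diagonal, class 1 off it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

w, b, margin = np.array([1.0, -1.0]), 0.0, 0.5

# Halfspace separation sign(w.x + b) provably cannot fit XOR.
# Equality separation: class 0 if the point lies within the margin
# of the hyperplane w.x + b = 0, class 1 otherwise.
pred = (np.abs(X @ w + b) > margin).astype(int)
```

Here `pred` matches `y` exactly: the two class-0 points sit on the hyperplane x1 = x2, while the two class-1 points lie outside its margin.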

URL: https://openreview.net/forum?id=zOJ846BXhl

---


New submissions
===============


Title: Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?

Abstract: The model editing problem concerns how language models should learn new facts about the world over time. While empirical research on model editing has drawn widespread attention, the conceptual foundations of model editing remain shaky -- perhaps unsurprisingly, since model editing is essentially belief revision, a storied problem in philosophy that has eluded succinct solutions for decades. Model editing nonetheless demands a solution, since we need to be able to control knowledge within language models. With this goal in mind, this paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research. We first describe 12 open problems with model editing, based on challenges with (1) defining the problem, (2) developing benchmarks, and (3) assuming LLMs have editable beliefs in the first place. Many of the challenges are extremely difficult to address, e.g. determining far-reaching consequences of edits, labeling probabilistic entailments between facts, and updating beliefs of agent simulators. Next, we introduce a semi-synthetic dataset for model editing based on Wikidata, where we can evaluate edits against labels given by an idealized Bayesian agent. This enables us to say exactly how belief revision in language models falls short of a desirable epistemic standard. We encourage further research exploring settings where model editing can be compared against such a gold standard.
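
The idealized Bayesian agent used as a gold standard can be illustrated with a toy update (a sketch, not the paper's Wikidata testbed): the agent holds a probability for a fact and revises it on evidence via Bayes' rule.

```python
# Idealized Bayesian belief revision on a single proposition.
# The agent's belief is P(fact); observing evidence e updates it by
# Bayes' rule given the likelihoods P(e | fact) and P(e | not fact).
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    num = likelihood_if_true * prior
    return num / (num + likelihood_if_false * (1 - prior))

p = 0.2                          # weak prior belief the fact holds
p = bayes_update(p, 0.9, 0.1)    # strong supporting evidence raises it
```

An edit, on this view, is evidence the agent should incorporate coherently -- including its entailed consequences -- which is exactly what is hard to guarantee for an LLM.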

URL: https://openreview.net/forum?id=LRf19n5Ly3

---

Title: Tweedie Moment Projected Diffusions for Inverse Problems

Abstract: Diffusion generative models unlock new possibilities for inverse problems, as they allow for the incorporation of strong empirical priors into the process of scientific inference. Recently, diffusion models have been repurposed for solving inverse problems using Gaussian approximations to the conditional densities of the reverse process, with Tweedie's formula used to parameterise the mean, complemented with various heuristics. To address the challenges arising from these approximations, we leverage higher-order information via Tweedie's formula and obtain a statistically principled approximation. We further provide a theoretical guarantee specifically for posterior sampling, which can lead to a better theoretical understanding of diffusion-based conditional sampling. Finally, we illustrate the empirical effectiveness of our approach on general linear inverse problems, using toy synthetic examples as well as image restoration. We show that our method (i) removes any time-dependent step-size hyperparameters required by earlier methods, (ii) brings stability and better sample quality across multiple noise levels, and (iii) unlike earlier works, operates stably with variance exploding (VE) forward processes.
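
Tweedie's formula itself can be verified in a toy conjugate-Gaussian setting (illustrative only; the paper applies it inside the reverse process of a diffusion model), where the posterior mean it produces can be checked against the closed form.

```python
# Model: x ~ N(mu, tau2), observation y = x + noise, noise ~ N(0, sigma2).
mu, tau2, sigma2 = 1.0, 4.0, 0.25

def score(y):
    # Score of the Gaussian marginal p(y) = N(mu, tau2 + sigma2).
    return -(y - mu) / (tau2 + sigma2)

def tweedie_mean(y):
    # Tweedie's formula: E[x | y] = y + sigma2 * score(y).
    return y + sigma2 * score(y)

def posterior_mean(y):
    # Closed-form Gaussian posterior mean, for comparison.
    return (tau2 * y + sigma2 * mu) / (tau2 + sigma2)
```

In a diffusion model the score is not available in closed form; the network's learned score plays its role, and the quality of the resulting Gaussian approximation is what the heuristics mentioned above try to patch.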

URL: https://openreview.net/forum?id=4unJi0qrTE

---

Title: Data Augmentation Policy Search for Long-Term Forecasting

Abstract: Data augmentation serves as a popular regularization technique to combat overfitting challenges in neural networks. While automatic augmentation has demonstrated success in image classification tasks, its application to time-series problems, particularly in long-term forecasting, has received comparatively less attention. To address this gap, we introduce a time-series automatic augmentation approach named TSAA, which is both efficient and easy to implement. The solution involves tackling the associated bilevel optimization problem through a two-step process: initially training a non-augmented model for a limited number of epochs, followed by an iterative split procedure. During this iterative process, we alternate between identifying a robust augmentation policy through Bayesian optimization and refining the model while discarding suboptimal runs. Extensive evaluations on challenging univariate and multivariate forecasting benchmark problems demonstrate that TSAA consistently outperforms several robust baselines, suggesting its potential integration into prediction pipelines.
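
The two-step procedure can be sketched as follows (a hypothetical skeleton, with random search standing in for the paper's Bayesian optimization and jitter/scaling as stand-in augmentations; `train_model` and `val_error` are user-supplied callables, assumptions rather than the paper's API):

```python
import random

# Stand-in time-series augmentations (common choices; the paper's
# actual operation set may differ).
def jitter(x, s): return [v + random.gauss(0, s) for v in x]
def scale(x, s):  return [v * (1 + random.uniform(-s, s)) for v in x]
OPS = {"jitter": jitter, "scale": scale}

def tsaa_sketch(train_model, val_error, series, warmup_epochs=5, rounds=10):
    # Step 1: train a non-augmented model for a few epochs.
    model = train_model(series, epochs=warmup_epochs)
    best_policy, best_err = None, val_error(model)
    # Step 2: alternate between proposing an augmentation policy and
    # refining the model, keeping only the best run (discard the rest).
    for _ in range(rounds):
        name = random.choice(list(OPS))
        strength = random.uniform(0.0, 0.3)
        candidate = train_model(OPS[name](series, strength),
                                epochs=1, warm_start=model)
        err = val_error(candidate)
        if err < best_err:
            best_policy, best_err, model = (name, strength), err, candidate
    return model, best_policy
```

By construction the returned model's validation error never exceeds that of the warm-up model, mirroring the "discard suboptimal runs" step.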

URL: https://openreview.net/forum?id=Wnd0XY0twh

---
