Session 15: Planner Aware Path Learning in Diffusion Language Models Training

Diffusion LLM

Apr 17, 2026, 1:26:49 PM

to diffus...@googlegroups.com

Hello folks,

A key limitation of diffusion language models is that they are usually trained without accounting for the planner that guides decoding at inference time.
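
For readers new to the setup, one common form of the standard masked diffusion training objective is shown below. This is the MDLM-style negative ELBO; the notation is ours, not the paper's. It weights the denoiser's cross-entropy on masked positions by a schedule-dependent factor, and it corresponds to a reverse process that unmasks positions uniformly at random:

$$
\mathcal{L}_{\text{NELBO}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1]}\, \mathbb{E}_{z_t \sim q(z_t \mid x)} \left[ \frac{-\alpha_t'}{1 - \alpha_t} \sum_{i \,:\, z_t^i = \mathbf{m}} -\log p_\theta\!\left(x^i \mid z_t\right) \right]
$$

Here $\mathbf{m}$ is the mask token and $\alpha_t$ is the probability that a token is still unmasked at time $t$. With the linear schedule $\alpha_t = 1 - t$, the weight reduces to $1/t$.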

Planner Aware Path Learning (PAPL) brings the planning process into the training objective so that learning better matches how generation is actually performed.

By viewing decoding as a coupled planner–denoiser process, PAPL provides a more principled training framework for diffusion language models. Across experiments, it improves sequence generation quality over the standard training objective, and the change requires only 2 lines of code.
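
To make the planner–denoiser coupling concrete, here is a toy, hypothetical sketch of a single parallel decoding step with two planners. The first unmasks positions uniformly at random, which is the path standard training assumes. The second is a greedy confidence-based planner of the kind commonly used at inference. The fixed probability table stands in for a real denoiser, which would recompute its predictions after every step. None of these names come from the PAPL code.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

MASK = None  # hypothetical placeholder for a masked position

# A planner takes the masked positions, the denoiser's confidence per position,
# a budget k and an RNG, and returns the positions to unmask this step.
Planner = Callable[[Sequence[int], Dict[int, float], int, random.Random], List[int]]


def uniform_planner(masked: Sequence[int], confidences: Dict[int, float],
                    k: int, rng: random.Random) -> List[int]:
    """Unmask k positions uniformly at random (the path standard training assumes).

    The confidences argument is ignored; it is kept so both planners share a signature.
    """
    return sorted(rng.sample(list(masked), k))


def confidence_planner(masked: Sequence[int], confidences: Dict[int, float],
                       k: int, rng: random.Random) -> List[int]:
    """Unmask the k positions where the denoiser is most confident (greedy top-k)."""
    ranked = sorted(masked, key=lambda i: confidences[i], reverse=True)
    return sorted(ranked[:k])


def decode_step(tokens: List[Optional[str]],
                probs: Dict[int, Dict[str, float]],
                planner: Planner, k: int, rng: random.Random) -> List[Optional[str]]:
    """Fill in up to k masked positions, letting the planner decide which ones."""
    masked = [i for i, tok in enumerate(tokens) if tok is MASK]
    # Denoiser's best guess (token, probability) at each masked position.
    best = {i: max(probs[i].items(), key=lambda kv: kv[1]) for i in masked}
    confidences = {i: p for i, (_, p) in best.items()}
    chosen = planner(masked, confidences, min(k, len(masked)), rng)
    new_tokens = list(tokens)
    for i in chosen:
        new_tokens[i] = best[i][0]
    return new_tokens


tokens = ["the", MASK, "sat", MASK]
probs = {1: {"cat": 0.9, "dog": 0.1}, 3: {"down": 0.55, "still": 0.45}}
step1 = decode_step(tokens, probs, confidence_planner, 1, random.Random(0))
print(step1)  # ['the', 'cat', 'sat', None]
```

The greedy planner always commits the most confident position first, while the uniform planner may pick either one. That gap between the path taken at inference and the path assumed in training is the mismatch PAPL targets.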

This coming Monday, Fred Zhangzhi Peng and Zachary Bezemek will present their jointly led paper, which was accepted as an Oral at ICLR 2026.

Title: Planner Aware Path Learning in Diffusion Language Models Training

Meeting Link: click here

Time: April 20 (Monday) 1pm ET / 10am PT / 7pm CEST / 10:30pm IST

Paper: [2509.23405] Planner Aware Path Learning in Diffusion Language Models Training

Prior knowledge:

Fundamentals of discrete diffusion (video by Sasha Rush)

Abstract: Diffusion language models have emerged as a powerful alternative to autoregressive models, enabling fast inference through more flexible and parallel generation paths. This flexibility of sampling is unlocked by new engineered sampling strategies, or planners, that select more favorable generation paths by iteratively planning - versus uniformly at random - where to denoise along the sequence. However, by modifying the reverse paths via planning, planners create an irrevocable mismatch between the uniformly random denoising paths assumed during training and planning-based inference.

In this paper, we systematically investigate the mismatch of discrete diffusion training and inference under planning and theoretically prove that the standard discrete diffusion training evidence lower bound (ELBO) does not accurately describe a denoiser that uses a non-uniform planner. To address this gap, we derive a new planned evidence lower bound (P-ELBO) that incorporates planner-based reverse dynamics directly into the training objective. Using the P-ELBO, we introduce Planner Aware Path Learning (PAPL), a novel training scheme that aligns training and inference under a planned denoiser.

PAPL is implemented as a simple yet effective modification to the standard masked discrete diffusion loss, making it widely applicable and easy to adopt. Empirically, we show PAPL delivers consistent gains across domains, including a 40% relative improvement in protein sequences, improved text generation with up to a 4x relative MAUVE gain, and 23% relative improvement in code generation HumanEval pass@10. Code is available at http://github.com/pengzhangzhi/PAPL.
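
The abstract describes PAPL as a small modification to the standard masked diffusion loss. The sketch below is a hypothetical illustration of what a planner-aware reweighting can look like. It is not the P-ELBO from the paper. It compares the standard per-sequence loss under an assumed linear schedule with a variant that mixes uniform weights with softmax weights from an assumed confidence-based planner. The function names, the `mix` and `temperature` parameters, and the choice to score positions by the true token's log-probability are all our assumptions. For the actual objective, see the paper and the linked repository.

```python
import math
from typing import List, Sequence


def masked_diffusion_loss(log_probs_true: Sequence[float], t: float) -> float:
    """Standard masked diffusion loss for one sequence, assuming alpha_t = 1 - t.

    log_probs_true[i] is log p_theta(x_i | z_t) for each *masked* position i,
    so the loss is the summed cross-entropy weighted by 1/t.
    """
    return sum(-lp for lp in log_probs_true) / t


def planner_weights(log_probs_true: Sequence[float],
                    temperature: float = 1.0) -> List[float]:
    """Hypothetical planner: softmax over masked positions of the model's
    log-probability for the true token, so confident positions get more weight."""
    scaled = [lp / temperature for lp in log_probs_true]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def planner_weighted_loss(log_probs_true: Sequence[float], t: float,
                          mix: float = 0.5, temperature: float = 1.0) -> float:
    """Illustrative planner-aware variant (NOT the paper's P-ELBO).

    Mixes uniform weights (1/n each) with planner weights; mix=0 recovers
    masked_diffusion_loss exactly (up to floating-point rounding).
    """
    n = len(log_probs_true)
    w_plan = planner_weights(log_probs_true, temperature)
    weights = [(1 - mix) / n + mix * wp for wp in w_plan]
    # Rescale by n so uniform weights give back the plain sum of CE terms.
    return n * sum(w * -lp for w, lp in zip(weights, log_probs_true)) / t


# Two masked positions: one easy (p = 0.9), one hard (p = 0.2), at t = 0.5.
lps = [math.log(0.9), math.log(0.2)]
standard = masked_diffusion_loss(lps, t=0.5)
planned = planner_weighted_loss(lps, t=0.5, mix=1.0)
```

Keeping `mix` as an interpolation knob means the sketch degrades gracefully to the standard objective, which mirrors the abstract's point that the change is a light modification rather than a new training pipeline.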

Yours truly,

Subham, Justin, Zhihan

Website, Twitter, Discord, YouTube

Diffusion LLM

Apr 20, 2026, 12:26:13 PM

to Diffusion-llms

Reminder: we are meeting in 35 minutes!

Meeting link: https://teams.live.com/meet/935657993819?p=XomgLoX5SceKIViDSF

Diffusion LLM

Apr 20, 2026, 3:01:00 PM

to Diffusion-llms

Missed today's session? We just released the recording, if you're interested: https://youtu.be/MBgNIZFnKjQ
