Session 13: The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum

Diffusion LLM


Mar 20, 2026, 1:12:39 PM

to diffus...@googlegroups.com

Hello folks,

The Diffusion Duality (ICML 2025) showed that uniform-state discrete diffusion arises from Gaussian diffusion. Its sequel, Chapter II (ICLR 2026), introduces Ψ-samplers: non-Markovian predictor-corrector (PC) samplers that apply to arbitrary noise priors.

Unlike ancestral sampling, whose quality plateaus as the number of sampling steps grows, Ψ-samplers keep improving with more steps (better test-time scaling), beating MDLM on language generation (OpenWebText) and image generation (CIFAR-10).

The authors also reformulate the Gaussian curriculum from Duo, reducing training time by 25% and memory by 33% while maintaining comparable perplexity and strong downstream accuracy.

This Monday, Justin Deschenaux will present the paper, co-authored with Caglar Gulcehre and Subham Sahoo.

Title: The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum

Meeting Link: https://teams.live.com/meet/935657993819?p=XomgLoX5SceKIViDSF

Time: Mar 23 (Monday) 1pm ET / 10am PT / 6pm CET / 10:30pm IST

Paper: [2602.21185] The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum

Prior knowledge:

Fundamentals of discrete diffusion (video by Sasha Rush)

The Diffusion Duality (video by our reading group)

Abstract: Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or Masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that Masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% and memory by 33% compared to Duo while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video-tutorial on: https://s-sahoo.com/duo-ch2.

Yours truly,

Subham, Justin, Zhihan

Website, Twitter, Discord, YouTube

Diffusion LLM


Mar 23, 2026, 11:59:56 AM

to diffus...@googlegroups.com

This is happening in 1 hour!

Gentle reminder: See you all at 1pm ET / 10am PT / 6pm CET / 10:30pm IST

Meeting Link: https://teams.live.com/meet/935657993819?p=XomgLoX5SceKIViDSF

Today's paper: [2602.21185] The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum

Diffusion LLM


Mar 23, 2026, 5:24:35 PM

to Diffusion-llms

The video of today's talk is available on YouTube. If you missed it, make sure to check it out: https://www.youtube.com/watch?v=rxhvBP1NZ-w
