Session 14: One-step Language Modeling via Continuous Denoising


Diffusion LLM

Apr 4, 2026, 2:45:57 PM
to diffus...@googlegroups.com

Hello folks,

Language models based on discrete diffusion have shown promise for parallel generation, but they suffer from factorization error that causes sharp quality degradation in the few-step regime. 
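For anyone new to the term, here is a minimal sketch of what factorization error looks like. The two-token joint distribution is a made-up toy example, not taken from the paper. It only shows that sampling every position independently from its marginal can put probability mass on sequences the true joint never produces.

```python
import itertools

# Hypothetical toy joint over two-token sequences:
# only "AA" and "BB" are valid, each with probability 0.5.
joint = {("A", "A"): 0.5, ("B", "B"): 0.5}
vocab = ["A", "B"]

def marginal(position):
    """Per-position marginal of the joint distribution."""
    probs = {tok: 0.0 for tok in vocab}
    for seq, p in joint.items():
        probs[seq[position]] += p
    return probs

# Decoding both positions in a single parallel step samples each token
# independently, i.e. from the product of the per-position marginals.
m0, m1 = marginal(0), marginal(1)
factorized = {(a, b): m0[a] * m1[b] for a, b in itertools.product(vocab, vocab)}

# Mass the factorized sampler places on sequences the joint never produces.
invalid_mass = sum(p for seq, p in factorized.items() if seq not in joint)
print(invalid_mass)  # 0.5: half of the one-step samples are "AB" or "BA"
```

With more denoising steps, later tokens can condition on earlier ones and this gap shrinks, which is why the degradation shows up mainly in the few-step regime.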

To overcome this, Flow-based Language Models (FLMs) replace factorized ancestral sampling with sample-level continuous transport via flow matching. FLMs achieve strong performance through principled design choices, such as a time reparameterization based on decoding error.

To enable few-step generation, the paper introduces the two-time denoiser, a novel reparameterization of the flow map that provably lies on the probability simplex, allowing the authors to distill FLM into a flow map language model (FMLM) via cross-entropy. FMLM transports noise to data in as few as one step, outperforming recent few-step discrete diffusion models and matching their 8-step quality at one step with an approximately 8.3× speedup.

This Monday, Chanhyuk Lee (https://david3684.github.io/), Nicholas M. Boffi (https://nmboffi.github.io/), and Jinwoo Kim (https://jw9730.github.io/) will present their paper. 

Title: One-step Language Modeling via Continuous Denoising


Meeting Link: click here

Time: April 6 (Monday) 1pm ET / 10am PT / 6pm CET / 10:30pm IST

Paper: [2602.16813] One-step Language Modeling via Continuous Denoising 


Prior knowledge: 

Fundamentals of discrete diffusion (video by Sasha Rush)

The Diffusion Duality (video by our reading group)

Single-step Generative Models (video by Jia-Bin Huang)


Abstract: Language models based on discrete diffusion have attracted widespread interest for their potential to provide faster generation than autoregressive models. In practice, however, they exhibit a sharp degradation of sample quality in the few-step regime, failing to realize this promise. Here we show that language models leveraging flow-based continuous denoising can outperform discrete diffusion in both quality and speed. By revisiting the fundamentals of flows over discrete modalities, we build a flow-based language model (FLM) that performs Euclidean denoising over one-hot token encodings. We show that the model can be trained by predicting the clean data via a cross entropy objective, where we introduce a simple time reparameterization that greatly improves training stability and generation quality. By distilling FLM into its associated flow map, we obtain a distilled flow map language model (FMLM) capable of few-step generation. On the LM1B and OWT language datasets, FLM attains generation quality matching state-of-the-art discrete diffusion models. With FMLM, our approach outperforms recent few-step language models across the board, with one-step generation exceeding their 8-step quality. Our work calls into question the widely held hypothesis that discrete diffusion processes are necessary for generative modeling over discrete modalities, and paves the way toward accelerated flow-based language modeling at scale. 


Yours truly,

Subham, Justin, Zhihan

Website, Twitter, Discord, YouTube

Diffusion LLM

Apr 7, 2026, 11:29:25 PM
to diffus...@googlegroups.com
The recording can be found here: https://youtu.be/ZSROTE4evtE