Hello folks,
Language models based on discrete diffusion have shown promise for parallel generation, but they suffer from factorization error that causes sharp quality degradation in the few-step regime.
To overcome this, Flow-based Language Models (FLMs) move from factorized ancestral sampling to sample-level continuous transport via flow matching. FLMs achieve strong performance through principled design choices, such as a time reparameterization based on decoding error.
To enable few-step generation, the paper introduces the two-time denoiser, a novel reparameterization of the flow map that provably lies on the probability simplex. This allows the authors to distill an FLM into a flow map language model (FMLM) with a cross-entropy loss. FMLM transports noise to data in as few as one step, outperforming recent few-step discrete diffusion models and matching their 8-step quality at one step with an approximately 8.3× speedup.
This Monday, Chanhyuk Lee (https://david3684.github.io/), Nicholas M. Boffi (https://nmboffi.github.io/), and Jinwoo Kim (https://jw9730.github.io/) will present their paper.
Title: One-step Language Modeling via Continuous Denoising
Meeting Link: click here
Time: April 6 (Monday) 1pm ET / 10am PT / 6pm CET / 10:30pm IST
Paper: [2602.16813] One-step Language Modeling via Continuous Denoising
Prior knowledge:
Fundamentals of discrete diffusion (video by Sasha Rush)
The Diffusion Duality (video by our reading group)
Single-step Generative Models (video by Jia-Bin Huang)
Abstract: Language models based on discrete diffusion have attracted widespread interest for their potential to provide faster generation than autoregressive models. In practice, however, they exhibit a sharp degradation of sample quality in the few-step regime, failing to realize this promise. Here we show that language models leveraging flow-based continuous denoising can outperform discrete diffusion in both quality and speed. By revisiting the fundamentals of flows over discrete modalities, we build a flow-based language model (FLM) that performs Euclidean denoising over one-hot token encodings. We show that the model can be trained by predicting the clean data via a cross entropy objective, where we introduce a simple time reparameterization that greatly improves training stability and generation quality. By distilling FLM into its associated flow map, we obtain a distilled flow map language model (FMLM) capable of few-step generation. On the LM1B and OWT language datasets, FLM attains generation quality matching state-of-the-art discrete diffusion models. With FMLM, our approach outperforms recent few-step language models across the board, with one-step generation exceeding their 8-step quality. Our work calls into question the widely held hypothesis that discrete diffusion processes are necessary for generative modeling over discrete modalities, and paves the way toward accelerated flow-based language modeling at scale.
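For anyone who wants a feel for "Euclidean denoising over one-hot token encodings" before the talk, here is a minimal toy sketch in NumPy. It is not the authors' implementation. It assumes a single token, a linear interpolant x_t = (1 - t) * x0 + t * x1 with Gaussian noise x0 and one-hot data x1, and a known token prior, so that the clean-data predictor can be written in closed form instead of being trained. The function names (exact_denoiser, euler_sample) and the prior values are made up for illustration, and the paper's actual schedule, time reparameterization, and two-time denoiser are not reproduced here.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def exact_denoiser(x_t, t, prior):
    """Posterior mean E[x1 | x_t] for one token (toy assumption: known prior).

    Under x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I) and x1 one-hot,
    the posterior over tokens is softmax(log prior + t * x_t / (1 - t)^2).
    The output always lies on the probability simplex.
    """
    logits = np.log(prior) + t * x_t / (1.0 - t) ** 2
    return softmax(logits)

def cross_entropy(probs, token):
    """Clean-data prediction loss that a learned denoiser would minimize."""
    return -np.log(probs[np.arange(len(token)), token]).mean()

def euler_sample(x0, prior, num_steps):
    """Integrate dx/dt = (D(x, t) - x) / (1 - t) from t = 0 to t = 1."""
    x = x0.copy()
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    for t, s in zip(ts[:-1], ts[1:]):
        d = exact_denoiser(x, t, prior)
        x = x + (s - t) * (d - x) / (1.0 - t)  # t < 1 inside the loop
    return x

rng = np.random.default_rng(0)
prior = np.array([0.6, 0.3, 0.1])   # hypothetical 3-token vocabulary
x0 = rng.standard_normal((4, 3))    # four independent noise samples

one_step = euler_sample(x0, prior, num_steps=1)
many_step = euler_sample(x0, prior, num_steps=64)
print(one_step[0])           # equals the prior, not a one-hot token
print(many_step.argmax(-1))  # decoded token ids after 64 Euler steps
```

In this toy setting, a single Euler step simply returns the denoiser output at t = 0, which is the prior itself and not a one-hot token. That gives some intuition for why a plain denoiser degrades in the one-step regime and why the paper distills a flow map instead.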
Yours truly,
Subham, Justin, Zhihan
Website, Twitter, Discord, YouTube