Diffusion LLM

Jun 27, 2026, 2:01:55 AM

to diffus...@googlegroups.com

Hello folks,

Nemotron-Labs-Diffusion is a tri-mode language model (LM) that unifies autoregressive (AR), diffusion, and self-speculation decoding within a single architecture.

Trained with a joint AR-diffusion objective, Nemotron-Labs-Diffusion can switch modes to sustain high throughput across deployment settings and concurrency levels. The authors' study shows that:

(1) AR and diffusion objectives are complementary: diffusion improves lookahead planning, while AR provides left-to-right linguistic priors.
(2) In self-speculation mode, diffusion drafts while AR verifies, outperforming multi-token prediction (MTP) methods in both acceptance rate and real-device efficiency.
(3) A speed-of-light analysis further demonstrates diffusion’s long-term potential, with up to 76.5% more tokens per forward pass than self-speculation under an optimal sampler.
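For anyone new to the draft-then-verify idea behind self-speculation, here is a minimal sketch of one step with greedy acceptance. It is a generic speculative-decoding illustration, not the paper's implementation: `speculative_step`, `toy_draft`, and `toy_verify` are hypothetical names, and the toy "models" are simple counting rules standing in for the diffusion drafter and the AR verifier.

```python
from typing import Callable, List, Tuple

Token = int


def speculative_step(
    prefix: List[Token],
    draft_block: Callable[[List[Token], int], List[Token]],
    verify_block: Callable[[List[Token], List[Token]], List[Token]],
    block_size: int,
) -> Tuple[List[Token], int]:
    """Run one draft-then-verify step and return (new sequence, accepted count).

    draft_block(prefix, k) proposes k tokens (stand-in for the diffusion mode).
    verify_block(prefix, draft) returns len(draft) + 1 tokens: the verifier's
    greedy choice at each draft position, plus one extra token after the draft.
    """
    draft = draft_block(prefix, block_size)
    targets = verify_block(prefix, draft)

    # Accept draft tokens until the first disagreement with the verifier.
    accepted = 0
    for drafted, target in zip(draft, targets):
        if drafted != target:
            break
        accepted += 1

    # Keep the accepted tokens, then append the verifier's own token at the
    # first mismatch (or the extra token if the whole draft was accepted).
    new_tokens = draft[:accepted] + [targets[accepted]]
    return prefix + new_tokens, accepted


def toy_ar_next(context: List[Token]) -> Token:
    """Hypothetical AR rule: the next token is the last token plus one, mod 10."""
    return (context[-1] + 1) % 10


def toy_verify(prefix: List[Token], draft: List[Token]) -> List[Token]:
    """Toy verifier. A real model would score all positions in one forward pass."""
    context = list(prefix)
    targets = []
    for token in draft:
        targets.append(toy_ar_next(context))
        context.append(token)
    targets.append(toy_ar_next(context))
    return targets


def toy_draft(prefix: List[Token], k: int) -> List[Token]:
    """Toy drafter that follows the AR rule but makes a mistake at position 2."""
    context = list(prefix)
    tokens = []
    for i in range(k):
        token = toy_ar_next(context)
        if i == 2:
            token = (token + 5) % 10  # deliberate drafting error
        tokens.append(token)
        context.append(token)
    return tokens


sequence, accepted = speculative_step([0], toy_draft, toy_verify, block_size=4)
print(sequence, accepted)
```

In this toy run the first two drafted tokens are accepted and the verifier supplies the third, so one verification step yields three tokens. The acceptance rate mentioned in (2) measures how often drafted tokens survive this check.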

Scaling to 3B, 8B, and 14B parameters, the Nemotron-Labs-Diffusion family, including base, instruct, and vision-language models, consistently outperforms state-of-the-art open-source AR and diffusion LMs in both accuracy and speed.

For example, Nemotron-Labs-Diffusion-8B decodes 5.9× more tokens per forward pass than Qwen3-8B with better accuracy, translating to 4× higher throughput on SPEED-Bench with SGLang on a GB200 GPU.
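A quick back-of-envelope reading of how the two gains relate, as a rough sketch only. It assumes throughput equals tokens per forward pass divided by time per forward pass, which ignores prefill, batching, and scheduling effects; `implied_pass_cost_ratio` is a hypothetical helper, and the inputs are the roughly 6× and 4× figures quoted above.

```python
def implied_pass_cost_ratio(tokens_per_forward_gain: float, throughput_gain: float) -> float:
    """Relative time per forward pass implied by the two reported gains.

    Assumption (ours, not the paper's): throughput is proportional to
    tokens per forward pass divided by time per forward pass.
    """
    if throughput_gain <= 0:
        raise ValueError("throughput_gain must be positive")
    return tokens_per_forward_gain / throughput_gain


# Roughly 6x tokens per forward pass and 4x throughput.
ratio = implied_pass_cost_ratio(6.0, 4.0)
print(f"Implied relative cost per forward pass: {ratio:.2f}x")
```

Under that simplification, each forward pass would cost about 1.5 times as much as a baseline pass, which is why the throughput gain is smaller than the tokens-per-forward-pass gain.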

This Monday, Yonggan Fu from NVIDIA Research will present Nemotron-Labs-Diffusion: Unifying AR, Diffusion, and Self-Speculation.

Title: Nemotron-Labs-Diffusion: Unifying AR, Diffusion, and Self-Speculation

Meeting Link: click here

Time: June 29 (Monday) 1pm ET / 10am PT / 7pm CEST / 10:30pm IST

Paper: Nemotron-Labs-Diffusion: A Tri-Mode Language Model Unifying Autoregressive, Diffusion, and Self-Speculation Decoding | Research

Prior knowledge:

Fundamentals of discrete diffusion (video by Sasha Rush)

Abstract:

We introduce Nemotron-Labs-Diffusion, a tri-mode language model (LM) that unifies AR, diffusion, and self-speculation decoding within a single architecture. Trained with a joint AR-diffusion objective, Nemotron-Labs-Diffusion can switch modes to sustain high throughput across deployment settings and concurrency levels. Our study shows that (1) AR and diffusion objectives are complementary: diffusion improves lookahead planning, while AR provides left-to-right linguistic priors. (2) In self-speculation mode, diffusion drafts while AR verifies, outperforming multi-token prediction (MTP) methods in both acceptance rate and real-device efficiency. (3) A speed-of-light analysis further demonstrates diffusion’s long-term potential, with up to 76.5% more tokens per forward pass than self-speculation under an optimal sampler. Scaling to 3B, 8B, and 14B parameters, our Nemotron-Labs-Diffusion family, including base, instruct, and vision-language models, consistently outperforms state-of-the-art open-source AR and diffusion LMs in both accuracy and speed. For example, Nemotron-Labs-Diffusion-8B decodes 5.9× more tokens per forward than Qwen3-8B with better accuracy, translating to 4× higher throughput on SPEED-Bench with SGLang on a GB200 GPU.

Session 22: Nemotron-Labs-Diffusion: Unifying AR, Diffusion, and Self-Speculation

Jun 29, 2026, 10:00am – Jun 29, 2026, 11:00am

(GMT-07:00) Pacific Time - Los Angeles

Yours truly,

Subham, Justin, Zhihan

Website, Twitter, Discord, YouTube

Diffusion LLM

Jun 29, 2026, 12:00:33 PM

to diffus...@googlegroups.com

This is happening in 1 hour!!

Gentle reminder: See you all at 1pm ET / 10am PT / 7pm CEST / 10:30pm IST

Meeting Link: click here

Today's paper: Nemotron-Labs-Diffusion: A Tri-Mode Language Model Unifying Autoregressive, Diffusion, and Self-Speculation Decoding | Research

Diffusion LLM

Jun 30, 2026, 5:50:23 PM

to Diffusion-llms

Hi folks, we just uploaded the recording of Monday's session. Make sure to check it out: https://youtu.be/ecwcMM6vzV0
