Session 6 // Jan 19: TiDAR: Think in Diffusion, Talk in Autoregression

Diffusion LLM

Jan 15, 2026, 3:29:25 PM

to diffus...@googlegroups.com

Hello folks,

Diffusion language models enable fast parallel generation, while autoregressive (AR) models typically deliver higher quality thanks to their causal structure. A central question is whether these advantages can be combined to achieve both high throughput and AR-level quality with efficient GPU utilization.

This Monday, Jingyu Liu (UChicago) will discuss TiDAR, a hybrid decoding approach that combines diffusion-style parallel drafting with autoregressive verification for high quality and high throughput.

The project was co-led by Jingyu Liu (UChicago) and Xin Dong (NVIDIA).

Title: TiDAR: Think in Diffusion, Talk in Autoregression

Meeting Link: click here

Time: Jan 19 (Monday) 1pm ET / 10am PT / 7pm CET / 11:30pm IST

Paper: https://arxiv.org/abs/2511.08923

Prior knowledge:

Fundamentals of discrete diffusion (video by Sasha Rush)

Abstract: Diffusion language models hold the promise of fast parallel generation, while autoregressive (AR) models typically excel in quality due to their causal structure aligning naturally with language modeling. This raises a fundamental question: can we achieve a synergy with high throughput, higher GPU utilization, and AR-level quality? Existing methods fail to effectively balance these two aspects, either prioritizing AR using a weaker model for sequential drafting (speculative decoding), leading to lower drafting efficiency, or using some form of left-to-right (AR-like) decoding logic for diffusion, which still suffers from quality degradation and forfeits its potential parallelizability. We introduce TiDAR, a sequence-level hybrid architecture that drafts tokens (Thinking) in Diffusion and samples final outputs (Talking) AutoRegressively - all within a single forward pass using specially designed structured attention masks. This design exploits the free GPU compute density, achieving a strong balance between drafting and verification capacity. Moreover, TiDAR is designed to be serving-friendly (low overhead) as a standalone model. We extensively evaluate TiDAR against AR models, speculative decoding, and diffusion variants across generative and likelihood tasks at 1.5B and 8B scales. Thanks to the parallel drafting and sampling as well as exact KV cache support, TiDAR outperforms speculative decoding in measured throughput and surpasses diffusion models like Dream and LLaDA in both efficiency and quality. Most notably, TiDAR is the first architecture to close the quality gap with AR models while delivering 4.71x to 5.91x more tokens per second.
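For anyone new to the draft-then-verify idea mentioned in the abstract, here is a minimal toy sketch in Python. It is not the TiDAR implementation: `toy_draft`, `toy_ar_next`, `verify_draft`, and `generate` are hypothetical stand-ins, the "models" are simple arithmetic rules, and acceptance is plain greedy matching. It also verifies tokens one call at a time, whereas the paper describes drafting and verification sharing a single forward pass. The only point illustrated is that several drafted tokens can be accepted per step while the output stays identical to greedy AR decoding.

```python
from typing import Callable, List, Tuple

Token = int


def toy_ar_next(seq: List[Token]) -> Token:
    """Hypothetical stand-in for an AR model's greedy next token."""
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[Token], k: int = 4) -> List[Token]:
    """Hypothetical stand-in for a parallel drafter proposing k tokens.

    It guesses the AR continuation but deliberately corrupts the third
    token, so verification has something to reject.
    """
    out: List[Token] = []
    last = seq[-1]
    for _ in range(k):
        last = (last + 1) % 10
        out.append(last)
    if k > 2:
        out[2] = (out[2] + 5) % 10
    return out


def verify_draft(
    prefix: List[Token],
    draft: List[Token],
    ar_next: Callable[[List[Token]], Token],
) -> List[Token]:
    """Accept the longest draft prefix matching the AR greedy choice.

    At the first mismatch the AR token is substituted and the rest of the
    draft is discarded. If every drafted token matches, one extra AR token
    is appended.
    """
    accepted: List[Token] = []
    for tok in draft:
        expected = ar_next(prefix + accepted)
        if tok != expected:
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    accepted.append(ar_next(prefix + accepted))
    return accepted


def generate(prefix: List[Token], n_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Generate n_new tokens, returning the sequence and the step count."""
    seq = list(prefix)
    steps = 0
    while len(seq) - len(prefix) < n_new:
        seq += verify_draft(seq, toy_draft(seq, k), toy_ar_next)
        steps += 1
    return seq[: len(prefix) + n_new], steps


if __name__ == "__main__":
    tokens, steps = generate([0], n_new=6)
    print(tokens, steps)
```

In this toy setup each step accepts two drafted tokens plus one corrected token, so six new tokens take two steps instead of six one-token AR steps.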

Yours truly,

Subham, Justin, Zhihan

Website, Twitter, Discord, YouTube

Diffusion LLM

Jan 19, 2026, 4:35:40 AM

to Diffusion-llms

Gentle reminder: See you all at 1pm ET / 10am PT / 7pm CET / 11:30pm IST

Meeting Link: click here (Note: this is a special link for this week only; we'll resume using the regular meeting link next week.)

Today's paper: https://arxiv.org/abs/2511.08923

Diffusion LLM

Jan 19, 2026, 11:30:09 AM

to diffus...@googlegroups.com

Gentle reminder: See you all at 1pm ET / 10am PT / 7pm CET / 11:30pm IST

Meeting Link: click here

Today's paper: https://arxiv.org/abs/2511.08923

Diffusion LLM

Jan 19, 2026, 12:33:58 PM

to Diffusion-llms

Hi all,

A small clarification: the link for tonight's session is click here. The email sent an hour ago was sent by mistake. Please join the session using the link in this message. Apologies for the confusion.

See you shortly!

Diffusion LLM

Jan 19, 2026, 4:41:21 PM

to Diffusion-llms

Hello folks, the recording of Jingyu's talk is now available on YouTube. Make sure to check it out: https://youtu.be/Is7h-sDGnno
