Today: “Accelerating LLM Inference with vLLM (and SGLang)” - Ion Stoica (Berkeley & Anyscale & Databricks)


Faster LLM Inference Seminar

Mar 4, 2025, 9:59:51 AM
to Faster LLM Inference Seminar

Date & Time:

Today at 3:30 PM EST



Abstract:

Inference efficiency remains a critical challenge for deploying large language models (LLMs) at scale. In this talk, I will present the work on LLM inference that we have conducted at Berkeley over the past two years in the context of vLLM and SGLang, today the two most popular open-source inference engines. In particular, I will describe two of the key techniques they introduced, PagedAttention and RadixAttention, which have since been adopted by the majority of LLM inference engines. Finally, I will discuss the new architecture of vLLM.
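
For a concrete picture of PagedAttention ahead of the talk: the core idea is to manage the KV cache in fixed-size blocks through a per-sequence block table, much like virtual-memory paging, so memory is allocated on demand rather than reserved contiguously per request. The Python sketch below illustrates only that bookkeeping; the names (BLOCK_SIZE, BlockAllocator, Sequence) are illustrative and are not vLLM's actual API. RadixAttention, introduced in SGLang, applies a related idea by organizing cached prefixes in a radix tree so that requests sharing a prompt prefix can reuse KV blocks.

BLOCK_SIZE = 16  # tokens stored per KV-cache block (illustrative value)

class BlockAllocator:
    """Hands out fixed-size physical blocks from a shared memory pool."""
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))

    def alloc(self) -> int:
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)

class Sequence:
    """Tracks one request; the block table maps logical blocks to physical ones."""
    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []
        self.num_tokens = 0

    def append_token(self) -> None:
        # A new physical block is allocated only when the previous one is full,
        # so memory grows with the sequence instead of being preallocated.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1

    def finish(self) -> None:
        # Returning blocks to the pool makes them reusable by other requests.
        for block in self.block_table:
            self.allocator.release(block)
        self.block_table.clear()

# Example: a 40-token sequence needs ceil(40 / 16) = 3 blocks, located via
# the block table rather than one contiguous reservation.
pool = BlockAllocator(num_blocks=8)
seq = Sequence(pool)
for _ in range(40):
    seq.append_token()
assert len(seq.block_table) == 3
seq.finish()

The point of the block table is that a sequence's KV cache no longer needs to be contiguous, which is what lets the engine pack many concurrent requests into GPU memory with little fragmentation.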


Registration:

https://faster-llms.vercel.app


We look forward to your participation!
