March 7th Community Meeting: Chien-Yu Lin on LLM quantization techniques!

Zachary Tatlock

Mar 1, 2024, 11:40:38 PM

to fpb...@fpbench.org, Chien-Yu Lin, Yilong Zhao, Zihao Ye

Howdy folks!

This upcoming Thursday, we're very excited to welcome Chien-Yu Lin from UW to talk about low-bit quantization techniques for LLMs.

The talk will be based on their recent paper: Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

The growing demand for Large Language Models (LLMs) in applications such as content generation, intelligent chatbots, and sentiment analysis poses considerable challenges for LLM service providers. To efficiently use GPU resources and boost throughput, batching multiple requests has emerged as a popular paradigm; to further speed up batching, LLM quantization techniques reduce memory consumption and increase computing capacity. However, prevalent quantization schemes (e.g., 8-bit weight-activation quantization) cannot fully leverage the capabilities of modern GPUs, such as 4-bit integer operators, resulting in sub-optimal performance.

To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss. Atom significantly boosts serving throughput by using low-bit operators and considerably reduces memory consumption via low-bit quantization. It attains high accuracy by applying a novel mixed-precision and fine-grained quantization process. We evaluate Atom on 4-bit weight-activation quantization setups in the serving context. Atom improves end-to-end throughput by up to 7.73× compared to the FP16 and by 2.53× compared to INT8 quantization, while maintaining the same latency target.
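For folks new to the topic, here is a minimal sketch of what "fine-grained" low-bit quantization can look like: each small group of values gets its own scale, so a few large values do not wipe out the precision of the small ones. This is a generic group-wise symmetric 4-bit scheme written for illustration, not Atom's actual implementation; the function names, group size, and sample weights are all assumptions.

```python
def quantize_group(values, bits=4):
    """Symmetrically quantize one group of floats to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1               # 7 for 4-bit
    qmin = -qmax - 1                         # -8 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(qmin, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def quantize_groupwise(values, group_size, bits=4):
    """Split `values` into consecutive groups and quantize each with its own scale."""
    return [
        quantize_group(values[start:start + group_size], bits)
        for start in range(0, len(values), group_size)
    ]


def dequantize_groupwise(groups):
    """Map the integers back to floats using each group's scale."""
    return [q * scale for quantized, scale in groups for q in quantized]


def max_abs_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical weights: four small values followed by four large ones.
weights = [0.02, -0.05, 0.01, 0.03, 4.0, -3.5, 2.0, 1.0]

fine = dequantize_groupwise(quantize_groupwise(weights, group_size=4))
coarse = dequantize_groupwise(quantize_groupwise(weights, group_size=8))

# Compare the reconstruction error on the small-magnitude half only.
fine_error = max_abs_error(weights[:4], fine[:4])
coarse_error = max_abs_error(weights[:4], coarse[:4])
print(f"error on small values: group_size=4 -> {fine_error:.4f}, "
      f"group_size=8 -> {coarse_error:.4f}")
```

With a single scale for all eight values, the small weights all round to zero; with one scale per group of four, they keep most of their precision. The cost is storing one extra scale per group.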

As usual, we'll meet at 9am PT on Thursday (March 7th) in this Zoom:

https://washington.zoom.us/j/9283133132

Looking forward to seeing y'all and having another great discussion!!

Cheers,

Z

--

Zachary Tatlock

https://ztatlock.net

Associate Professor

Paul G. Allen School of Computer Science & Engineering

University of Washington

Zachary Tatlock

Mar 5, 2024, 11:08:15 AM

to fpb...@fpbench.org, Chien-Yu Lin, Yilong Zhao, Zihao Ye

Friendly reminder that we'll have a community meeting this Thursday on LLMs and quantization, led by Chien-Yu!

See y'all at 9am PT on Thursday, March 7th in this Zoom!

https://washington.zoom.us/j/9283133132

Best,

Z

Zachary Tatlock

Mar 7, 2024, 12:11:32 PM

to fpb...@fpbench.org, Chien-Yu Lin, Yilong Zhao, Zihao Ye

Very sorry -- the earlier Zoom link should have been: https://washington.zoom.us/j/92831331326

Z
