Hi all,
On Friday, May 8, 2026, the NSF Institute for Artificial Intelligence and Fundamental Interactions (IAIFI) will host our next public colloquium. The details are below. We hope you can join us!
To hear more about IAIFI, you can sign up for our mailing list here: https://mailman.mit.edu/mailman/listinfo/iaifi-news
Best,
Thomas
--
Details:
2:00pm ET on Friday, May 8, 2026
IAIFI Public Colloquium (https://iaifi.org/events.html)
Quantization for LLMs and matrix multiplication
Yury Polyanskiy, Professor, MIT
Watch on YouTube: https://www.youtube.com/channel/UCueoFcGm_15kSB-wDd4CBZA
Abstract: Modern LLMs store information in high-dimensional vectors rather inefficiently. For example, a sequence of tokens (18-bit integers) is mapped to 2–5k-dimensional vectors stored as 16-bit floats — roughly an 18 → 100,000 bit expansion per token. Given this redundancy, it is unsurprising that LLMs tolerate substantial reductions in the precision of their basic operations (matrix multiplication) without catastrophic loss. A voluminous literature has emerged on this topic over the past five years. Despite significant algorithmic progress, however, our understanding of the fundamental limits — lower bounds — remains nascent.
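(A back-of-envelope check of the expansion figure above, assuming 18-bit token IDs and 16-bit floats; the hidden dimensions below are typical values, not taken from the talk.)

```python
# Bits per token ID vs. bits per embedding vector,
# for a few illustrative hidden dimensions.
token_bits = 18   # enough to index a ~260k-token vocabulary
float_bits = 16   # bf16/fp16 storage per coordinate

for hidden_dim in (2048, 4096, 5120):
    embedding_bits = hidden_dim * float_bits
    print(f"d={hidden_dim}: {token_bits} -> {embedding_bits} bits "
          f"(~{embedding_bits // token_bits}x expansion)")
```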
In this talk I will present our initial results on the information-theoretic tradeoffs arising in quantized matrix multiplication. I will also show how information-theoretic insights enable the design of more efficient quantization algorithms for real-world LLMs, achieving state-of-the-art performance at 2–4 bits per parameter (NestQuant, WaterSIC).
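(For readers new to the topic: a minimal sketch of the idea, using plain uniform scalar quantization — not the NestQuant or WaterSIC algorithms — showing how reducing weights to a few bits per parameter perturbs a dot product, the basic operation inside matrix multiplication.)

```python
import random

random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(512)]  # "weights"
x = [random.gauss(0.0, 1.0) for _ in range(512)]  # "activations"

def quantize(vals, bits):
    """Uniform scalar quantization to 2**bits levels over the value range."""
    lo, hi = min(vals), max(vals)
    scale = (hi - lo) / (2 ** bits - 1)
    return [round((v - lo) / scale) * scale + lo for v in vals]

exact = sum(wi * xi for wi, xi in zip(w, x))
for bits in (2, 4, 8):
    wq = quantize(w, bits)
    approx = sum(wi * xi for wi, xi in zip(wq, x))
    print(f"{bits} bits/param: |exact - approx| = {abs(exact - approx):.4f}")
```

The talk's subject is precisely how much better one can do than this naive scheme, and what the information-theoretic lower bounds say about the best achievable tradeoff.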
--
Thomas Bradford
Project Coordinator, IAIFI