Stanford MLSys Seminar Episode 46: Albert Gu [Th, 1.35-2.30pm PT]

Karan Goel

Nov 10, 2021, 4:01:03 PM11/10/21
to stanford-ml...@googlegroups.com
Hi everyone,

We're back with the forty-sixth episode of the MLSys Seminar on Thursday from 1.35-2.30pm PT. 

We'll be joined by Albert Gu, who will talk about efficiently modeling long sequence data. The format is a 30-minute talk followed by a 30-minute podcast-style discussion, where the live audience can ask questions.

Guest: Albert Gu
Title: Efficiently Modeling Long Sequences with Structured State Spaces
Abstract: A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies. Although conventional models including RNNs, CNNs, and Transformers have specialized variants for capturing long dependencies, they still struggle to scale to very long sequences of 10000 or more steps. We introduce a simple sequence model based on the fundamental state space representation $x'(t) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t)$ and show that it combines the strengths of several model families. Furthermore, we show that the HiPPO theory of continuous-time memorization can be incorporated into the state matrix $A$, producing a class of structured models that handles long-range dependencies mathematically and can be computed very efficiently. The Structured State Space (S3) model achieves strong empirical results across a diverse range of established benchmarks, including (i) 91% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet, (ii) substantially closing the gap to Transformers on image and language modeling tasks, while performing generation 60X faster, (iii) SotA on every task from the Long Range Arena benchmark, including solving the challenging Path-X task of length 16k that all prior work fails on, while being as efficient as all competitors.
Bio: Albert Gu is a PhD student in the Stanford CS department, advised by Chris Ré. His research interests include algorithms for structured linear algebra and theoretical principles of deep sequence models.
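For anyone unfamiliar with the notation in the abstract, here is a minimal, hypothetical NumPy sketch of the state space representation $x'(t) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t)$, discretized with a bilinear transform and unrolled as a linear recurrence. The matrices and the discretize/ssm_scan helpers are illustrative placeholders, not taken from the S3 implementation, and the structured HiPPO matrix A (as well as the fast convolutional computation) is omitted here.

import numpy as np

# Toy sketch of a state space model:
#   x'(t) = A x(t) + B u(t),   y(t) = C x(t) + D u(t)
# Discretized with a bilinear (Tustin) transform and run as a
# step-by-step linear recurrence. Matrices are random placeholders,
# not the structured HiPPO A from the talk.

def discretize(A, B, dt):
    """Bilinear transform: continuous (A, B) -> discrete (Ab, Bb)."""
    n = A.shape[0]
    I = np.eye(n)
    inv = np.linalg.inv(I - dt / 2 * A)
    Ab = inv @ (I + dt / 2 * A)
    Bb = inv @ (dt * B)
    return Ab, Bb

def ssm_scan(Ab, Bb, C, D, u):
    """Recurrence x_k = Ab x_{k-1} + Bb u_k,  y_k = C x_k + D u_k."""
    x = np.zeros(Ab.shape[0])
    ys = []
    for u_k in u:
        x = Ab @ x + Bb.flatten() * u_k
        ys.append(C @ x + D * u_k)
    return np.array(ys)

# Toy example: state size 4, scalar input/output, sequence length 100.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) - 2 * np.eye(4)  # crude shift toward stability
B = rng.standard_normal((4, 1))
C = rng.standard_normal(4)
D = 0.0
u = np.sin(np.linspace(0, 2 * np.pi, 100))

Ab, Bb = discretize(A, B, dt=0.1)
y = ssm_scan(Ab, Bb, C, D, u)
print(y.shape)  # (100,)

This sequential scan is only for illustration; the talk covers how the structured state matrix lets the same model be computed far more efficiently over very long sequences.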

See you all there!

Best,
Karan

Karan Goel

Nov 11, 2021, 4:15:19 PM11/11/21
to stanford-ml...@googlegroups.com
Reminder: this is in 20 minutes!